When Does a Box Plot Hide Information?

A box plot is a powerful way to visualize the distribution of a continuous variable. However, it hides crucial information when our data is not uni-modal (i.e., when the distribution has more than one peak).

A box plot is very information-rich. From the graph, we can see:

  • The median value, as shown by the bar in the middle.
  • The inter-quartile range, shown by the total length of the box.
  • The 1st quartile (25th percentile) and the 3rd quartile (75th percentile), indicated respectively by the lower boundary and the upper boundary of the box.
  • Outlier values, indicated by individual dots plotted beyond the ends of the whiskers.
  • The approximate degree of dispersion in the data, shown by the length of the box.
    • A shorter box indicates that the middle half of the data is less spread out (loosely, a smaller variance), and a longer box indicates a larger spread.
  • Whether the distribution is symmetrical or skewed.
    • If the median bar sits near the middle of the box, the distribution is approximately symmetrical; if the bar sits towards one end of the box, the distribution is skewed.
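
The quantities listed above can be computed directly, which makes it clearer what a box plot does and does not encode. The following is a minimal sketch, not a definitive implementation: the function name `box_plot_summary`, the sample values, and the `median_position` field are made up for illustration, and the 1.5 × IQR rule for whiskers is an assumed (though common) convention. Plotting software may use a slightly different quartile method and therefore give slightly different numbers.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Return the quantities a box plot typically draws for a list of numbers."""
    data = sorted(values)
    # Quartiles: lower edge of the box, median bar, upper edge of the box.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # length of the box
    # Assumed convention: points beyond whisker_factor * IQR from the box
    # are drawn as individual outlier dots.
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # most extreme points that are not outliers
        "whisker_high": inside[-1],
        "outliers": outliers,
        # 0.5 means the median bar is centred in the box; values near
        # 0 or 1 suggest a skewed distribution.
        "median_position": (median - q1) / iqr if iqr else 0.5,
    }


sample = [2, 4, 4, 5, 5, 5, 6, 6, 7, 8, 20]  # made-up values
summary = box_plot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this made-up sample, 20 is the only value flagged as an outlier, and the median bar sits in the lower part of the box rather than at its centre.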

[Figure: Symmetrical distribution; relatively low variance; outlier value]

[Figure: Skewed distribution; slightly higher variance; no outlier]

However, the box plot has one drawback: it hides the shape of the distribution if our data is bi-modal (or multi-modal).

For example, here we have some data with a bi-modal distribution: the size of the Christian population as a percentage of each country’s total population.

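If we draw a box plot for this data, the bi-modal property is completely hidden, because the box marks only the quartiles and the median and says nothing about where the observations cluster in between.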
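
To see how this can happen, here is a minimal sketch using made-up percentages that form two clusters. These values are purely illustrative and are not the Christian population data discussed above. The sketch computes the three numbers that define the box and then checks how many observations actually lie near the median.

```python
import statistics

# Made-up percentages forming two clusters (illustrative only, not real data).
low_cluster = [5, 8, 10, 10, 12, 12, 15, 15, 18, 20]
high_cluster = [80, 82, 85, 85, 88, 88, 90, 90, 92, 95]
bimodal = low_cluster + high_cluster

# The three numbers that define the box and its median bar.
q1, median, q3 = statistics.quantiles(bimodal, n=4, method="inclusive")

# How many observations lie within 10 points of the median?
near_median = [x for x in bimodal if abs(x - median) <= 10]

print(f"Q1={q1}, median={median}, Q3={q3}")
print(f"observations within 10 points of the median: {len(near_median)}")
```

In this made-up sample the median bar is drawn in the gap between the two clusters, at a value where there are no observations at all, and nothing in the box signals that.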

So if our data has more than one peak, a box plot is not the most appropriate graph for displaying the shape of the distribution. A good old histogram is a better choice in this context.
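
As a rough illustration of why a histogram helps, the sketch below counts observations per bin and prints a text histogram. The data are made-up percentages with two clusters (not real data), and the function name `histogram_counts` and the bin width of 20 are arbitrary choices for this example.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count values per bin; each bin covers [start, start + bin_width)."""
    counts = Counter((x // bin_width) * bin_width for x in values)
    first, last = min(counts), max(counts)
    # Include empty bins so that gaps in the distribution stay visible.
    return [(start, counts.get(start, 0))
            for start in range(first, last + bin_width, bin_width)]


# Made-up percentages forming two clusters (illustrative only, not real data).
bimodal = [5, 8, 10, 10, 12, 12, 15, 15, 18, 20,
           80, 82, 85, 85, 88, 88, 90, 90, 92, 95]

bin_width = 20
bins = histogram_counts(bimodal, bin_width)
for start, count in bins:
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")
```

The two tall bars at either end, separated by empty bins, make the two peaks visible at a glance.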

Research Methods in Political Science
Supplemental course materials for Spring 2019.