Minimum Value: The lowest data point in the set, excluding outliers. It marks the start of the lower whisker.
Lower Quartile (): The value below which 25% of the data falls. It forms the left (or bottom) edge of the box.
Median (): The middle value of the dataset. It is represented by a line inside the box and indicates the center of the distribution.
Upper Quartile (): The value below which 75% of the data falls. It forms the right (or top) edge of the box.
Maximum Value: The highest data point in the set, excluding outliers. It marks the end of the upper whisker.
The Interquartile Range (IQR) is the width of the box () and represents the middle 50% of the data. A wider box indicates greater variability in the central half of the dataset.
The position of the median line within the box indicates skewness. If the median is closer to , the data is positively skewed (right-skewed); if it is closer to , the data is negatively skewed (left-skewed).
Whisker length also provides insight into distribution. Long whiskers suggest that the data in the top or bottom 25% is widely spread out, while short whiskers indicate that the extreme values are close to the quartiles.
| Feature | Box Plot | Histogram |
|---|---|---|
| Primary Focus | Quartiles and Spread | Frequency and Shape |
| Outliers | Explicitly identified | Often obscured in bins |
| Comparison | Excellent for multiple groups | Difficult for multiple groups |
| Data Detail | Summarized (5 points) | Detailed (all frequencies) |
While a histogram shows the exact shape of the distribution (e.g., bimodal), a box plot is superior for comparing the medians and IQRs of different categories simultaneously.
Two-Part Comparisons: When asked to compare two box plots, always make one comment about the average (using the median) and one comment about the spread (using the IQR or range).
Contextualize: After stating the mathematical comparison (e.g., 'Group A has a higher median'), always explain what it means in the context of the problem (e.g., 'On average, Group A performed better').
Check the Scale: Always verify the scale on the axis before reading values. A common mistake is misreading the value of a quartile because the grid increments are not 1 unit each.
Consistency: Use the term 'more consistent' when a dataset has a smaller IQR, as this indicates the middle 50% of the data is more tightly clustered.