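Quartile Distribution: The data is partitioned into four intervals, each holding roughly 25% of the observations, by the first quartile ($Q_1$), the median, and the third quartile ($Q_3$). The distance between $Q_1$ and $Q_3$ is known as the Interquartile Range (IQR), $\text{IQR} = Q_3 - Q_1$, which measures the spread of the central 50% of the observations.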
Robustness of the Median: Unlike the mean, the median is a resistant measure of center. This means it is not heavily influenced by extreme values or outliers, making the box plot an excellent choice for skewed distributions.
The 1.5 × IQR Rule: This threshold defines the boundaries for 'normal' data. Any value smaller than $Q_1 - 1.5 \times \text{IQR}$ or larger than $Q_3 + 1.5 \times \text{IQR}$ is flagged as an outlier.
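The two ideas above, a resistant median and the 1.5 × IQR thresholds, can be illustrated with a minimal sketch. The helper name `iqr_fences`, the sample values, and the quartiles passed to it are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

values = [2, 3, 4, 5, 6]
with_extreme = values + [100]  # same data plus one extreme value

# One extreme value pulls the mean far from the bulk of the data,
# while the median moves only slightly.
print("mean:  ", mean(values), "->", mean(with_extreme))
print("median:", median(values), "->", median(with_extreme))

# Illustrative quartiles (assumed, not computed from the list above).
lower, upper = iqr_fences(q1=3, q3=6)
print("fences:", lower, upper)  # -1.5 10.5
```

Any observation below the lower fence or above the upper fence would be flagged under the rule; everything in between counts as 'normal' data.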
Step 1: Order the Data: Arrange all data points in ascending order. This is a critical prerequisite for identifying the median and quartiles accurately.
Step 2: Calculate the Five-Number Summary: Find the median of the entire set, then find the medians of the lower and upper halves to determine $Q_1$ and $Q_3$. For a box plot that marks outliers, the minimum and maximum used for the whiskers are the smallest and largest values that are not outliers.
Step 3: Determine Fences: Calculate the IQR ($Q_3 - Q_1$) and multiply by 1.5. Subtract this from $Q_1$ for the lower fence and add it to $Q_3$ for the upper fence. Any value outside the fences is an outlier.
Step 4: Construct the Plot: Draw a box from $Q_1$ to $Q_3$ with a line at the median. Extend whiskers to the smallest and largest data points that fall within the calculated fences.
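The four steps can be sketched in code as follows. This is a minimal illustration, not a definitive implementation: the function name `box_plot_stats` and the sample data are hypothetical, and it assumes the median-of-halves convention in which the overall median is excluded from both halves when the number of observations is odd. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median

def box_plot_stats(data, k=1.5):
    """Compute the values needed to draw a box plot with outliers marked."""
    if len(data) < 2:
        raise ValueError("need at least two observations")

    # Step 1: order the data.
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2

    # Step 2: median of the whole set, then medians of the two halves.
    # Assumption: for odd n, the middle value belongs to neither half.
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    q1, med, q3 = median(lower_half), median(ordered), median(upper_half)

    # Step 3: fences from the IQR.
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    # Step 4: whiskers end at actual data values inside the fences.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]

    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

stats = box_plot_stats([12, 3, 7, 9, 5, 8, 40, 6, 10])
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this sample, the upper fence is 19.25 but the upper whisker ends at 12, the largest observation inside the fence, and 40 is reported as an outlier.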
| Feature | Box Plot | Histogram |
|---|---|---|
| Primary Focus | Summary statistics and spread | Frequency and density |
| Outlier Detection | Explicitly identifies outliers | Outliers may be hidden in bins |
| Comparison | Easy to stack multiple plots | Difficult to overlay multiple sets |
| Sample Size | Does not show total count | Can indicate total count via area |
Verify Data Ordering: Always ensure the data is sorted before calculating quartiles. A common mistake is picking the middle number from an unsorted list, which leads to incorrect median and IQR values.
The 25% Rule: Remember that each segment of the box plot (each whisker, together with any outliers beyond it, and each half of the box) represents approximately 25% of the data points. If one segment is much longer than another, it indicates that the data in that 25% is more spread out, not that there are more data points there.
Whisker Termination: Ensure whiskers stop at the actual data values (the 'inner' minimum and maximum), not at the calculated fence values. The fences are only used to decide which points are outliers.
Sanity Check: If the median is closer to $Q_1$, the data is likely right-skewed. If it is closer to $Q_3$, it is left-skewed. Use this visual cue to verify your calculations.
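The sanity check on skew can be expressed as a small comparison of the two halves of the box. The function name `skew_hint` and the input values are hypothetical, and the result is only a rough cue, not a formal test of skewness.

```python
def skew_hint(q1, med, q3):
    """Guess the direction of skew from where the median sits in the box."""
    lower_gap = med - q1   # width of the lower half of the box
    upper_gap = q3 - med   # width of the upper half of the box
    if lower_gap < upper_gap:
        return "right-skewed"   # median sits closer to Q1
    if lower_gap > upper_gap:
        return "left-skewed"    # median sits closer to Q3
    return "roughly symmetric"

print(skew_hint(5.5, 8, 11))  # right-skewed
print(skew_hint(2, 8, 10))    # left-skewed
```

If the hint disagrees with what the sorted data suggests, that is a prompt to recheck the quartile calculations.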