A Box Plot (or box-and-whisker diagram) is a standardized way of displaying the distribution of data based on a five-number summary: minimum, lower quartile ($Q_1$), median ($Q_2$), upper quartile ($Q_3$), and maximum.
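As a minimal Python sketch, the five-number summary can be computed with the standard-library `statistics.quantiles` function. The choice of `method="inclusive"` is an assumption: textbooks and exam boards use slightly different quartile conventions, so quartiles found by hand may differ a little. The function name `five_number_summary` and the sample data are illustrative.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: quartiles use statistics.quantiles with method="inclusive";
    other quartile conventions can give slightly different Q1 and Q3.
    """
    values = sorted(data)
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, q2, q3, values[-1]

print(five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18]))  # (3, 7.0, 12.0, 14.0, 21)
```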
The Cumulative Frequency is the running total of frequencies in a dataset, where each value represents the sum of all frequencies up to a specific upper boundary.
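A short Python sketch of the running total, using `itertools.accumulate` on hypothetical grouped data (the class boundaries and frequencies are made up for illustration):

```python
from itertools import accumulate

# Hypothetical grouped data: (upper class boundary, frequency) pairs
classes = [(10, 4), (20, 9), (30, 15), (40, 8), (50, 4)]

frequencies = [f for _, f in classes]
cumulative = list(accumulate(frequencies))  # running total: 4, 13, 28, 36, 40

for (upper, f), cf in zip(classes, cumulative):
    print(f"up to {upper}: frequency {f}, cumulative frequency {cf}")
```

The last cumulative frequency always equals the total frequency (here 40), which is a quick check that the running total was built correctly.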
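The Interquartile Range (IQR) measures the spread of the middle 50% of the data and is calculated as $\text{IQR} = Q_3 - Q_1$. Because it ignores the lowest and highest quarters of the data, it is more robust than the range, which depends entirely on the two most extreme values.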
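The robustness claim can be checked with a small Python sketch. It assumes the `"inclusive"` quartile convention of `statistics.quantiles`; the helper names and data are hypothetical. Replacing the largest value with an extreme one changes the range dramatically but leaves the IQR untouched.

```python
import statistics

def iqr(data):
    # Assumption: "inclusive" quartile convention
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

def data_range(data):
    return max(data) - min(data)

original = [3, 5, 7, 8, 12, 13, 14, 18, 21]
with_outlier = [3, 5, 7, 8, 12, 13, 14, 18, 95]  # largest value replaced by an extreme one

print(data_range(original), iqr(original))          # 18 7.0
print(data_range(with_outlier), iqr(with_outlier))  # 92 7.0
```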
Outliers are individual data points that fall significantly outside the overall pattern of the distribution, often defined mathematically as values more than $1.5 \times \text{IQR}$ above $Q_3$ or below $Q_1$.
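A minimal Python sketch of the $1.5 \times \text{IQR}$ fence rule. The helper name `find_outliers`, the parameter `k`, and the sample data are illustrative, and the quartile convention (`method="inclusive"`) is an assumption.

```python
import statistics

def find_outliers(data, k=1.5):
    """Return values more than k * IQR below Q1 or above Q3 (k = 1.5 is the usual rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr  # anything below this is an outlier
    upper_fence = q3 + k * iqr  # anything above this is an outlier
    return [x for x in data if x < lower_fence or x > upper_fence]

print(find_outliers([3, 5, 7, 8, 12, 13, 14, 18, 95]))  # [95]
```

For this data, $Q_1 = 7$ and $Q_3 = 14$, so the fences are $-3.5$ and $24.5$, and only 95 lies outside them.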
Check the Boundaries: When plotting cumulative frequency, always plot each running total against the upper boundary of its class interval. Plotting at the midpoint is a common error that shifts every point to the left by half a class width.
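To make the difference concrete, here is a Python sketch that builds the plotted coordinates for hypothetical classes of the form $0 < x \le 10$, $10 < x \le 20$, and so on, once with upper boundaries (correct) and once with midpoints (the common error). The data is illustrative.

```python
from itertools import accumulate

# Hypothetical classes given as (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 15), (30, 40, 8), (40, 50, 4)]

cumulative = list(accumulate(f for _, _, f in classes))

# Correct: each running total is plotted at the class's UPPER boundary
correct_points = [(upper, cf) for (_, upper, _), cf in zip(classes, cumulative)]

# Common error: plotting at the midpoint moves every point left by half a class width
midpoint_points = [((lower + upper) / 2, cf) for (lower, upper, _), cf in zip(classes, cumulative)]

print(correct_points)   # [(10, 4), (20, 13), (30, 28), (40, 36), (50, 40)]
print(midpoint_points)  # [(5.0, 4), (15.0, 13), (25.0, 28), (35.0, 36), (45.0, 40)]
```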
Scale Awareness: Examiners often use awkward or truncated scales (for example, one small square worth 2 units, or an axis that does not start at zero). Always work out the value of one small grid square on both axes before reading or plotting points.
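The scale check is simple arithmetic: divide the gap between two labelled values by the number of small squares between them. A Python sketch with hypothetical helper names (`value_per_small_square`, `read_axis`) and a made-up axis that starts at 100:

```python
def value_per_small_square(labelled_step, squares_per_step):
    """Value of one small square, e.g. labels every 20 units with 10 squares between them."""
    return labelled_step / squares_per_step

def read_axis(axis_start, squares_from_start, labelled_step, squares_per_step):
    """Convert a position counted in small squares from the axis start into an axis value."""
    return axis_start + squares_from_start * value_per_small_square(labelled_step, squares_per_step)

# Hypothetical axis: starts at 100, labelled every 20 units, 10 small squares per label
print(value_per_small_square(20, 10))  # 2.0
print(read_axis(100, 7, 20, 10))       # 114.0
```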
Comparison Language: When asked to compare two datasets using box plots, always comment on both the average (using the median) and the spread (using the IQR or range) in the context of the problem.
The 'Greater Than' Trap: If a question asks for the number of items 'greater than' a value $x$, read off the cumulative frequency at $x$ and subtract it from the total frequency ($n$). The cumulative frequency itself counts the items up to $x$, not those above it.
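A Python sketch of the 'greater than' calculation. It assumes the plotted points are joined with straight lines, so the cumulative frequency at $x$ can be estimated by linear interpolation; reading from a hand-drawn smooth curve will give a slightly different value. The points (including the starting point $(0, 0)$) and the helper `cf_at` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical CF points: (upper class boundary, cumulative frequency), starting at (0, 0)
points = [(0, 0), (10, 4), (20, 13), (30, 28), (40, 36), (50, 40)]

def cf_at(x, points):
    """Estimate the cumulative frequency at x by linear interpolation between plotted points."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, x)  # first plotted point with boundary >= x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

total = points[-1][1]                  # n, the total frequency
greater_than_25 = total - cf_at(25, points)
print(greater_than_25)                 # 19.5
```

Because the result is an estimate read from a graph, it need not be a whole number.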
Median vs. Mean: Students often confuse the median (the middle value) with the mean (the arithmetic average). A box plot shows the median directly and a CF graph is used to estimate it; neither gives the mean.
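A quick Python illustration with made-up, right-skewed values, using the standard-library `statistics` module, shows how one large value moves the mean far more than the median:

```python
import statistics

# Hypothetical skewed data: one large value pulls the mean up
values = [18, 20, 21, 22, 24, 25, 120]

print(statistics.median(values))           # 22
print(round(statistics.mean(values), 2))   # 35.71
```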
Joining Points: In cumulative frequency graphs, points should generally be joined with a smooth curve. Using straight lines is sometimes acceptable but may lead to less accurate estimates for quartiles.
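A Python sketch of estimating quartiles from a cumulative frequency graph by reading across at $n/4$, $n/2$ and $3n/4$. It joins the points with straight lines (the less accurate option mentioned above), so a smooth curve read by eye will give slightly different values. The data and the helper `value_at_cf` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical CF points: (upper class boundary, cumulative frequency), starting at (0, 0)
points = [(0, 0), (10, 4), (20, 13), (30, 28), (40, 36), (50, 40)]

def value_at_cf(target, points):
    """Read across from a cumulative frequency to the x-axis, joining points with straight lines."""
    cfs = [p[1] for p in points]
    i = bisect_left(cfs, target)  # first point whose CF reaches the target
    if i == 0:
        return points[0][0]
    if i == len(points):
        return points[-1][0]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return x0 + (x1 - x0) * (target - y0) / (y1 - y0)

n = points[-1][1]  # total frequency
q1 = value_at_cf(n / 4, points)
median = value_at_cf(n / 2, points)
q3 = value_at_cf(3 * n / 4, points)
print(round(q1, 2), round(median, 2), round(q3, 2))  # 16.67 24.67 32.5
```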
Frequency vs. Cumulative Frequency: Ensure you are plotting the running total, not the individual class frequencies; plotting those produces something closer to a frequency polygon than a cumulative frequency curve.