The curve represents the cumulative distribution of the population. A steep section of the curve indicates a high frequency of data points within that range, while a flatter section indicates fewer data points.
The total frequency ($n$) is the maximum value on the y-axis. All positional statistics are calculated as fractions of this total value, allowing for comparisons between datasets of different sizes.
The Interquartile Range (IQR) measures the spread of the middle 50% of the data, calculated as $\text{IQR} = Q_3 - Q_1$. This provides a measure of variability that is less sensitive to outliers than the total range.
Percentile Formula: To find the $p$-th percentile, locate the value $\frac{p}{100} \times n$ on the y-axis and map it to the x-axis.
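The graph-reading procedure can be sketched in Python. This is a minimal illustration, not a standard method: the class boundaries and cumulative frequencies are hypothetical data, the helper names `value_at_cumulative` and `percentile` are invented for this example, and the plotted points are assumed to be joined by straight lines (a hand-drawn smooth curve would give slightly different readings).

```python
from bisect import bisect_left

def value_at_cumulative(upper_bounds, cum_freq, target):
    """Return the x-value where the cumulative frequency reaches `target`.

    Assumes the plotted points are joined by straight line segments.
    """
    if not cum_freq[0] <= target <= cum_freq[-1]:
        raise ValueError("target lies outside the plotted cumulative frequencies")
    i = bisect_left(cum_freq, target)  # first plotted point at or above target
    if cum_freq[i] == target:
        return upper_bounds[i]
    x0, x1 = upper_bounds[i - 1], upper_bounds[i]
    y0, y1 = cum_freq[i - 1], cum_freq[i]
    # Linear interpolation between the two neighbouring plotted points
    return x0 + (target - y0) * (x1 - x0) / (y1 - y0)

def percentile(upper_bounds, cum_freq, p):
    """Locate (p/100) * n on the y-axis, then map it to the x-axis."""
    n = cum_freq[-1]  # total frequency = last cumulative value
    return value_at_cumulative(upper_bounds, cum_freq, p / 100 * n)

# Hypothetical data: upper class boundaries and cumulative frequencies
upper_bounds = [0, 10, 20, 30, 40, 50]
cum_freq = [0, 8, 24, 56, 72, 80]

q1 = percentile(upper_bounds, cum_freq, 25)
median = percentile(upper_bounds, cum_freq, 50)
q3 = percentile(upper_bounds, cum_freq, 75)
print(q1, median, q3, q3 - q1)  # lower quartile, median, upper quartile, IQR
```

With this data, $n = 80$, so the quartiles are read at 20, 40 and 60 on the y-axis.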
Check the Total Frequency: Always identify the maximum value on the y-axis ($n$) before performing any calculations. Do not assume the y-axis ends exactly at $100$.
Precision with Rulers: When mapping values between axes in an exam, use a ruler to draw dashed lines. Small errors in reading the graph can lead to significant errors in the final answer.
Sanity Check: Ensure that $Q_1 \le Q_2 \le Q_3$. If your median is smaller than your lower quartile, you have likely miscalculated the positions on the y-axis.
'More Than' Questions: A common exam trap is asking for the number of items greater than a certain value. Remember to subtract the y-value from the total frequency: $n - y$.
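A short Python sketch of the 'more than' calculation follows. The data are hypothetical, the helper names `cumulative_at` and `count_greater_than` are invented for this example, and straight-line interpolation between plotted points is an assumption standing in for reading the curve by eye.

```python
from bisect import bisect_left

def cumulative_at(upper_bounds, cum_freq, x):
    """Read the cumulative frequency (y-value) at data value x.

    Assumes the plotted points are joined by straight line segments.
    """
    if not upper_bounds[0] <= x <= upper_bounds[-1]:
        raise ValueError("x lies outside the plotted range")
    i = bisect_left(upper_bounds, x)  # first boundary at or above x
    if upper_bounds[i] == x:
        return cum_freq[i]
    x0, x1 = upper_bounds[i - 1], upper_bounds[i]
    y0, y1 = cum_freq[i - 1], cum_freq[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def count_greater_than(upper_bounds, cum_freq, x):
    """Items above x = total frequency minus the cumulative frequency at x."""
    n = cum_freq[-1]
    return n - cumulative_at(upper_bounds, cum_freq, x)

# Hypothetical data: upper class boundaries and cumulative frequencies
upper_bounds = [0, 10, 20, 30, 40, 50]
cum_freq = [0, 8, 24, 56, 72, 80]

below = cumulative_at(upper_bounds, cum_freq, 35)       # y-value read at x = 35
above = count_greater_than(upper_bounds, cum_freq, 35)  # n minus that y-value
print(below, above)
```

Reporting `below` as the answer to a 'more than' question is exactly the trap described above; the two numbers always sum to $n$.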
Using the wrong axis: Students often confuse the data value (x-axis) with the frequency (y-axis) when asked to find a percentile.
Misinterpreting the slope: A steep slope means a high density of data, not necessarily 'higher' values. A flat line means there is no data in that interval.
Incorrect Quartile Positions: Using $\frac{n+1}{4}$ instead of $\frac{n}{4}$ for large datasets. In cumulative frequency diagrams, using $\frac{n}{4}$ and $\frac{3n}{4}$ is the standard convention for continuous data.
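As a rough illustration of the difference between the two conventions, the Python sketch below compares the y-axis positions they give for the lower quartile. The function names and the sample values of $n$ are hypothetical, chosen only for this example.

```python
def quartile_positions(n):
    """y-axis positions for Q1, the median and Q3 under the n/4 convention."""
    return n / 4, n / 2, 3 * n / 4

def discrete_q1_position(n):
    """Lower-quartile position under the (n + 1)/4 rule, shown for comparison."""
    return (n + 1) / 4

for n in (20, 80, 400):
    q1_pos, median_pos, q3_pos = quartile_positions(n)
    gap = discrete_q1_position(n) - q1_pos  # always a quarter of one item
    relative_gap = gap / q1_pos             # shrinks as n grows
    print(n, q1_pos, median_pos, q3_pos, gap, relative_gap)
```

The absolute gap between the two positions stays at a quarter of one item, so on a graph with a large total frequency it is usually smaller than the reading error.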