The median is the middle value of a dataset when ordered, representing the 50th percentile. On a CFD, it is the data value corresponding to half of the total cumulative frequency ($\frac{n}{2}$, where $n$ is the total number of data values).
Quartiles divide the data into four equal parts. The lower quartile (LQ) is the 25th percentile, representing the data value below which 25% of the data falls (cumulative frequency $\frac{n}{4}$). The upper quartile (UQ) is the 75th percentile, representing the data value below which 75% of the data falls (cumulative frequency $\frac{3n}{4}$).
Percentiles generalize quartiles, dividing the data into 100 equal parts. The $p$th percentile is the data value below which $p$% of the data falls, found at the cumulative frequency position of $\frac{p}{100} \times n$.
The interquartile range (IQR), a measure of data spread, can be calculated from a CFD as the difference between the upper quartile and the lower quartile ($\text{IQR} = \text{UQ} - \text{LQ}$). This value indicates the range covered by the middle 50% of the data.
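The position formulas above can be sketched in a few lines of Python. This is only an illustration: the function names `cf_positions` and `interquartile_range` are made up for this example, and the total of 80 data values is a hypothetical figure, not taken from any real dataset.

```python
def cf_positions(n, percentiles=()):
    """Return the cumulative frequency positions to look up on the y-axis.

    n is the total number of data values; percentiles is an optional
    collection of extra percentiles (e.g. 90 for the 90th percentile).
    """
    positions = {
        "median": n / 2,              # 50th percentile
        "lower_quartile": n / 4,      # 25th percentile
        "upper_quartile": 3 * n / 4,  # 75th percentile
    }
    for p in percentiles:
        positions[f"p{p}"] = p * n / 100
    return positions


def interquartile_range(lower_quartile, upper_quartile):
    """IQR = UQ - LQ, using data values already read from the x-axis."""
    return upper_quartile - lower_quartile


# Hypothetical example: a CFD whose curve tops out at n = 80.
positions = cf_positions(80, percentiles=(90,))
```

Note that `cf_positions` returns positions on the cumulative frequency axis, not data values. The quartiles passed to `interquartile_range` must first be read off the curve.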
Step 1: Determine Total Data Points (n). Identify the total number of data values in the dataset, which corresponds to the highest point on the cumulative frequency (y-) axis that the curve reaches. This value, 'n', is crucial for calculating the positions of the median, quartiles, and percentiles.
Step 2: Calculate the Position. For the median, calculate $\frac{n}{2}$. For the lower quartile, calculate $\frac{n}{4}$. For the upper quartile, calculate $\frac{3n}{4}$. For the $p$th percentile, calculate $\frac{p}{100} \times n$. These calculations determine the cumulative frequency value to locate on the y-axis.
Step 3: Draw Horizontal Line to Curve. From the calculated cumulative frequency position on the y-axis, draw a horizontal line across the graph until it intersects the cumulative frequency curve. This point of intersection is key to finding the corresponding data value.
Step 4: Draw Vertical Line to X-axis. From the point where the horizontal line meets the curve, draw a vertical line straight down to the x-axis (data value axis). The value where this vertical line intersects the x-axis is the estimated measure of position (median, quartile, or percentile).
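Steps 1 to 4 can be mimicked numerically. The sketch below assumes the curve is approximated by straight segments joining the plotted points, which is a simplification of a hand-drawn smooth curve. The function name `value_at_position` and the plotted points are hypothetical, chosen only to illustrate the method.

```python
def value_at_position(points, position):
    """Estimate the data value at a cumulative frequency position.

    points: (data value, cumulative frequency) pairs sorted by data value,
    starting at a cumulative frequency of 0. Straight lines between
    neighbouring points stand in for the curve (an assumption).
    """
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf0 <= position <= cf1:
            if cf1 == cf0:  # flat segment: no data in this interval
                return x0
            # "Horizontal line to the curve, then vertical line down":
            # linear interpolation along the segment.
            fraction = (position - cf0) / (cf1 - cf0)
            return x0 + fraction * (x1 - x0)
    raise ValueError("position is outside the cumulative frequency range")


# Hypothetical plotted points: (upper class boundary, cumulative frequency)
points = [(0, 0), (10, 8), (20, 28), (30, 60), (40, 76), (50, 80)]

n = points[-1][1]                               # Step 1: highest point reached
median = value_at_position(points, n / 2)       # Steps 2 to 4 for the median
lq = value_at_position(points, n / 4)           # lower quartile
uq = value_at_position(points, 3 * n / 4)       # upper quartile
iqr = uq - lq
```

With these hypothetical points the estimates are a median of 23.75, a lower quartile of 16, and an upper quartile of 30, giving an IQR of 14. A reading taken from a smooth hand-drawn curve may differ slightly from these straight-line values.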
Cumulative frequency diagrams can also be used to estimate the number of data points that fall within a specific range or exceed a certain threshold. This involves reversing the process of finding measures of position.
To find the number of data points below a certain value, locate that data value on the x-axis, draw a vertical line up to the curve, and then a horizontal line across to the y-axis. The reading on the y-axis is the cumulative frequency up to that data value.
To find the number of data points above a certain value, first find the cumulative frequency up to that value (as described above). Then, subtract this cumulative frequency from the total number of data points, 'n', to get the count of values exceeding the threshold.
To find the number of data points within a specific range (e.g., between $a$ and $b$, with $a < b$), find the cumulative frequency for $b$ and subtract the cumulative frequency for $a$. This difference represents the frequency of data points within that interval.
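The three reverse readings described above can be sketched as follows. As before, the curve is assumed to be made of straight segments between plotted points, and the function name `cumulative_frequency_at`, the points, and the thresholds are all hypothetical.

```python
def cumulative_frequency_at(points, value):
    """Estimate the cumulative frequency up to a given data value.

    points: (data value, cumulative frequency) pairs with strictly
    increasing data values. Straight segments approximate the curve
    (an assumption).
    """
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            # "Vertical line up to the curve, then horizontal line across".
            fraction = (value - x0) / (x1 - x0)
            return cf0 + fraction * (cf1 - cf0)
    raise ValueError("value is outside the range of the x-axis")


# Hypothetical plotted points: (upper class boundary, cumulative frequency)
points = [(0, 0), (10, 8), (20, 28), (30, 60), (40, 76), (50, 80)]
n = points[-1][1]

below_25 = cumulative_frequency_at(points, 25)   # number of values below 25
above_25 = n - below_25                          # number of values above 25
between_15_and_35 = (cumulative_frequency_at(points, 35)
                     - cumulative_frequency_at(points, 15))
```

For these hypothetical points, about 44 values lie below 25, about 36 lie above 25, and about 50 lie between 15 and 35. These counts are estimates, so a question will usually expect a sensibly rounded answer.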
Identify 'n' First: Always begin by clearly identifying the total number of data values, 'n', from the highest point on the cumulative frequency axis. This is the foundation for all subsequent calculations of positions.
Precision in Reading: Use a ruler and pencil to draw clear, straight lines when reading values from the graph. Small inaccuracies in drawing or reading can lead to significant errors in the estimated values.
Contextualize Answers: Ensure your estimated values make sense within the context of the data. For example, the lower quartile, median, and upper quartile should appear in that order, and all three should lie within the range of data values shown on the x-axis.
Units and Rounding: Pay close attention to the units used on the x-axis and any specific rounding instructions in the question. Avoid unnecessary conversions unless explicitly asked.
Incorrect Position Calculation: A common error is miscalculating the position for the median ($\frac{n}{2}$), quartiles ($\frac{n}{4}$, $\frac{3n}{4}$), or percentiles ($\frac{p}{100} \times n$). Remember these are positions on the cumulative frequency axis, not data values.
Reading from the Wrong Axis: Students sometimes mistakenly read the cumulative frequency from the x-axis or the data value from the y-axis. Always remember that data values are on the horizontal axis and cumulative frequency on the vertical axis.
Confusing 'Less Than' with 'Greater Than': When asked for the number of values above a certain point, remember to subtract the cumulative frequency at that point from the total 'n', rather than just reading the cumulative frequency directly.
Assuming Exact Values: Forgetting that all values obtained from a CFD are estimates, not exact figures, can lead to overconfidence in precision. The smooth curve is an approximation of the underlying distribution.