The principle of deviation is central to dispersion; a deviation measures the distance of a data point from a central point, usually the arithmetic mean $\bar{x}$. The sum of the deviations from the mean, $\sum (x_i - \bar{x})$, is always zero, which is why we use absolute values or squares to measure spread.
Variance ($\sigma^2$) represents the average of the squared deviations from the mean. Squaring the deviations ensures that negative differences do not cancel out positive ones, and it weights large deviations more heavily than small ones.
Standard Deviation ($\sigma$) is the square root of the variance, returning the measure to the original units of the data. It is the most widely used measure of dispersion because it has convenient algebraic properties and is used extensively in inferential statistics.
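The three ideas above can be checked numerically. The following is a minimal sketch in Python; the data values are illustrative assumptions, not taken from the text, and the variance is computed with the population formula (dividing by the number of observations).

```python
import math

# Illustrative data (an assumption for this sketch, not from the text).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# Deviation of each value from the arithmetic mean.
deviations = [x - mean for x in data]

# Raw deviations cancel out, so their sum is zero.
sum_of_deviations = sum(deviations)

# Population variance: the average of the squared deviations.
variance = sum(d ** 2 for d in deviations) / len(data)

# Standard deviation: square root of the variance, in the data's own units.
std_dev = math.sqrt(variance)

print(mean, sum_of_deviations, variance, std_dev)  # 5.0 0.0 4.0 2.0
```

Because the squared deviations here are 9, 1, 1, 1, 0, 0, 4 and 16, the variance is 32 / 8 = 4 and the standard deviation is 2.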
Range: The simplest measure, calculated as $\text{Range} = x_{\max} - x_{\min}$, the difference between the largest and smallest values. It provides a quick snapshot of the total spread but is highly sensitive to extreme outliers and ignores the distribution of values in between.
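Mean Deviation (MD): The arithmetic average of the absolute differences between each value and the mean (or median). About the mean, it is calculated as $\text{MD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$; for the mean deviation about the median, the median replaces $\bar{x}$.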
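Standard Deviation (SD): Calculated by dividing the sum of squared deviations by the number of observations and taking the square root of the result. For a population of size $N$ with mean $\mu$: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$.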
Coefficient of Variation (CV): Used to compare the variability of two or more series with different units or different means. It is expressed as a percentage: $\text{CV} = \frac{\sigma}{\bar{x}} \times 100\%$.
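Coefficient of Range: A relative measure of range calculated as $\frac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}}$, where $x_{\max}$ and $x_{\min}$ are the largest and smallest values, allowing for comparison between datasets of different scales.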
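The measures listed above can be sketched as small Python functions. The function names and the sample data are hypothetical, chosen only for illustration; the standard deviation uses the population formula, and the mean deviation is taken about the mean.

```python
import math

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def mean_deviation(values):
    """Average absolute deviation about the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def population_sd(values):
    """Square root of the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def coefficient_of_variation(values):
    """Population SD as a percentage of the mean (mean must be non-zero)."""
    mean = sum(values) / len(values)
    return 100 * population_sd(values) / mean

def coefficient_of_range(values):
    """(max - min) / (max + min), a unitless relative measure."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

# Illustrative data (an assumption for this sketch).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("Range:", data_range(data))
print("Mean deviation:", mean_deviation(data))
print("Standard deviation:", population_sd(data))
print("CV (%):", coefficient_of_variation(data))
print("Coefficient of range:", coefficient_of_range(data))
```

For this data the range is 7, the mean deviation is 12 / 8 = 1.5, and the standard deviation is 2, so the CV is 40% of the mean of 5.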
Check the Units: Always remember that absolute measures like Range and SD have units (e.g., dollars, kg), while relative measures like the Coefficient of Variation are unitless ratios or percentages.
The 'Zero' Rule: If all values in a dataset are identical (e.g., 5, 5, 5), all measures of dispersion will be exactly zero. If you calculate a negative value for variance or SD, you have made a calculation error, as these must always be non-negative.
Effect of Constants: Adding or subtracting a constant from every value in a dataset does not change the measures of dispersion (Range, SD, MD). However, multiplying every value by a constant $c$ will multiply the SD by $|c|$ and the variance by $c^2$.
Outlier Awareness: If an exam question asks for a measure of spread that is 'robust' or 'resistant' to outliers, the Range is the worst choice, while the Interquartile Range (IQR) is usually the best.
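Two of the points above, the effect of constants and the sensitivity of the Range to outliers, can be verified with Python's standard `statistics` module. This is a rough sketch: the data, the constants 10 and -3, the outlier value 90, and the helper names are all assumptions made for illustration, and `statistics.quantiles` is used with its default quartile method, which may differ slightly from a textbook's convention.

```python
import statistics

# Illustrative data (an assumption for this sketch).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Effect of constants: shift every value by 10, scale every value by -3.
shifted = [x + 10 for x in data]
scaled = [-3 * x for x in data]

sd = statistics.pstdev(data)
sd_shifted = statistics.pstdev(shifted)    # unchanged by the shift
sd_scaled = statistics.pstdev(scaled)      # multiplied by |-3| = 3
var_scaled = statistics.pvariance(scaled)  # multiplied by (-3)^2 = 9

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Outlier sensitivity: append one extreme value and compare the change.
with_outlier = data + [90]
range_change = data_range(with_outlier) - data_range(data)
iqr_change = iqr(with_outlier) - iqr(data)

print("SD original / shifted / scaled:", sd, sd_shifted, sd_scaled)
print("Variance after scaling:", var_scaled)
print("Change in range:", range_change)
print("Change in IQR:", iqr_change)
```

The single extreme value stretches the Range by the full distance to the outlier, while the IQR moves only a little because it depends on the middle half of the data.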
Sum of Deviations: A common mistake is trying to average the raw deviations without squaring them or taking absolute values. This will always result in zero, regardless of how spread out the data is.
Denominator Confusion: In sample statistics, we often divide by $n - 1$ (Bessel's correction) instead of $n$ to provide an unbiased estimate of the population variance. Always check if the problem specifies a 'population' or a 'sample'.
Interpreting CV: Students often confuse 'higher CV' with 'better'. In reality, a lower Coefficient of Variation indicates greater consistency and less risk, while a higher CV indicates more volatility.
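The denominator choice and the reading of the CV can both be illustrated with Python's `statistics` module, where `pvariance` divides by the number of observations and `variance` applies Bessel's correction. The data and the two series below are hypothetical, and the helper `cv_percent` is a name introduced only for this sketch.

```python
import statistics

# Illustrative data (an assumption for this sketch).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance divides the sum of squared deviations by n.
population_var = statistics.pvariance(data)

# Sample variance divides by n - 1 (Bessel's correction), so it is larger.
sample_var = statistics.variance(data)

def cv_percent(values):
    """Coefficient of variation: population SD as a percentage of the mean."""
    return 100 * statistics.pstdev(values) / statistics.mean(values)

# Two hypothetical series with different means.
series_a = [48, 50, 52]
series_b = [90, 100, 110]

cv_a = cv_percent(series_a)
cv_b = cv_percent(series_b)

# The series with the lower CV is the more consistent one.
more_consistent = "A" if cv_a < cv_b else "B"

print("Population variance:", population_var)
print("Sample variance:", sample_var)
print("CV of A (%):", cv_a)
print("CV of B (%):", cv_b)
print("More consistent series:", more_consistent)
```

Here the sum of squared deviations is 32, so the population variance is 32 / 8 and the sample variance is 32 / 7. Series A has the smaller CV and is therefore the more consistent of the two.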