The Arithmetic Mean is the sum of all values divided by the number of observations, represented as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. It acts as the mathematical center of the data but is highly sensitive to extreme values or outliers that can skew the result.
The Median is the middle value when data is arranged in ascending order, effectively splitting the dataset into two equal halves; with an even number of observations, it is the average of the two middle values. It is a 'robust' measure because it is largely unaffected by extreme outliers, making it ideal for describing skewed distributions like household income.
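To make the contrast between the mean and the median concrete, here is a minimal sketch using Python's standard `statistics` module. The income figures are hypothetical values chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical household incomes in thousands of dollars (illustrative only).
incomes = [32, 35, 38, 41, 44]
incomes_with_outlier = incomes + [900]  # add one extreme value

mean_before = mean(incomes)      # 190 / 5 = 38
median_before = median(incomes)  # middle of five sorted values = 38

mean_after = mean(incomes_with_outlier)      # 1090 / 6, about 181.67
median_after = median(incomes_with_outlier)  # (38 + 41) / 2 = 39.5

print(mean_before, median_before)
print(mean_after, median_after)
```

A single extreme value pulls the mean from 38 to roughly 182, while the median moves only from 38 to 39.5.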
The Mode identifies the most frequently occurring value in a dataset. A distribution can be unimodal (one peak), bimodal (two peaks), or multimodal. The mode is the only measure of central tendency applicable to nominal or categorical data.
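As a small illustration, the standard-library functions `statistics.mode` and `statistics.multimode` (Python 3.8+) can find the mode of nominal data and detect a bimodal dataset. Both datasets below are made up for the example.

```python
from statistics import mode, multimode

# Nominal (categorical) data: a mean or median would be meaningless here.
colors = ["red", "blue", "red", "green", "blue", "red"]
most_common_color = mode(colors)  # "red" appears three times

# A bimodal dataset: 2 and 4 both appear twice.
scores = [1, 2, 2, 3, 4, 4, 5]
all_modes = multimode(scores)  # returns every value tied for most frequent

print(most_common_color)  # red
print(all_modes)          # [2, 4]
```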
Range is the simplest measure of variability, calculated as the difference between the maximum and minimum values ($R = x_{\max} - x_{\min}$). While easy to compute, it only considers the two most extreme values and ignores the distribution of the data in between.
Variance measures the average squared deviation of each data point from the mean, providing a sense of how 'spread out' the data is. The sample variance formula uses $n - 1$ in the denominator ($s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$) to provide an unbiased estimate of the population variance.
Standard Deviation is the square root of the variance, expressed in the same units as the original data. It is the most widely used measure of spread because it allows for direct comparison with the mean and is essential for understanding the Empirical Rule in normal distributions.
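The three measures of spread can be computed step by step, as in the sketch below. The dataset is an arbitrary example, and the results are cross-checked against `statistics.variance` and `statistics.stdev`, which both use the sample ($n - 1$) denominator.

```python
from math import sqrt
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values
n = len(data)

# Range: difference between the largest and smallest value.
data_range = max(data) - min(data)  # 9 - 2 = 7

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
x_bar = sum(data) / n  # 40 / 8 = 5.0
squared_deviations = [(x - x_bar) ** 2 for x in data]
sample_variance = sum(squared_deviations) / (n - 1)  # 32 / 7, about 4.571

# Standard deviation: square root of the variance, in the data's own units.
sample_sd = sqrt(sample_variance)  # about 2.138

# Cross-check the manual results against the standard library.
assert abs(sample_variance - variance(data)) < 1e-9
assert abs(sample_sd - stdev(data)) < 1e-9
```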
| Feature | Mean | Median | Mode |
|---|---|---|---|
| Sensitivity | High (affected by outliers) | Low (robust) | Low |
| Data Type | Interval/Ratio | Ordinal/Interval/Ratio | Nominal/Ordinal/Interval/Ratio |
| Mathematical Use | Advanced algebraic manipulation | Limited | None |
| Best For | Symmetrical data | Skewed data | Categorical data |
Standard Deviation vs. Variance: While variance is mathematically useful for further statistical tests, standard deviation is more intuitive for reporting because it shares the same units as the data (e.g., dollars vs. dollars squared).
Sample vs. Population: Always check if you are calculating for a sample (denominator $n - 1$) or a population (denominator $N$). Using the wrong denominator is a frequent source of error in descriptive calculations.
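The effect of the denominator is easy to see in code. In Python's `statistics` module, `variance` and `stdev` divide by $n - 1$ (sample), while `pvariance` and `pstdev` divide by $N$ (population). The data below is an arbitrary example.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [10, 12, 14, 16, 18]  # mean is 14; squared deviations sum to 40

sample_var = variance(data)       # 40 / (5 - 1) = 10
population_var = pvariance(data)  # 40 / 5 = 8

sample_sd = stdev(data)       # sqrt(10), about 3.162
population_sd = pstdev(data)  # sqrt(8), about 2.828

print(sample_var, population_var)
```

The same five numbers give a variance of 10 when treated as a sample and 8 when treated as a full population, so the choice should be stated whenever a result is reported.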
Check for Outliers First: Before choosing a measure of central tendency, scan the data for extreme values. If outliers exist, the median is almost always a better descriptor of the 'center' than the mean.
The Sum of Deviations Rule: Remember that the sum of the differences between each data point and the mean is always zero ($\sum_{i=1}^{n}(x_i - \bar{x}) = 0$). This is a quick way to verify if your mean calculation is correct.
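A quick sketch of this check in Python follows. The helper name `deviation_sum` and the data are hypothetical, and because floating-point arithmetic rarely yields exactly zero, the comparison uses a small tolerance.

```python
from math import isclose


def deviation_sum(values, claimed_mean):
    """Hypothetical helper: sum of (x - claimed_mean) over all values."""
    return sum(x - claimed_mean for x in values)


data = [3.2, 4.8, 5.1, 6.7, 9.9]  # arbitrary example values, total 29.7

correct_mean = sum(data) / len(data)  # about 5.94
wrong_mean = 6.0                      # a deliberately incorrect "mean"

residual_correct = deviation_sum(data, correct_mean)  # zero up to rounding
residual_wrong = deviation_sum(data, wrong_mean)      # about -0.3

print(isclose(residual_correct, 0.0, abs_tol=1e-9))  # True
print(isclose(residual_wrong, 0.0, abs_tol=1e-9))    # False
```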
Units Matter: In exams, ensure that your standard deviation and range include the correct units of measurement. Variance units are squared (e.g., dollars squared), which often makes them hard to interpret in a real-world context.
Sanity Check: If your calculated standard deviation is larger than the range of the dataset, you have made a calculation error. The standard deviation can never exceed the range; the population standard deviation is in fact at most half the range.
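This check can be automated with a few lines of Python. The function name `sd_is_plausible` is a hypothetical helper written for this example, not part of any library.

```python
from statistics import stdev


def sd_is_plausible(values, reported_sd):
    """Hypothetical helper: True if reported_sd lies between 0 and the range."""
    data_range = max(values) - min(values)
    return 0 <= reported_sd <= data_range


data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values, range = 7

computed_ok = sd_is_plausible(data, stdev(data))  # sample SD is about 2.138
typo_caught = not sd_is_plausible(data, 12.5)     # 12.5 exceeds the range of 7

print(computed_ok, typo_caught)  # True True
```

Passing this check does not prove the standard deviation is correct; it only catches results that are impossible for the given data.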