The Principle of Centrality: Measures of central tendency aim to summarize a large volume of data into a single representative value. This allows for easier comparison between different groups or datasets without analyzing every individual data point.
Sensitivity to Outliers: The mean is highly sensitive to extreme values (outliers) because every value in the set contributes to the final sum. In contrast, the median is 'robust' because it depends only on the middle value(s) of the ordered data, so changing the magnitude of an extreme value leaves it unchanged.
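To make the contrast concrete, here is a minimal sketch using Python's standard `statistics` module. The numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical values, chosen only for illustration.
values = [10, 12, 13, 15, 16]
with_outlier = values + [500]  # add one extreme value

# Without the outlier, the mean and median are close together.
print(mean(values), median(values))

# With the outlier, the mean jumps while the median barely moves.
print(mean(with_outlier), median(with_outlier))
```

Adding the single value 500 moves the mean from 13.2 to roughly 94.3, while the median only shifts from 13 to 14.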
Mathematical Foundation of the Mean: The mean is defined by the formula $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of all observations and $n$ is the sample size. This formula implies that the mean is the value each observation would take if the total were shared equally among all $n$ observations.
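As a worked illustration of the mean as "sum of observations divided by sample size", the sketch below uses a small hypothetical dataset. The variable names (`observations`, `total`, `n`, `x_bar`) are illustrative only.

```python
# Hypothetical observations, chosen only for illustration.
observations = [4, 8, 6, 10, 2]

total = sum(observations)  # sum of all observations: 30
n = len(observations)      # sample size: 5
x_bar = total / n          # mean: 6.0

# Equal-share interpretation: giving every observation the value x_bar
# reproduces the original total.
assert x_bar * n == total

# Equivalent view: the deviations from the mean cancel out.
deviations = [x - x_bar for x in observations]
assert sum(deviations) == 0
```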
| Feature | Arithmetic Mean | Median | Mode |
|---|---|---|---|
| Data Type | Interval/Ratio | Ordinal/Interval/Ratio | Nominal/Ordinal/Interval/Ratio |
| Outlier Impact | High (Sensitive) | Low (Robust) | Low (Robust) |
| Uniqueness | Always unique | Always unique | Can be multiple or none |
| Best Use Case | Symmetric data | Skewed data (e.g., income) | Categorical data (e.g., colors) |
Check for Skewness: If the mean is noticeably higher than the median, the data is likely positively skewed (skewed to the right). If the mean is noticeably lower than the median, the data is likely negatively skewed (skewed to the left).
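The comparison can be sketched as a rough heuristic. The helper name `skew_direction` and the threshold `rel_tol` are assumptions made for this example; the notes do not define how large a gap counts as significant, so the 5% of the range used here is arbitrary.

```python
from statistics import mean, median

def skew_direction(data, rel_tol=0.05):
    """Rough heuristic: compare the mean with the median.

    rel_tol is an assumed threshold (gap as a fraction of the range);
    it is not a standard statistical cut-off.
    """
    spread = max(data) - min(data)
    if spread == 0:
        return "roughly symmetric"  # all values identical
    gap = (mean(data) - median(data)) / spread
    if gap > rel_tol:
        return "positive (right) skew"
    if gap < -rel_tol:
        return "negative (left) skew"
    return "roughly symmetric"

# Hypothetical datasets for illustration.
print(skew_direction([1, 2, 2, 3, 3, 3, 40]))       # long tail of high values
print(skew_direction([1, 38, 38, 39, 39, 40, 40]))  # long tail of low values
print(skew_direction([1, 2, 3, 4, 5]))              # mean equals median
```

This is only a quick indicator. A histogram or a formal skewness statistic gives a more reliable picture.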
The 'Even N' Trap: In exams, always check if the number of data points is even when finding the median. Students often forget to average the two middle numbers and simply pick one, leading to an incorrect result.
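The even and odd cases can be sketched as a small hand-written function. The name `median_of` is hypothetical; in practice Python's built-in `statistics.median` applies the same rule.

```python
def median_of(values):
    """Return the median, averaging the two middle values when n is even."""
    ordered = sorted(values)  # the median requires ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    # even n: average the two middle values instead of picking one
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data for illustration.
print(median_of([7, 3, 9, 5]))  # even n: sorted is [3, 5, 7, 9], median is 6.0
print(median_of([7, 3, 9]))     # odd n: sorted is [3, 7, 9], median is 7
```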
Verification: For the mean, perform a 'sanity check' by ensuring the calculated value falls between the minimum and maximum values of the dataset. If the mean is outside this range, a calculation error has occurred.
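The range check is simple enough to express in one line. The helper name `mean_is_plausible` and the data are hypothetical. Passing the check does not prove the mean is correct; it only catches answers that cannot possibly be right.

```python
def mean_is_plausible(data, claimed_mean):
    """Return True if the claimed mean lies between the minimum and maximum."""
    return min(data) <= claimed_mean <= max(data)

# Hypothetical data for illustration.
data = [12, 15, 11, 18]

correct = sum(data) / len(data)  # 14.0
forgot_to_divide = sum(data)     # 56, a typical slip

print(mean_is_plausible(data, correct))           # True
print(mean_is_plausible(data, forgot_to_divide))  # False
```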
Confusing Mean and Median in Skewed Data: A common mistake is using the mean to describe 'typical' values in a skewed dataset, such as housing prices. Because a few multi-million dollar homes can inflate the mean, the median provides a much more accurate representation of what a typical buyer might pay.
Zero vs. No Mode: If every value in a dataset appears exactly once, the dataset has no mode. It is incorrect to say the mode is '0' unless the number zero actually appears most frequently in the data.
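The distinction between "no mode" and "a mode of 0" can be sketched with `collections.Counter`. The helper name `find_modes` is hypothetical, and it follows the convention stated above: if every value appears exactly once, it returns an empty list.

```python
from collections import Counter

def find_modes(data):
    """Return a list of modes; an empty list means the dataset has no mode."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value appears exactly once: no mode
    return [value for value, count in counts.items() if count == top]

# Hypothetical datasets for illustration.
print(find_modes([3, 1, 4, 5]))     # [] (no mode, which is not the same as 0)
print(find_modes([0, 0, 2, 5]))     # [0] (zero really is the most frequent value)
print(find_modes([1, 1, 2, 2, 3]))  # [1, 2] (two modes)
```

Note that the standard library's `statistics.multimode` behaves differently in the first case: it returns every value when all of them appear once, so the explicit `top == 1` check is needed to follow the convention used here.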
Ignoring the Frequency in Grouped Data: When data is presented in a frequency table, the mode is the data value with the highest frequency, not the highest frequency itself and not the largest value in the data column.
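A short sketch makes the three quantities easy to tell apart. The frequency table below is hypothetical, stored as a dictionary mapping each data value to its frequency.

```python
# Hypothetical frequency table: data value -> frequency.
freq_table = {1: 4, 2: 9, 3: 6, 4: 2}

# The mode is the data value whose frequency is largest.
mode_value = max(freq_table, key=freq_table.get)

# Two quantities that are often reported by mistake:
highest_frequency = freq_table[mode_value]  # how often the mode occurs
largest_value = max(freq_table)             # the biggest data value

print(mode_value)         # 2
print(highest_frequency)  # 9
print(largest_value)      # 4
```

If two values tie for the highest frequency, `max` returns only the first one it encounters, so a tie needs separate handling.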