Arithmetic Mean: The most common measure of 'average,' calculated by summing all values and dividing by the total number of observations ($n$). It acts as the mathematical center of gravity for the data, but it is highly sensitive to extreme values known as outliers.
Median: The middle value in a dataset when the observations are arranged in ascending or descending order. If the dataset has an even number of observations, the median is the average of the two middle values. It is a robust measure: extreme values shift it very little, if at all.
Mode: The value that appears most frequently in a dataset. A dataset can be unimodal (one mode), bimodal (two modes), or multimodal (more than two modes). The mode is the only measure of central tendency applicable to nominal (unordered categorical) data.
Mean Formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $x_i$ represents each individual value and $n$ is the total count.
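
The three measures of central tendency above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores`, `even_count`, and `colors` lists are made-up illustration data, not values from these notes.

```python
import statistics

# Hypothetical exam scores, chosen only for illustration.
scores = [70, 80, 80, 90, 100]

mean = statistics.mean(scores)        # sum of the values divided by the count
median = statistics.median(scores)    # middle value of the sorted data
modes = statistics.multimode(scores)  # list of every most-frequent value

# With an even number of observations, the median is the
# average of the two middle values (here 2 and 3).
even_count = [1, 2, 3, 4]
even_median = statistics.median(even_count)

# The mode also works on categorical labels, where a mean is undefined.
colors = ["red", "blue", "red", "green"]
color_mode = statistics.mode(colors)

print(mean, median, modes, even_median, color_mode)
```

`statistics.multimode` returns a list, so a bimodal or multimodal dataset yields more than one entry.
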
Range: The simplest measure of spread, calculated as the difference between the maximum and minimum values in a dataset. While easy to compute, it only considers the two most extreme points and ignores the distribution of the values in between.
Variance: A measure of how far the values in a dataset lie from the mean. It is calculated by averaging the squared differences from the mean, which ensures that negative deviations do not cancel out positive ones.
Standard Deviation: The square root of the variance, providing a measure of spread in the same units as the original data. A low standard deviation indicates that data points tend to be close to the mean, while a high standard deviation indicates that the data is spread over a wider range.
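Sample Variance Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean. Note the use of $n - 1$ (Bessel's correction) in place of $n$ to provide an unbiased estimate of the population variance from sample data.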
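
The measures of spread above can be worked through by hand in a few lines. The sketch below uses a made-up dataset (`data` is an assumption for illustration) and computes the variance both ways, dividing by the count for a population and by the count minus one for a sample, then cross-checks against the standard `statistics` module.

```python
import math
import statistics

# Hypothetical observations, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

n = len(data)
mean = sum(data) / n

# Squared deviations from the mean; squaring keeps negative
# deviations from cancelling positive ones.
squared_devs = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_devs) / n    # divide by n
sample_variance = sum(squared_devs) / (n - 1)  # Bessel's correction

# Standard deviation is the square root of the variance,
# which returns the result to the original units.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# The library functions should agree with the manual results.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(data_range, population_variance, sample_variance, population_sd, sample_sd)
```

For the same data, the sample variance is always slightly larger than the population variance, because the sum of squared deviations is divided by a smaller number.
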
Understanding when to use specific measures is critical for accurate data representation. The following table compares the Mean and Median across different data scenarios:
| Feature | Arithmetic Mean | Median |
|---|---|---|
| Sensitivity | High (Affected by outliers) | Low (Robust to outliers) |
| Data Type | Interval or Ratio scale | Ordinal, Interval, or Ratio |
| Mathematical Use | Preferred for further calculations | Preferred for skewed data |
| Representation | Represents the 'average' value | Represents the 'middle' value |
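
The sensitivity row of the table can be seen directly by adding a single extreme value to a small dataset. This is an illustrative sketch; the salary figures (in thousands of dollars) are invented for the example.

```python
import statistics

# Hypothetical salaries in thousands of dollars.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [500]  # one extreme value added

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

In this example the single outlier more than triples the mean, while the median moves by only 1.5.
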
Check for Outliers: Before choosing a measure of central tendency, always look for extreme values. If outliers exist, the median is usually a more honest representation of the 'typical' value than the mean.
Verify Units: Remember that variance is expressed in squared units (e.g., squared dollars), while standard deviation is in the original units (e.g., dollars). Exams often test whether you can distinguish between the two.
The $n - 1$ Rule: Always identify whether the problem provides a 'sample' or a 'population.' Using $n$ instead of $n - 1$ for a sample is one of the most common ways students lose marks in descriptive statistics.
Sanity Check: A calculated mean or median must always fall between the minimum and maximum values of the dataset. If your result is outside this range, a calculation error has occurred.
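
Two of the checks above lend themselves to small helper functions. The sketch below is one possible way to express them; `is_plausible_center` and `variance_for` are hypothetical helper names introduced for illustration, and `data` is made-up.

```python
import statistics

def is_plausible_center(values, candidate):
    """Return True if a hand-computed mean or median lies within the data's range."""
    return min(values) <= candidate <= max(values)

def variance_for(values, kind):
    """Pick the variance formula based on whether the data is a sample or a population."""
    if kind == "sample":
        return statistics.variance(values)   # divides by n - 1
    if kind == "population":
        return statistics.pvariance(values)  # divides by n
    raise ValueError("kind must be 'sample' or 'population'")

# Hypothetical data for illustration.
data = [10, 12, 14, 16, 18]

print(is_plausible_center(data, 14))   # a mean of 14 is inside [10, 18]
print(is_plausible_center(data, 140))  # a misplaced decimal point is caught
print(variance_for(data, "sample"), variance_for(data, "population"))
```

Making the sample-or-population choice an explicit argument forces the decision to be stated rather than left to a default.
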