Formula:
$$\bar{x} = \frac{\sum x}{n}$$
Here, $\bar{x}$ is the mean, $\sum x$ is the total of all data values, and $n$ is the number of values.
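A minimal Python sketch of the mean calculation, using made-up values (the function name `arithmetic_mean` and the data are illustrative assumptions):

```python
def arithmetic_mean(values):
    """Total of all data values divided by the number of values."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative data, not taken from any real study.
scores = [4, 8, 6, 5, 7]
print(arithmetic_mean(scores))  # total 30 over 5 values gives 6.0
```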
The median is the middle value when the data is ordered, so it depends on position rather than magnitude. This makes it more resistant to extreme values, which is why it is often preferred when the distribution contains outliers or is skewed.
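To see this resistance in practice, the sketch below adds one extreme value to a small, hypothetical data set and measures how far each average moves. It uses Python's standard `statistics` module; the salary figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; the second list adds one extreme value.
salaries = [21, 22, 23, 24, 25]
salaries_with_outlier = salaries + [200]

mean_shift = mean(salaries_with_outlier) - mean(salaries)        # 52.5 - 23
median_shift = median(salaries_with_outlier) - median(salaries)  # 23.5 - 23

print(f"mean moved by {mean_shift}, median moved by {median_shift}")
```

The single extreme value moves the mean by 29.5 but the median by only 0.5.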
The interquartile range measures the spread of the middle half of the data. Because it ignores the most extreme quarter at each end, it is less affected by outliers than the full range.
Formula:
$$\text{IQR} = Q_3 - Q_1$$
Here, $Q_1$ is the lower quartile and $Q_3$ is the upper quartile.
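A short sketch of the interquartile range in Python follows. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice in `statistics.quantiles` is an assumption here, and other conventions can give slightly different quartiles for the same data. The function name and data are illustrative.

```python
from statistics import quantiles


def interquartile_range(values):
    """Upper quartile minus lower quartile."""
    # method="inclusive" is one of several quartile conventions; others can
    # give slightly different quartiles for the same data.
    lower_q, _median, upper_q = quantiles(values, n=4, method="inclusive")
    return upper_q - lower_q


data = [2, 4, 5, 7, 8, 10, 12, 13, 40]  # illustrative values
print(interquartile_range(data))  # quartiles are 5 and 12, so the IQR is 7.0
```

The extreme value 40 does not enter the result, because it lies outside the middle half of the data.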
Standard deviation and variance measure how far values tend to lie from the mean. They are especially useful when the mean is a sensible center, because both statistics describe spread around that mean rather than around the median.
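Relationship: $\text{variance} = (\text{standard deviation})^2$ and $\text{standard deviation} = \sqrt{\text{variance}}$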
A smaller value indicates greater consistency, while a larger value indicates more variation.
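The link between the two statistics can be checked numerically. This sketch assumes the population versions (`pvariance` and `pstdev` from Python's `statistics` module), since the text does not say whether population or sample formulas are intended; the data values are invented.

```python
import math
from statistics import pstdev, pvariance

# Illustrative data. The population versions are used here as an assumption;
# statistics.stdev and statistics.variance give the sample versions instead.
lengths = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(lengths)  # mean squared distance from the mean
std_dev = pstdev(lengths)      # square root of the variance

print(variance, std_dev)
print(math.isclose(std_dev, math.sqrt(variance)))
```

For this data the mean is 5, the variance is 4, and the standard deviation is 2, so the standard deviation is in the same units as the data while the variance is in squared units.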
The general interpretation principle is that a lower spread means observations are more consistent, while a higher spread means they are more dispersed. However, spread must be judged relative to the purpose of the data, because high variation may be acceptable in some contexts and problematic in others.
| Situation | Preferred location | Preferred spread |
|---|---|---|
| Roughly symmetrical data | Mean | Standard deviation or variance |
| Data with outliers | Median | Interquartile range |
| Aspect | Location | Spread |
|---|---|---|
| Purpose | Describes center | Describes variability |
| Examples | Mean, median | Range, IQR, SD, variance |
| Key question | "What is typical?" | "How consistent is it?" |
| Feature | Mean | Median |
|---|---|---|
| Uses all values | Yes | No |
| Sensitive to outliers | High | Low |
| Best for | Roughly symmetrical data | Data with outliers |
| Measure | What it uses | Strength | Weakness |
|---|---|---|---|
| Range | Minimum and maximum | Simple and quick | Highly sensitive to extremes |
| IQR | Middle half of data | Resistant to outliers | Ignores outer half |
| SD | All values around mean | Detailed spread measure | Not robust to outliers |
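The strengths and weaknesses in this table can be illustrated by replacing one value with an extreme one and recomputing all three measures. This is a sketch under stated assumptions: the helper `spread_summary` is hypothetical, the data is invented, quartiles use the "inclusive" convention of `statistics.quantiles`, and the population standard deviation is used.

```python
from statistics import pstdev, quantiles


def spread_summary(values):
    """Return (range, IQR, population SD) for a list of numbers."""
    lower_q, _median, upper_q = quantiles(values, n=4, method="inclusive")
    return (max(values) - min(values), upper_q - lower_q, pstdev(values))


typical = [2, 4, 5, 7, 8, 10, 12, 13, 14]       # illustrative values
with_extreme = [2, 4, 5, 7, 8, 10, 12, 13, 40]  # largest value replaced by 40

for label, values in (("typical", typical), ("with extreme", with_extreme)):
    value_range, iqr, sd = spread_summary(values)
    print(f"{label}: range={value_range}, IQR={iqr}, SD={sd:.2f}")
```

The range jumps from 12 to 38 and the standard deviation increases, while the IQR stays at 7 in both cases.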
Exam habit to memorize: choose suitable measures, compare center and spread, and state the conclusion in context.
A common mistake is assuming the mean is always the best average. This is false because extreme values can pull the mean toward the tail of the distribution, making it less representative of the typical observation.
Another mistake is describing one data set as "better" using only a central measure. A larger mean or median does not automatically imply superiority, because the spread may be so large that the results are unreliable or inconsistent.
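As an illustration, the two hypothetical groups below differ only slightly in mean but greatly in spread. The scores are invented, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev

# Hypothetical test scores for two groups.
group_a = [48, 50, 52, 49, 51]
group_b = [20, 85, 50, 35, 70]

for name, scores in (("A", group_a), ("B", group_b)):
    print(f"group {name}: mean={mean(scores)}, SD={pstdev(scores):.1f}")
```

Group B has the larger mean, but its much larger standard deviation shows that its scores are far less consistent, so calling it "better" on the mean alone would be misleading.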
Students often think the median and quartiles must change whenever a value is added or removed. In reality, these measures depend on ordered position, so they sometimes stay the same and sometimes shift; you must check the arrangement rather than guess.
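A quick check with invented values shows both outcomes: one added value leaves the median where it was, another shifts it.

```python
from statistics import median

base = [3, 5, 7, 9, 11]  # illustrative values; the median is 7

# Adding a value equal to the current median leaves the median unchanged,
# because the middle pair becomes 7 and 7.
print(median(base + [7]))

# Adding a large value moves the middle pair to 7 and 9.
print(median(base + [20]))
```

In the second case the median becomes 8, the midpoint of 7 and 9, even though the added value itself is 20.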
It is also easy to misuse spread measures by pairing them badly. For example, quoting the mean with the IQR is usually less coherent than quoting the mean with the standard deviation, because the center is then based on every value while the spread is based only on ordered positions.
Interpreting data connects directly to box plots, because box plots visually display the median, quartiles, and overall spread. This makes them especially useful for comparing distributions quickly when robust statistics such as median and IQR are appropriate.
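The values a box plot displays can be computed directly as a five-number summary. The sketch below assumes the "inclusive" quartile convention of Python's `statistics.quantiles`; plotting tools may use a different convention, and the function name and data are illustrative.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    lower_q, mid, upper_q = quantiles(values, n=4, method="inclusive")
    return (min(values), lower_q, mid, upper_q, max(values))


data = [2, 4, 5, 7, 8, 10, 12, 13, 40]  # illustrative values
print(five_number_summary(data))
```

Comparing these five values for two data sets gives the same information as comparing their box plots side by side.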
The topic also connects to data cleaning and data quality. Removing an error or adding an omitted value can change statistical summaries, so interpretation must account for whether the data set is complete and trustworthy before drawing conclusions.
In wider statistics, interpreting data is part of statistical reasoning rather than pure calculation. The goal is to make sensible decisions from summaries, assess which statistics are reliable, and avoid conclusions that ignore variation, outliers, or context.