What is the primary difference between the direction of the 'peak' and the direction of the 'skew'?

The peak indicates where the majority of data is concentrated (the mode), while the skew refers to the direction of the long 'tail' where data is sparse. A distribution is named after the direction of its tail, not its peak.

How does the position of the median within a box plot differ between positive and negative skew?

In a positive skew, the median sits closer to the lower quartile ($Q_1$), leaving a larger gap between the median and $Q_3$. In a negative skew, the median sits closer to the upper quartile ($Q_3$), leaving a larger gap between the median and $Q_1$.

When comparing Mean, Median, and Mode, which measure is most affected by skewness and why?

The Mean is most affected because its calculation includes the value of every data point. Extreme values in the tail pull the Mean toward them, whereas the Mode stays at the peak and the Median shifts only slightly, because it depends on the rank order of the data rather than on the size of the extreme values.
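As a small illustration of this effect, the sketch below uses Python's standard `statistics` module on a made-up dataset (the values are hypothetical, chosen only to give a long right tail). It is an example under those assumptions, not a prescribed method.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: most values are small, one is large.
data = [2, 3, 3, 3, 4, 5, 6, 30]

data_mode = mode(data)      # most frequent value, stays at the peak
data_median = median(data)  # depends only on rank order
data_mean = mean(data)      # uses every value, so 30 pulls it upward

print("Mode:", data_mode)
print("Median:", data_median)
print("Mean:", data_mean)
```

For this dataset the Mode is 3, the Median is 3.5, and the Mean is 7, so the Mean sits furthest toward the tail.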

What is a common mistake when identifying skewness from a frequency polygon?

Students often look at the 'bulk' of the data on one side and name the skew after that side. However, the skew is named after the 'tail' (the side with fewer, more extreme values).

Why is it incorrect to assume a distribution is symmetrical just because the Mean and Median are similar?

While similar Mean and Median values often suggest symmetry, a distribution could be bimodal or have balanced outliers on both ends that cancel each other out, masking a non-symmetrical shape.
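A quick way to see this is with a hypothetical dataset whose Mean and Median are equal even though the values are not mirror images around the center. The data and the mirror check below are illustrative assumptions, not part of the syllabus method.

```python
from statistics import mean, median

# Hypothetical data: extreme values on both ends balance out in the Mean.
data = [1, 2, 5, 5, 5, 5, 12]

centre = median(data)

# Reflect every value about the centre; a symmetrical dataset
# would reproduce itself exactly after sorting.
mirrored = sorted(2 * centre - x for x in data)
is_symmetric = mirrored == sorted(data)

print("Mean:", mean(data))
print("Median:", centre)
print("Symmetrical:", is_symmetric)
```

Here the Mean and Median are both 5, yet the mirror check fails, so equal averages alone do not establish symmetry.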

What error occurs if you use the Mean to describe the 'typical' value of a heavily skewed dataset?

The Mean will be 'biased' by extreme outliers in the tail, making it unrepresentative of the majority of the data. In such cases, the Median is a better measure of the 'typical' value.

Define 'Positive Skew' in terms of its tail and data distribution.

Positive skew (or right-skew) occurs when the distribution has a long tail extending toward higher values on the right. This indicates a concentration of data at the lower end with a few unusually high values.

Define 'Negative Skew' in terms of its tail and data distribution.

Negative skew (or left-skew) occurs when the distribution has a long tail extending toward lower values on the left. This indicates a concentration of data at the higher end with a few unusually low values.

What does it mean for a distribution to have 'Zero Skew'?

Zero skew means the distribution is symmetrical: the data is distributed evenly on both sides of the center. For a symmetrical distribution with a single peak, the Mean, Median, and Mode are all located at the same point.

How do you mathematically verify positive skew using quartiles?

You verify it by checking whether the distance from the median to the upper quartile is greater than the distance from the median to the lower quartile: $(Q_3 - Q_2) > (Q_2 - Q_1)$.
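The quartile comparison can be sketched in Python as follows. The function name `quartile_skew` and the sample data are hypothetical, and the `"inclusive"` quartile method is an assumption: other quartile conventions can give slightly different values, so use whichever rule your course specifies.

```python
from statistics import quantiles

def quartile_skew(data):
    """Classify skew by comparing the gaps on either side of the median."""
    # Assumption: the "inclusive" method is used to locate the quartiles.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    upper_gap = q3 - q2  # distance from median to upper quartile
    lower_gap = q2 - q1  # distance from median to lower quartile
    if upper_gap > lower_gap:
        return "positive"
    if upper_gap < lower_gap:
        return "negative"
    return "no skew indicated"

sample = [1, 2, 2, 3, 3, 4, 5, 7, 12]  # hypothetical data
print(quartile_skew(sample))
```

For the sample above, the quartiles are 2, 3, and 5, so $(Q_3 - Q_2) = 2$ exceeds $(Q_2 - Q_1) = 1$ and the function reports positive skew.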

Skewness

Summary

Skewness is a statistical measure that describes the degree of asymmetry in a data distribution. While measures of central tendency locate the middle of the data, skewness identifies whether the data is 'leaning' or stretched toward one side, creating a characteristic 'tail' that significantly influences the relationship between the mean, median, and mode.

1. Definition & Core Concepts

Skewness refers to the lack of symmetry in a probability distribution or frequency distribution of a dataset.
A symmetrical distribution is one where the left and right sides are mirror images of each other, typically resulting in the mean, median, and mode being equal.
Asymmetry occurs when data points are not evenly distributed around the center, causing the distribution to stretch further in one direction than the other.
The direction of the skew is defined by the direction of the tail (the long, thin part of the curve), not the direction of the peak.

Comparison of positive and negative skew curves showing the direction of the tails.

2. Underlying Principles

The Mean is highly sensitive to extreme values (outliers) and is pulled toward the tail of the distribution.
The Mode represents the highest frequency and remains at the peak of the distribution, regardless of the tail's length.
The Median is a positional measure and typically falls between the mode and the mean in a skewed distribution, as it is less affected by extreme values than the mean but more so than the mode.
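
Taken together, these three points suggest a typical ordering of the averages. It is a common pattern rather than a strict rule:

$$\text{Positive skew: } \text{Mode} < \text{Median} < \text{Mean}$$

$$\text{Negative skew: } \text{Mean} < \text{Median} < \text{Mode}$$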

3. Methods & Techniques

4. Key Distinctions

5. Exam Strategy & Tips

6. Common Pitfalls & Misconceptions

Peak vs. Tail: A common error is labeling a distribution based on where the peak is located. If the peak is on the left, the tail is on the right, making it a positive skew.
Outlier Influence: Students often forget that a single extreme outlier can create skewness in an otherwise symmetrical dataset, significantly shifting the mean while leaving the median almost unchanged.
Symmetry Assumption: Do not assume a distribution is symmetrical just because it looks 'bell-shaped' at first glance; compare the mean and median, and check the quartile spacing where possible, since similar mean and median values alone do not guarantee symmetry.
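
The Outlier Influence point can be illustrated with a minimal sketch. Both datasets are hypothetical: the second is the first with its largest value replaced by an extreme one (for example, a recording error).

```python
from statistics import mean, median

# Hypothetical symmetrical data and the same data with one extreme value.
symmetric = [4, 5, 6, 7, 8, 9, 10]
with_outlier = [4, 5, 6, 7, 8, 9, 100]

mean_before, median_before = mean(symmetric), median(symmetric)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("Before:", mean_before, median_before)
print("After: ", round(mean_after, 2), median_after)
```

In this example the median stays at 7 while the mean rises from 7 to roughly 19.86, which produces a positive skew from a single value.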