What is the primary difference between a bar chart and a histogram?

A bar chart is used for discrete or qualitative data: the bars are separated by gaps and their height represents frequency. A histogram is used for continuous data: the bars have no gaps between them and their area represents frequency.

When comparing two distributions using box plots, which two features should you always comment on?

You should comment on the average (using the median) and the spread or consistency (using the interquartile range or total range). This provides a complete picture of how the datasets differ in location and variability.

How does a back-to-back stem-and-leaf diagram facilitate comparison?

It shares a single central stem for two sets of data (e.g., Group A and Group B), allowing for an immediate visual comparison of the shape, spread, and modal classes of both groups simultaneously.
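As an illustration, here is a minimal Python sketch of how such a diagram could be laid out. The function name `back_to_back` and the sample values are hypothetical, and the sketch assumes whole numbers below 100 so that the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict


def back_to_back(group_a, group_b):
    """Return the rows of a back-to-back stem-and-leaf diagram.

    Assumes both groups are non-empty and hold whole numbers from 0 to 99:
    the stem is the tens digit and the leaf is the units digit.
    """
    leaves_a = defaultdict(list)
    leaves_b = defaultdict(list)
    for value in group_a:
        leaves_a[value // 10].append(value % 10)
    for value in group_b:
        leaves_b[value // 10].append(value % 10)

    lowest = min(min(leaves_a), min(leaves_b))
    highest = max(max(leaves_a), max(leaves_b))

    lefts, rights = [], []
    for stem in range(lowest, highest + 1):
        # Left-hand leaves are listed in descending order so that they
        # increase when read outwards from the central stem.
        lefts.append(" ".join(str(d) for d in sorted(leaves_a[stem], reverse=True)))
        rights.append(" ".join(str(d) for d in sorted(leaves_b[stem])))

    width = max(len(left) for left in lefts)
    return [
        f"{left.rjust(width)} | {stem} | {right}"
        for stem, left, right in zip(range(lowest, highest + 1), lefts, rights)
    ]


# Hypothetical sample data for Group A (left) and Group B (right).
rows = back_to_back([12, 15, 21, 23, 23, 30], [14, 22, 25, 31, 38])
print("\n".join(rows))
print("Key: 2 | 1 | 4 means 12 in Group A and 14 in Group B")
```

Right-aligning the left-hand leaves keeps the stem column straight, so the length of each row gives a quick visual impression of the shape of each group.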

Why is it a mistake to plot cumulative frequency at the midpoint of a class interval?

Cumulative frequency represents the total number of observations 'less than or equal to' a specific value. Therefore, it must be plotted at the upper boundary of the class to correctly show that all data in that interval has been accounted for.
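The plotting rule can be sketched in Python as follows. The function name `cumulative_frequency_points` and the grouped data are made up for illustration; the sketch assumes the class boundaries are given in order, with one more boundary than there are classes.

```python
from itertools import accumulate


def cumulative_frequency_points(boundaries, frequencies):
    """Pair each class's upper boundary with the running total of frequencies.

    boundaries lists the class boundaries in increasing order, so it has
    one more entry than frequencies. The starting point (lowest boundary, 0)
    is included so that the graph begins on the horizontal axis.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequency")
    running_totals = list(accumulate(frequencies))
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], running_totals))


# Hypothetical grouped data: classes 0-10, 10-20, 20-40 and 40-50.
points = cumulative_frequency_points([0, 10, 20, 40, 50], [4, 10, 18, 8])
print(points)  # [(0, 0), (10, 4), (20, 14), (40, 32), (50, 40)]
```

Each running total is paired with the upper boundary of its class, never with the midpoint, and the final cumulative frequency equals the total frequency.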

What happens to a histogram if you use frequency instead of frequency density for unequal class widths?

The diagram becomes misleading because the eye judges each bar by its area. If frequency is used as the height, the area of a wide class is inflated in proportion to its width, so the class appears to contain more data than it really does. Using frequency density ensures the area correctly represents the frequency.

What is a common error when reading scales on a cumulative frequency graph?

Students often fail to work out the value of a single small square on each axis. This leads to inaccurate estimates for the median and quartiles, especially when the required positions (such as $\frac{n}{2}$ for the median, where $n$ is the total frequency) do not fall on a grid line.
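One way to sanity-check values read from the graph is to estimate them numerically. The sketch below is an assumption-laden illustration rather than a prescribed method: the function name `read_value_at` and the plotted points are hypothetical, and it uses straight-line interpolation between plotted points, which matches a cumulative frequency polygon and only approximates a hand-drawn smooth curve.

```python
def read_value_at(points, target_cf):
    """Estimate the data value at a given cumulative frequency.

    points is a list of (upper boundary, cumulative frequency) pairs in
    increasing order. Linear interpolation is used between neighbouring
    points (an assumption: a smooth curve would give slightly different
    readings).
    """
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf1 > cf0 and cf0 <= target_cf <= cf1:
            return x0 + (target_cf - cf0) * (x1 - x0) / (cf1 - cf0)
    raise ValueError("target is outside the plotted range")


# Hypothetical plotted points: (upper boundary, cumulative frequency).
points = [(0, 0), (10, 4), (20, 14), (40, 32), (50, 40)]
n = points[-1][1]  # total frequency

lower_quartile = read_value_at(points, n / 4)
median = read_value_at(points, n / 2)
upper_quartile = read_value_at(points, 3 * n / 4)
print(lower_quartile, round(median, 1), round(upper_quartile, 1))
```

If a reading taken from the graph differs noticeably from an estimate like this, the most likely cause is a misjudged axis scale.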

Define 'Frequency Density' and provide its formula.

Frequency density is the frequency per unit of the class interval. It is calculated using the formula: $FD = \frac{\text{frequency}}{\text{class width}}$.
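The formula can be applied to a whole frequency table with a short Python sketch. The function name `frequency_densities` and the grouped data are hypothetical; class width is taken as upper boundary minus lower boundary.

```python
def frequency_densities(boundaries, frequencies):
    """Return frequency / class width for each class.

    boundaries lists the class boundaries in increasing order, so it has
    one more entry than frequencies.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequency")
    densities = []
    for lower, upper, frequency in zip(boundaries, boundaries[1:], frequencies):
        class_width = upper - lower
        densities.append(frequency / class_width)
    return densities


# Hypothetical grouped data: classes 0-10, 10-20, 20-40 and 40-50.
densities = frequency_densities([0, 10, 20, 40, 50], [4, 10, 18, 8])
print(densities)  # [0.4, 1.0, 0.9, 0.8]
```

In this made-up table the 20-40 class has the largest frequency (18) but not the tallest bar, because its width of 20 spreads those observations over a wider interval.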

What is the 'Interquartile Range' (IQR) and what does it represent?

The IQR is the difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$). It represents the spread of the middle $50\%$ of the data and is a measure of dispersion that is less affected by outliers than the total range.
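A small Python sketch shows the calculation and why the IQR resists outliers. The function name `quartiles` and the data are hypothetical, and the sketch assumes the "median of each half" convention for ungrouped data; other conventions exist and can give slightly different quartiles.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the 'median of each half' convention.

    When the number of values is odd, the middle value is left out of
    both halves. Requires at least two values.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(ordered), median(upper_half)


# Hypothetical data with one unusually large value (40).
data = [3, 5, 7, 8, 9, 11, 13, 15, 40]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
total_range = max(data) - min(data)
print(q1, q2, q3, iqr, total_range)  # 6.0 9 14.0 8.0 37
```

The single large value stretches the total range to 37, while the IQR of 8 still describes the spread of the middle half of the data.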

What is the purpose of a 'Key' in a stem-and-leaf diagram?

The key explains how to interpret the digits in the diagram. It defines the place value of the stem and the leaf, ensuring the reader knows if $1 | 2$ means $1.2$, $12$, or $120$.

How do you identify an outlier on a box plot?

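Outliers are individual data points that lie far outside the main body of the data. A common convention treats a value as an outlier if it is more than $1.5 \times \text{IQR}$ below $Q_1$ or above $Q_3$. On a box plot, outliers are plotted as individual crosses or dots beyond the ends of the whiskers.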
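A minimal sketch of one widely used convention, the $1.5 \times \text{IQR}$ rule, is given below. The function names, the multiplier default `k=1.5` and the data are assumptions for illustration; a particular exam question may define outliers differently.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences k * IQR beyond the quartiles.

    k = 1.5 is a common convention, not a universal rule.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """Return the values lying strictly outside the fences."""
    lower_fence, upper_fence = outlier_fences(q1, q3, k)
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical data with quartiles Q1 = 6 and Q3 = 14 supplied directly.
data = [3, 5, 7, 8, 9, 11, 13, 15, 40]
fences = outlier_fences(6, 14)
outliers = find_outliers(data, 6, 14)
print(fences, outliers)  # (-6.0, 26.0) [40]
```

Under this convention the whisker would end at 15, the largest value inside the fences, and 40 would be marked separately.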

Data Presentation | Cambridge International Examinations AS-Level Maths

Revision Notes: Probability And Statistics 1, Data Presentation & Interpretation

Summary

Data presentation involves the systematic arrangement and visual representation of statistical information to reveal patterns, trends, and distributions. By selecting appropriate graphical tools like histograms, box plots, and cumulative frequency graphs, researchers can effectively communicate complex datasets and facilitate the calculation of key statistical measures such as the median and interquartile range.

1. Definition & Core Concepts

Data Presentation is the process of organizing raw data into visual formats that make the information easier to interpret and analyze. It transforms numerical lists into shapes and trends that highlight the distribution's center, spread, and skewness.
Raw Data refers to the original, unordered observations collected from a source, which often require grouping or sorting before they can be effectively displayed.
Continuous vs. Discrete Data: Continuous data can take any value within a range (e.g., height), while discrete data consists of distinct, separate values (e.g., number of pets). The choice of diagram often depends on this distinction.

2. Underlying Principles

[Figure: a standard box plot showing the minimum, lower quartile, median, upper quartile, and maximum values.]

3. Methods & Techniques

4. Key Distinctions

Feature	Histogram	Bar Chart
Data Type	Continuous grouped data	Discrete or qualitative data
Gaps	No gaps between bars	Gaps between bars
Y-Axis	Frequency Density	Frequency
Area	Area represents frequency	Height represents frequency

Box Plots vs. Stem-and-Leaf: While both show distribution, a stem-and-leaf diagram preserves every individual raw data point, whereas a box plot summarizes the data into five key statistics, losing the specific values but gaining clarity for comparison.

5. Exam Strategy & Tips

6. Common Pitfalls & Misconceptions

Plotting at Midpoints: A frequent error in cumulative frequency graphs is plotting the cumulative frequency at the midpoint of the class. Because cumulative frequency represents data 'up to' a value, it must always be plotted at the upper boundary of the class.
Height vs. Area: Students often incorrectly use frequency as the height for histogram bars even when class widths are unequal. Remember: if the widths vary, you must use frequency density.
Misinterpreting the Median: Adding a new high value to a dataset does not always change the median. The extra value moves the middle position half a place up the ordered list, but the median stays the same if the values on either side of that position are equal, as happens when the middle values are duplicates.
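The point about the median can be checked with a short Python sketch. The two datasets are made up for illustration: one has repeated middle values and the other does not.

```python
from statistics import median

# Hypothetical dataset with duplicate middle values.
with_duplicates = [1, 3, 3, 3, 5]
median_before = median(with_duplicates)
median_after = median(with_duplicates + [9])

# Hypothetical dataset in which every value is different.
all_distinct = [1, 2, 3, 4, 5]
distinct_before = median(all_distinct)
distinct_after = median(all_distinct + [9])

print(median_before, median_after)      # 3 3.0
print(distinct_before, distinct_after)  # 3 3.5
```

With duplicate middle values the median stays at 3 after the high value 9 is added, whereas with distinct values it moves from 3 to 3.5.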