Data Presentation and Interpretation

Mathematics A - H230 · Cambridge OCR AS Level

Welcome to Data Presentation and Interpretation!

In this chapter, we are going to learn how to take a messy pile of numbers and turn them into a story that everyone can understand. Whether it's looking at the heights of a basketball team or comparing ice cream sales to the weather, statistics helps us see patterns. Don’t worry if you find numbers a bit overwhelming at first—we’ll break everything down into small, manageable steps!

1. Presenting Single-Variable Data

When we only have one "thing" we are measuring (like the weight of apples), we call it single-variable data. We use several different types of diagrams to visualize this.

Key Types of Diagrams

Vertical Line Charts: Good for discrete data (things you count).
Stem-and-Leaf Diagrams: These are great because they show the shape of the data but still keep all the original numbers visible.
Box-and-Whisker Plots: These show a "5-number summary" (minimum, lower quartile, median, upper quartile, and maximum). They are brilliant for seeing the spread of data.
Cumulative Frequency Diagrams: A "running total" graph used to estimate the median and quartiles.

Histograms: The "Area" Rule

Histograms look like bar charts, but they are used for continuous data (things you measure, like time or weight) and often have bars of different widths.
Crucial Point: In a histogram, the area of each bar represents the frequency. The height on its own does not!

The formula you need is:
\( \text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}} \)

Analogy: Think of a histogram bar like a piece of dough. If you make the bar wider (class width), the height (frequency density) must get lower so that the total amount of dough (the frequency) stays the same!
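If it helps to see the arithmetic, here is a minimal Python sketch of the frequency density formula. The class boundaries and frequencies are made up for illustration, and the names `classes` and `frequency_density` are just labels chosen for this example.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency) for each class.
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 12), (20, 40, 8)]


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


# Height of each bar on the histogram.
densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

# Going the other way: area = height x width gives back the frequency.
areas = [d * (hi - lo) for d, (lo, hi, _) in zip(densities, classes)]

for (lo, hi, f), d, a in zip(classes, densities, areas):
    print(f"{lo}-{hi}: frequency {f}, density {d:.2f}, area {a:.1f}")
```

Notice that the widest class (20 to 40) ends up with the lowest bar even though it does not have the smallest frequency. That is the dough analogy in action.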

Quick Review: Choosing a Diagram

Want to keep all original data? Use Stem-and-Leaf.
Want to compare the spread of two groups? Use Box Plots.
Have continuous data grouped into classes of unequal width? Use a Histogram.

Key Takeaway: Always check the scale on a histogram! Frequency is the area, so you must multiply height by width to find out how many items are in that group.

2. Measures of Average (Central Tendency)

We use "averages" to find the "middle" or "typical" value in our data.

The Big Three

1. Mean (\(\bar{x}\)): Add all the values and divide by the number of items.
\( \bar{x} = \frac{\sum x}{n} \)
2. Median: The middle value when the data is in order (with an even number of values, take the mean of the two middle values).
3. Mode: The most common value.

Averages from Frequency Tables

If the data is in a table, we use: \( \bar{x} = \frac{\sum fx}{\sum f} \).
Important: If the data is grouped (e.g., "10 to 20 mins"), we use the midpoint of each group to calculate the mean. Because we use midpoints, the answer is only an estimate of the mean, not the exact value.
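Here is a short Python sketch of the midpoint method, assuming a made-up table of three groups. The table values and the function name `estimated_mean` are illustrative only.

```python
# Hypothetical grouped table: (lower bound, upper bound, frequency).
table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]


def estimated_mean(groups):
    """Estimate the mean as (sum of f * midpoint) / (sum of f)."""
    total_fx = sum(((lo + hi) / 2) * f for lo, hi, f in groups)
    total_f = sum(f for _, _, f in groups)
    return total_fx / total_f


mean_estimate = estimated_mean(table)
print(mean_estimate)
```

The midpoints here are 5, 15 and 25, so \(\sum fx = 20 + 150 + 150 = 320\) and \(\sum f = 20\), giving an estimated mean of 16.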

Did you know? The word "Median" is like the median strip in the middle of a highway—it sits right in the center!

Key Takeaway: The mean is sensitive to extreme values (outliers), but the median is much more "robust" and stays stable even if there is one very weird number in the set.
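To see this robustness with actual numbers, here is a small sketch using Python's built-in `statistics` module. The data values are invented for illustration.

```python
from statistics import mean, median

# Hypothetical data set, then the same set with one extreme value added.
data = [4, 5, 5, 6, 7]
data_with_outlier = data + [50]

mean_before, median_before = mean(data), median(data)
mean_after, median_after = mean(data_with_outlier), median(data_with_outlier)

print("Mean:  ", mean_before, "->", round(mean_after, 2))
print("Median:", median_before, "->", median_after)
```

One extra value of 50 more than doubles the mean, while the median only moves from 5 to 5.5.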

3. Measures of Spread (Variation)

An average tells us where the middle is, but a measure of variation tells us whether the data is bunched together or widely scattered.

Quartiles and Percentiles

Lower Quartile (\(Q_1\)): The value 25% of the way through the ordered data.
Upper Quartile (\(Q_3\)): The value 75% of the way through the ordered data.
Inter-Quartile Range (IQR): \(Q_3 - Q_1\). This tells us how spread out the middle 50% of the data is.

Standard Deviation and Variance

Standard Deviation (\(\sigma\)) is a more sophisticated way to measure spread. Roughly speaking, it gives a "typical distance" of the values from the mean. The Variance is simply the standard deviation squared (\(\sigma^2\)).

The formula for standard deviation is:
\( \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \) or \( \sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{x}^2} \) for frequency tables.

Common Mistake: Forgetting to take the square root at the very end. If you forget, you've found the variance, not the standard deviation!
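As a rough check of the formula, the sketch below works through \( \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \) step by step in Python and compares the result with `statistics.pstdev`, which also divides by \(n\). The data set is made up for illustration.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n                      # x-bar
mean_of_squares = sum(x * x for x in data) / n

variance = mean_of_squares - mean ** 2    # stop here and you only have the variance
std_dev = sqrt(variance)                  # the square root gives the standard deviation

print("variance:", variance)
print("standard deviation:", std_dev)
print("library check:", pstdev(data))
```

Here \(\sum x = 40\) and \(\sum x^2 = 232\), so the variance is \(29 - 25 = 4\) and the standard deviation is 2.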

Key Takeaway: A small standard deviation means the data is very consistent and close to the mean. A large one means the data is "all over the place."

4. Outliers and Cleaning Data

Sometimes data contains "freak" results that don't fit the pattern. These are called outliers.

How to spot an outlier

In your OCR exam, you are usually given a specific rule to identify outliers. The most common ones are:

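Anything more than 1.5 \(\times\) IQR above \(Q_3\) or below \(Q_1\), that is, above \(Q_3 + 1.5 \times \text{IQR}\) or below \(Q_1 - 1.5 \times \text{IQR}\).
Anything more than 2 standard deviations away from the mean, that is, above \(\bar{x} + 2\sigma\) or below \(\bar{x} - 2\sigma\).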
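Both rules boil down to working out a lower and an upper limit and checking which values fall outside them. The Python sketch below assumes the quartiles, mean and standard deviation are already known (for example, quoted in the question). All the numbers and function names are hypothetical.

```python
def iqr_limits(q1, q3, k=1.5):
    """Limits for the rule: more than k x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def sd_limits(mean, sd, k=2):
    """Limits for the rule: more than k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd


def find_outliers(values, lower, upper):
    """Return the values that lie strictly outside the two limits."""
    return [v for v in values if v < lower or v > upper]


# Hypothetical summary values, as if given in an exam question.
iqr_lower, iqr_upper = iqr_limits(q1=20, q3=30)
iqr_outliers = find_outliers([4, 18, 33, 46], iqr_lower, iqr_upper)

sd_lower, sd_upper = sd_limits(mean=50, sd=5)
sd_outliers = find_outliers([38, 45, 61], sd_lower, sd_upper)

print("IQR rule limits:", iqr_lower, iqr_upper, "outliers:", iqr_outliers)
print("2 s.d. rule limits:", sd_lower, sd_upper, "outliers:", sd_outliers)
```

With \(Q_1 = 20\) and \(Q_3 = 30\), the IQR is 10, so the limits are 5 and 45. With a mean of 50 and standard deviation of 5, the limits are 40 and 60.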

Cleaning Data

Cleaning data means dealing with these outliers, missing values, or obvious errors. You might choose to remove an outlier if it's a typing error (like someone's height being entered as 500cm!), but you should always justify why you are removing it.

Key Takeaway: Don't just ignore weird numbers! Use the formulas above to prove they are outliers, and then decide if they should stay or go.

5. Bivariate Data (Two Variables)

When we look at two things at once (like "Hours of Revision" and "Exam Score"), we call it bivariate data.

Scatter Diagrams and Correlation

We plot these on a scatter graph to look for correlation (a relationship):

Positive Correlation: As one goes up, the other goes up (e.g., Height and Shoe Size).
Negative Correlation: As one goes up, the other goes down (e.g., Price of a car and its age).
No Correlation: No visible pattern (e.g., IQ and House Number).

Correlation vs. Causation

This is a favorite exam topic! Just because two things are correlated doesn't mean one causes the other.
Example: Shark attacks and ice cream sales both go up in the summer. They are correlated, but eating ice cream does not cause shark attacks! The "hidden cause" is the hot weather.

Regression Lines

A regression line is a "line of best fit" that passes through the mean point \((\bar{x}, \bar{y})\). You won't be asked to calculate the equation of this line at AS Level, but you must be able to interpret it. For example, using the line to make a prediction within the range of your data (interpolation) is usually reliable, but predicting outside that range (extrapolation) is risky because there is no evidence that the pattern continues there.
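The sketch below illustrates the interpolation versus extrapolation check in Python. It assumes a regression line \(y = 2.2 + 0.6x\) has already been given, along with the range of \(x\)-values in the data. The coefficients, the range and the function name `predict` are all hypothetical.

```python
# Hypothetical regression line y = A + B * x, assumed to be given in the question.
A, B = 2.2, 0.6

# Hypothetical range of x-values in the original data.
X_MIN, X_MAX = 1, 5


def predict(x):
    """Return the predicted y and whether it is interpolation or extrapolation."""
    kind = "interpolation" if X_MIN <= x <= X_MAX else "extrapolation"
    return A + B * x, kind


for x in (3, 10):
    y, kind = predict(x)
    print(f"x = {x}: predicted y = {y:.1f} ({kind})")
```

The prediction at \(x = 3\) sits inside the data range, so it is interpolation. The prediction at \(x = 10\) is well outside it, so it should be treated with caution.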

Key Takeaway: Correlation is about a pattern, causation is about a reason. Always use the words "interpolation" or "extrapolation" when discussing predictions.

Summary Checklist

Can I calculate the Mean and Standard Deviation using my calculator's stats mode?
Do I remember that Histogram Area = Frequency?
Can I use the \(1.5 \times IQR\) rule to find outliers?
Do I understand why correlation doesn't always mean causation?

You've got this! Practice these definitions and formulas, and you'll be able to interpret any data set that comes your way.

Quick check

Can you answer these now?

What is single-variable data?

Single-variable data refers to datasets where only one attribute or characteristic is being measured or recorded.

What is a major advantage of using a stem-and-leaf diagram?

It displays the overall shape and distribution of the data while keeping every original data value visible.

What summary values are included in a box-and-whisker plot?

A box-and-whisker plot shows the minimum, lower quartile, median, upper quartile, and maximum.

In a histogram, what does the area of each bar represent?

In a histogram, the area of a bar is proportional to the frequency of that class.

What is the formula for calculating frequency density?

\( \text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}} \)

Why is the mean calculated from a grouped frequency table considered an estimate?

It is an estimate because we use the midpoint of each group, losing the exact values of the individual data points.

How does the mean differ from the median regarding outliers?

The mean is sensitive to outliers and can be pulled away from the center, whereas the median is robust and stays stable.

What is the Inter-Quartile Range (IQR) and what does it measure?

\( \text{IQR} = Q_3 - Q_1 \). It measures the spread of the middle 50% of the data.

* The content provided by thinka is generated by AI and may not always be accurate or up-to-date. Please use it as a supplementary resource and verify with official materials.
