Welcome to the World of Spread!
In our previous lessons, we learned about "averages" (measures of central tendency). But averages only tell half the story. Imagine two cities where the average temperature is 20°C. In City A, it is 20°C every single day. In City B, it is 40°C in the afternoon and 0°C at night! They have the same average, but they feel very different.
This is why we need Measures of Dispersion. They tell us how "spread out" or "consistent" our data is. Let’s dive in!
1. The Range: The Simplest Measure
The Range is the difference between the highest and lowest values in a data set. It gives us a quick idea of the total spread.
How to calculate it:
\( \text{Range} = \text{Highest Value} - \text{Lowest Value} \)
Example: If a student’s test scores are 55, 60, 72, and 90, the range is \( 90 - 55 = 35 \).
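To check the calculation, you could write it as a short Python function. This is a minimal sketch. We chose the name `data_range` ourselves so it does not clash with Python's built-in `range`. Like a real range, it returns one single number.

```python
def data_range(values):
    """Range = highest value - lowest value, returned as ONE number."""
    if not values:
        raise ValueError("need at least one value")
    return max(values) - min(values)

scores = [55, 60, 72, 90]
print(data_range(scores))  # 35
```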
Common Mistake: Students often give the range as two numbers (e.g., "The range is 55 to 90"). In Statistics, the range is always one single number (the difference)!
Quick Takeaway: A large range means the data is very spread out; a small range means the data is more consistent.
2. Quartiles and the Interquartile Range (IQR)
Sometimes the Range is misleading because one very high or very low number (an outlier) can make the spread look much bigger than it really is. To fix this, we look at the middle 50% of the data.
What are Quartiles?
If you split your data into four equal parts, the dividing lines are called Quartiles:
• Lower Quartile (\(Q_1\)): The value 25% of the way through the data.
• Median (\(Q_2\)): The middle value (50% of the way).
• Upper Quartile (\(Q_3\)): The value 75% of the way through the data.
The Interquartile Range (IQR)
The IQR measures the spread of the middle 50% of the data. It is useful because outliers do not affect it: any extreme values lie in the bottom or top quarter, outside the middle 50%.
The Formula:
\( \text{IQR} = Q_3 - Q_1 \)
Step-by-Step: Finding the IQR
1. Put your data in order from smallest to largest.
2. Find the Median (\(Q_2\)).
3. Find the Lower Quartile (\(Q_1\)) by finding the middle of the bottom half.
4. Find the Upper Quartile (\(Q_3\)) by finding the middle of the top half.
5. Subtract \(Q_1\) from \(Q_3\).
Don't worry if the median (or a quartile) falls between two numbers! Just take the average of those two numbers, as you learned in the "Averages" chapter.
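Here is one way to sketch the five steps in Python. It is an illustration that relies on one assumption. Textbooks differ on how to split the data when there is an odd number of values, and here the median is left out of both halves. Other conventions, such as the \((n+1)/4\) position rule, can give slightly different quartiles for some data sets. The function names are our own.

```python
def median(sorted_values):
    """Middle value of an already-sorted list; averages the two middle values if n is even."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("need at least one value")
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' method.
    Assumption: when n is odd, the median is left out of both halves."""
    data = sorted(values)            # Step 1: order the data
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    q2 = median(data)                # Step 2: the median
    lower = data[: n // 2]           # bottom half
    upper = data[(n + 1) // 2 :]     # top half
    q1 = median(lower)               # Step 3: middle of the bottom half
    q3 = median(upper)               # Step 4: middle of the top half
    return q1, q2, q3

q1, q2, q3 = quartiles([3, 7, 8, 5, 12, 14, 21])
print(q1, q2, q3, q3 - q1)  # Step 5: IQR = Q3 - Q1  ->  5 8 14 9
```

For these seven values, the sketch gives \(Q_1 = 5\), \(Q_2 = 8\), \(Q_3 = 14\) and an IQR of 9.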
3. Percentiles and Deciles (Higher Tier)
If quartiles split data into 4 parts, Percentiles split it into 100 parts, and Deciles split it into 10 parts.
• Interpercentile Range: The difference between two specific percentiles (e.g., the 10th to 90th interpercentile range).
• Interdecile Range: The difference between two deciles (usually the 1st and 9th deciles).
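Why use these? They let you choose how much of the data to include. For example, the 10th to 90th interpercentile range covers the middle 80% of the data. That is more of the data than the IQR includes, but it still ignores the most extreme 10% at each end.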
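Python's standard `statistics.quantiles` function can find deciles and percentiles. The sketch below assumes `method="inclusive"`, which uses linear interpolation between data values. Exam boards and textbooks may use a different position rule, so treat the exact values as approximate. The helper names are hypothetical. The example also shows that the 10th to 90th interpercentile range and the 1st to 9th interdecile range are the same thing, because the 1st decile is the 10th percentile and the 9th decile is the 90th percentile.

```python
import statistics

def interdecile_range(values):
    """D9 - D1: the spread of the middle 80% of the data.
    Assumption: method='inclusive' (linear interpolation)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")  # 9 cut points: D1..D9
    return deciles[8] - deciles[0]

def interpercentile_range(values, low=10, high=90):
    """P_high - P_low, e.g. the 10th to 90th interpercentile range."""
    pct = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points: P1..P99
    return pct[high - 1] - pct[low - 1]

data = list(range(1, 21))            # the whole numbers 1 to 20
print(interdecile_range(data))       # about 15.2
print(interpercentile_range(data))   # the same value
```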
4. Outliers: The Data "Rebels"
An Outlier is a value that is much higher or much lower than the rest of the data. You might spot one just by looking (inspection), but in the exam you may need to identify outliers by calculation.
How to Calculate Outlier Boundaries (Higher Tier)
A value is usually considered an outlier if it is:
• Smaller than: \( Q_1 - (1.5 \times \text{IQR}) \)
• Larger than: \( Q_3 + (1.5 \times \text{IQR}) \)
• Or: More than 3 standard deviations from the mean, i.e. outside \( \mu \pm 3\sigma \).
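Both rules can be sketched in Python. This is a minimal illustration built on two assumptions. First, the quartiles are found as the medians of the lower and upper halves, with the median left out when there is an odd number of values. Second, σ is the population standard deviation, which divides by n. The helper names are hypothetical. The example ages include a typing error where "150" was entered instead of "15".

```python
import statistics

def _median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def iqr_outliers(values):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
    Assumption: quartiles are the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = _median(data[: n // 2])
    q3 = _median(data[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

def sd_outliers(values, k=3):
    """Flag values more than k standard deviations from the mean (outside mu +/- k*sigma)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population SD: divides by n
    return [x for x in values if abs(x - mu) > k * sigma]

ages = [14, 15, 15, 16, 15, 14, 150]   # "150" typed instead of "15"
print(iqr_outliers(ages))  # [150]
print(sd_outliers(ages))   # []
```

The 3σ rule flags nothing for these seven ages. With population σ, no value can be more than \(\sqrt{n-1}\) standard deviations from the mean, and that is at most 3 when there are 10 or fewer values. So the 3σ rule is better suited to larger data sets.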
What do we do with Outliers?
When you find an outlier, you should check if it is:
1. An error: Like someone typing "150" instead of "15" for a student's age. These should be corrected or removed.
2. A genuine unusual value: Like a professional athlete's salary in a list of normal office jobs. These should be kept but noted because they affect the mean and range.
5. Standard Deviation (Higher Tier)
Standard Deviation is the most sophisticated measure of spread. It measures roughly the "average distance" of each data point from the mean. Strictly, it is the square root of the mean of the squared distances.
• Small Standard Deviation: Data points are all very close to the mean (very consistent).
• Large Standard Deviation: Data points are far from the mean (not very consistent).
The Formula
The formula looks scary, but you will usually be given it in the exam! You just need to know how to use it for a list of data or a frequency table:
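• For a list of \(n\) values: \( \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \) or \( \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \)
• For a frequency table: \( \sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} \) or \( \sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} \)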
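The two frequency-table formulas always give the same answer. The second is a rearrangement of the first that is often quicker with a calculator. The sketch below is plain Python with hypothetical function names. It computes σ both ways for a small made-up frequency table so you can check they agree. For a plain list of data, give every value a frequency of 1.

```python
import math

def sd_from_deviations(xs, fs):
    """sigma = sqrt( sum f(x - xbar)^2 / sum f )"""
    total = sum(fs)
    xbar = sum(f * x for x, f in zip(xs, fs)) / total
    return math.sqrt(sum(f * (x - xbar) ** 2 for x, f in zip(xs, fs)) / total)

def sd_shortcut(xs, fs):
    """sigma = sqrt( sum fx^2 / sum f - (sum fx / sum f)^2 )"""
    total = sum(fs)
    mean_of_squares = sum(f * x * x for x, f in zip(xs, fs)) / total
    square_of_mean = (sum(f * x for x, f in zip(xs, fs)) / total) ** 2
    # max() guards against a tiny negative value caused by floating-point rounding
    return math.sqrt(max(0.0, mean_of_squares - square_of_mean))

x = [1, 2, 3, 4]   # values
f = [2, 3, 4, 1]   # frequencies (made-up example)
print(round(sd_from_deviations(x, f), 4), round(sd_shortcut(x, f), 4))  # 0.9165 0.9165
```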
Memory Aid: Think of Standard Deviation as the "Mean of the Spreads."
6. Standardised Scores (Higher Tier)
Have you ever wanted to compare how well you did in a hard Maths test versus an easy English test? You can't just compare the marks. You need Standardised Scores (also called Z-scores).
A standardised score tells you how many standard deviations a value is away from the mean.
The Formula:
\( \text{Standardised Score} = \frac{x - \mu}{\sigma} \)
Where \(x\) is your score, \(\mu\) is the mean, and \(\sigma\) is the standard deviation. A positive score means you are above average; a negative score means you are below average.
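Here is a minimal Python sketch of the formula using hypothetical marks. Suppose the hard Maths test had a mean of 50 and a standard deviation of 5, and the easy English test had a mean of 70 and a standard deviation of 10. The function name is our own.

```python
def standardised_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical results: a hard Maths test and an easy English test
maths = standardised_score(60, mean=50, sd=5)     # 2.0
english = standardised_score(75, mean=70, sd=10)  # 0.5
print("Maths" if maths > english else "English", "was the stronger result relative to the class")
```

A Maths mark of 60 is 2 standard deviations above the mean. An English mark of 75 is only 0.5 standard deviations above. So the Maths result is the stronger one relative to the class, even though the raw mark is lower.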
7. Comparing Data Sets
In the exam, you will often be asked to compare two sets of data (e.g., "Compare the marks of Class A and Class B"). To do this properly, you must follow this rule:
The Golden Rule of Comparison: Always pair the correct measure of spread with the correct average!
1. If you use the Median, you must use the IQR to describe spread.
2. If you use the Mean, you must use Standard Deviation (or Range) to describe spread.
Example Answer: "Class A had a higher median score (65%) than Class B (58%), showing they performed better on average. However, Class A had a larger IQR (20%) compared to Class B (10%), meaning Class A's results were more spread out and less consistent."
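You can build the Golden Rule into a small helper so that an average is never reported with the wrong measure of spread. This sketch uses Python's `statistics` module. The function name `describe` and the class marks are hypothetical. The quartiles come from `statistics.quantiles(..., method="inclusive")`, which may differ slightly from a hand method.

```python
import statistics

def describe(marks, average="median"):
    """Return an (average, spread) pair that belong together:
    median with IQR, or mean with (population) standard deviation."""
    if average == "median":
        q1, q2, q3 = statistics.quantiles(marks, n=4, method="inclusive")
        return q2, q3 - q1
    if average == "mean":
        return statistics.fmean(marks), statistics.pstdev(marks)
    raise ValueError("average must be 'median' or 'mean'")

class_a = [45, 50, 60, 65, 70, 80, 85]   # hypothetical marks (%)
class_b = [50, 55, 57, 58, 60, 62, 64]
print(describe(class_a), describe(class_b))  # (65.0, 20.0) (58.0, 5.0)
```

For these made-up marks, Class A has the higher median (65 vs 58) and the larger IQR (20 vs 5). So Class A did better on average but was less consistent.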
Quick Review Box
Range: High - Low (simplest, affected by outliers).
IQR: \(Q_3 - Q_1\) (middle 50%, not affected by outliers).
Standard Deviation: Roughly the average distance from the mean (uses every data value).
Outlier: A value more than \(1.5 \times \text{IQR}\) away from the quartiles.
Standardised Score: Used to compare values from different data sets fairly.