Welcome to Analysing Data!
Ever wondered how companies decide which video games to make, or how your teachers predict your final grades? It all comes down to analysing data. In this chapter, we’re going to be "data detectives." We’ll learn how to take a big pile of messy numbers and turn them into clear stories that help us make decisions. Don’t worry if you find big groups of numbers a bit scary—we’re going to break them down into simple, easy-to-follow steps.
1. The "Big Four" Summary Statistics
When we have a list of data (called ungrouped data), we use four main tools to understand what’s "normal" and how spread out the numbers are. These are the Mean, Median, Mode, and Range.
The Averages (Measures of Central Tendency)
- Mean: The "fair share" average. You add all the numbers together and divide by how many numbers there are.
Example: For the numbers 3, 5, and 10: \( \frac{3 + 5 + 10}{3} = \frac{18}{3} = 6 \).
- Median: The middle value. You must put the numbers in order from smallest to largest first! If there are two middle values, the median is halfway between them.
Analogy: Think of the "median" strip in the middle of a motorway: it's right in the centre.
- Mode: The most frequent value.
Memory Aid: MOde = MOst often.
The Spread
- Range: This tells us how "consistent" the data is. It is the Largest Value - Smallest Value.
Tip: A small range means the data is very similar (consistent). A large range means the data is very spread out.
Quick Review Box:
1. Mean: Add and divide.
2. Median: Middle (after ordering!).
3. Mode: Most common.
4. Range: Big minus small.
Common Mistake to Avoid: Many students forget to re-order the list of numbers before finding the Median. If you don't put them in order, your middle number will be wrong!
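If you like to check your answers on a computer, the four steps in the review box can be written as small Python functions. This is a minimal sketch: the function names (`mean_of`, `median_of`, `modes_of`, `range_of`) and the list of scores are made up for illustration. `modes_of` returns a list because a data set can have more than one mode.

```python
from collections import Counter


def mean_of(values):
    # "Add and divide": the total divided by how many values there are
    return sum(values) / len(values)


def median_of(values):
    ordered = sorted(values)  # order the list first!
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: one middle value
    # Even count: halfway between the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes_of(values):
    # Count how often each value appears, then keep the most common one(s)
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


def range_of(values):
    # "Big minus small"
    return max(values) - min(values)


# Hypothetical test scores (ungrouped data)
scores = [7, 3, 9, 5, 3, 8]
print("Mean:", mean_of(scores))
print("Median:", median_of(scores))  # ordered: [3, 3, 5, 7, 8, 9], so halfway between 5 and 7
print("Mode:", modes_of(scores))
print("Range:", range_of(scores))
```

Python's built-in `statistics` module also has `mean`, `median` and `multimode` functions that do the same jobs; writing them out by hand just makes each step visible.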
2. Dealing with Grouped Data
Sometimes, we have too much data to list every single number. Instead, we put them into groups (classes). For example, "0 to 10 minutes," "11 to 20 minutes," etc.
Why do we "Estimate"?
When data is grouped, we don't know the exact original values anymore. We only know how many people fall into a certain range. Because of this, we can only calculate an estimate of the mean, not the exact answer.
How to Estimate the Mean:
- Find the midpoint of each group (the number exactly in the middle: add the two ends of the group together and divide by 2).
- Multiply the midpoint by the frequency (how many people/items are in that group).
- Add up all these totals.
- Divide by the total frequency (the total number of people/items).
The Modal Class
Instead of a single "Mode," we look for the Modal Class. This is simply the group that has the highest frequency. It’s the "most popular" category.
Key Takeaway: For grouped data, always use the midpoint to represent the group when calculating the mean.
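Here is a sketch of the four steps for estimating the mean, plus a way to find the modal class. It assumes the groups are written as inequalities such as \( 0 < t \le 10 \), so each midpoint is simply (lower + upper) ÷ 2; for groups like "11 to 20" the midpoint would be 15.5 instead. The homework-time table and the function names are hypothetical.

```python
# Hypothetical grouped data: minutes spent on homework.
# Each group is (lower, upper, frequency), read as lower < t <= upper.
groups = [
    (0, 10, 4),
    (10, 20, 9),
    (20, 30, 5),
    (30, 40, 2),
]


def estimate_mean(groups):
    total = 0
    total_frequency = 0
    for lower, upper, frequency in groups:
        midpoint = (lower + upper) / 2   # Step 1: midpoint of the group
        total += midpoint * frequency    # Steps 2 and 3: multiply, then add up
        total_frequency += frequency
    return total / total_frequency       # Step 4: divide by the total frequency


def modal_class(groups):
    # The group with the highest frequency (the first one found if there is a tie)
    lower, upper, _ = max(groups, key=lambda group: group[2])
    return lower, upper


print("Estimated mean:", estimate_mean(groups))
print("Modal class:", modal_class(groups))
```

Here the midpoints are 5, 15, 25 and 35, so the estimate is \( \frac{20 + 135 + 125 + 70}{20} = \frac{350}{20} = 17.5 \) minutes, and the modal class is \( 10 < t \le 20 \). It is only an estimate because we are pretending every value in a group sits exactly at its midpoint.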
3. Comparing Data Sets
Exam questions often ask you to compare two sets of data (like "Class A" vs. "Class B"). To get full marks, you must compare two things:
- An Average: Use the Mean or Median to say who did "better" or had "higher" scores.
- A Measure of Spread: Use the Range (or Interquartile Range for Higher Tier) to say who was more "consistent."
Example Sentence: "On average, Class A scored higher because their mean was 75 compared to 62, but Class B was more consistent because their range was only 10 compared to 25."
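The "one average, one spread" rule can be turned into a small sentence-builder. This sketch uses the mean and the range; the `compare` function and both classes' scores are hypothetical, and a tie simply produces a "both the same" sentence.

```python
def mean_of(values):
    return sum(values) / len(values)


def range_of(values):
    return max(values) - min(values)


def compare(name_a, a, name_b, b):
    mean_a, mean_b = mean_of(a), mean_of(b)
    range_a, range_b = range_of(a), range_of(b)

    # 1) Compare an average: higher mean = higher scores on average
    if mean_a == mean_b:
        average = f"Both have the same mean ({mean_a:g})."
    else:
        higher = name_a if mean_a > mean_b else name_b
        average = f"On average, {higher} scored higher (means {mean_a:g} and {mean_b:g})."

    # 2) Compare a spread: smaller range = more consistent
    if range_a == range_b:
        spread = f"Both have the same range ({range_a})."
    else:
        steadier = name_a if range_a < range_b else name_b
        spread = f"{steadier} was more consistent (ranges {range_a} and {range_b})."

    return average + " " + spread


# Hypothetical scores for two classes
print(compare("Class A", [70, 75, 80, 65, 85], "Class B", [60, 62, 64, 58, 66]))
```

Notice that the two parts can point to different classes, just like the example sentence: one class can score higher while the other is more consistent.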
4. Higher Tier: Quartiles and Box Plots
If you are studying the Higher Tier, you need to go beyond the Range and look at Quartiles. Quartiles split the data into quarters (25% chunks).
- Lower Quartile (LQ): The value 25% of the way through the data.
- Upper Quartile (UQ): The value 75% of the way through the data.
- Interquartile Range (IQR): \( UQ - LQ \). This tells you the spread of the middle 50% of the data. It's better than the range because it isn't affected by "outliers" (weirdly high or low numbers).
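There is more than one way to find quartiles. The sketch below uses one common textbook convention: for \( n \) ordered values, the LQ is the \( \frac{n+1}{4} \)th value, the median the \( \frac{n+1}{2} \)th and the UQ the \( \frac{3(n+1)}{4} \)th, going part of the way between two values when the position is not a whole number. Calculators and software may use slightly different rules, so check which method your course expects. The data and function names are hypothetical.

```python
def value_at_position(ordered, position):
    """Return the value at a 1-based position in an ordered list.

    If the position is not a whole number (e.g. 3.25), go that fraction
    of the way from one value to the next.
    """
    whole = int(position)
    fraction = position - whole
    value = ordered[whole - 1]
    if fraction:
        value += fraction * (ordered[whole] - ordered[whole - 1])
    return value


def quartiles(values):
    """Return (LQ, median, UQ) using the (n + 1)/4 position rule."""
    ordered = sorted(values)  # order first, just like for the median
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 values for this method")
    lq = value_at_position(ordered, (n + 1) / 4)
    med = value_at_position(ordered, (n + 1) / 2)
    uq = value_at_position(ordered, 3 * (n + 1) / 4)
    return lq, med, uq


def iqr(values):
    lq, _, uq = quartiles(values)
    return uq - lq


# Hypothetical data: the same list with and without an outlier (30)
with_outlier = [2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 30]
without_outlier = [2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 16]

for data in (with_outlier, without_outlier):
    print("Range:", max(data) - min(data), "IQR:", iqr(data))
```

With 11 values the LQ is the 3rd value (5) and the UQ is the 9th value (14), so the IQR is 9 in both lists. Swapping the top value 16 for the outlier 30 doubles the range (14 becomes 28) but leaves the IQR unchanged.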
Box Plots (Box-and-Whisker Diagrams)
A Box Plot is a visual way to show the "Big Five" summary statistics:
- Minimum value (the end of the left whisker)
- Lower Quartile (left side of the box)
- Median (line inside the box)
- Upper Quartile (right side of the box)
- Maximum value (the end of the right whisker)
Did you know? Box plots are amazing for comparing two distributions instantly. If one "box" is further to the right, that group generally has higher values!
5. Bivariate Data: Scatter Diagrams
Bivariate data just means we are looking at two different variables at the same time to see if they are linked. For example, "Temperature" and "Ice Cream Sales."
Correlation
Correlation is the word we use to describe the relationship between the two variables:
- Positive Correlation: As one goes up, the other goes up (the points trend upwards to the right).
- Negative Correlation: As one goes up, the other goes down (the points trend downwards to the right).
- No Correlation: The points are scattered everywhere with no pattern.
The Line of Best Fit
This is a straight line drawn through the "middle" of the points. You should try to have roughly the same number of points above the line as below it. We use this line to make predictions.
- Interpolation: Predicting a value inside the range of data we already have. This is usually quite reliable.
- Extrapolation: Predicting a value outside the range of our data. Be careful! This is often unreliable because the trend might not continue forever.
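In the exam you draw the line of best fit by eye. A computer usually calculates one instead, and the sketch below swaps in the "least squares" method (a standard way to calculate a line of best fit that this chapter does not require) so we can make predictions and label them as interpolation or extrapolation. The temperature and sales figures and the function names are hypothetical, and the method assumes the x-values are not all the same.

```python
# Hypothetical data: daily temperature (°C) and ice creams sold
temps = [14, 18, 22, 26, 30]
sales = [20, 31, 38, 52, 59]


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x  # the line passes through (mean_x, mean_y)
    return gradient, intercept


def predict(xs, ys, x):
    """Predict y at x and say whether that is interpolation or extrapolation."""
    gradient, intercept = least_squares_line(xs, ys)
    if min(xs) <= x <= max(xs):
        kind = "interpolation"
    else:
        kind = "extrapolation (may be unreliable)"
    return gradient * x + intercept, kind


print(predict(temps, sales, 20))  # 20 °C is inside our data (14 to 30 °C)
print(predict(temps, sales, 40))  # 40 °C is outside our data
```

For this data the line is \( y = 2.475x - 14.45 \), so at 20 °C it predicts about 35 ice creams. The code will happily give an answer at 40 °C too, which is exactly why extrapolation needs care: the formula does not know whether the trend really continues.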
Key Concept: Correlation vs. Causation
Just because two things are linked (correlation), it doesn't mean one causes the other.
Example: Sunglasses sales and ice cream sales are correlated, but wearing sunglasses doesn't cause you to want ice cream—it's the sun causing both!
6. Outliers and Misleading Data
Sometimes, data includes outliers. These are values that don't fit the pattern of the rest of the data. They might be caused by a mistake in measurement or just a very unusual event.
How Graphs Mislead Us
Statistics can be used to trick people! Always check:
- The Scale: Does the Y-axis start at 0? If it starts at a high number, small differences can look huge.
- Labels: Are the axes clearly labelled with units?
- Picture Graphs: Are the pictures drawn to scale? (For example, doubling both the height and the width of a picture makes it four times the area, so it looks four times as big!)
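To see where the "four times" comes from, treat the picture as a rectangle and enlarge it by a scale factor of 2 in both directions:

\[ \text{new area} = (2 \times \text{height}) \times (2 \times \text{width}) = 4 \times (\text{height} \times \text{width}) \]

In general, enlarging a picture by a scale factor of \( k \) multiplies its area by \( k^2 \).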
Key Takeaway: Always look at the numbers on the axes, not just the "shape" of the bars or lines!
Final Encouragement
Statistics is all about telling a story with numbers. If you remember to order your data, use your midpoints for groups, and always compare both the average and the spread, you'll be well on your way to mastering this chapter. Keep practicing those mean calculations—you've got this!