Welcome to the World of Data Representation!

Ever wondered how companies like Netflix or Spotify understand your habits? They use data! In this chapter of your Cambridge International AS Level Mathematics (9709) course, we are going to learn how to take a messy pile of numbers and turn them into clear, beautiful pictures. This isn't just about drawing; it's about making data "talk" so we can understand what it's trying to tell us. Don't worry if you find numbers a bit overwhelming—we’ll take it one step at a time!

1. Stem-and-Leaf Diagrams

Imagine you have the test scores of 20 students. Listed in no particular order, they are hard to make sense of. A stem-and-leaf diagram organizes them while keeping the original numbers visible.

How it works:

The "stem" represents the first digit(s), and the "leaf" represents the last digit.
Example: The number 45 would have a stem of 4 and a leaf of 5.
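Here is a minimal Python sketch of building the rows, assuming whole-number data where the leaf is the last digit. `stem_and_leaf` and `print_stem_and_leaf` are names made up for this example, and the scores are invented.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem to its sorted leaves, e.g. 45 -> stem 4, leaf 5.

    Assumes non-negative whole numbers whose leaf is the last digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # 45 -> (4, 5)
        rows[stem].append(leaf)
    # Keep empty stems between the smallest and largest so gaps stay visible
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}


def print_stem_and_leaf(values):
    for stem, leaves in stem_and_leaf(values).items():
        print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
    print("Key: 4 | 5 means 45")


scores = [45, 52, 38, 47, 61, 55, 49, 58, 43, 67]  # invented test scores
print_stem_and_leaf(scores)
```

Sorting first means the leaves on each row come out in order, which is what makes the median easy to count to.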

Back-to-Back Stem-and-Leaf Diagrams:

When you want to compare two groups (like Class A vs. Class B), you put the "stem" in the middle and the "leaves" for one group on the left and the other on the right. Leaves on the left are read outwards from the stem, so they get bigger as you move left.

Key Rule: Always include a Key (e.g., \(4|5\) means 45)! Without one, nobody can tell whether \(4|5\) means 45, 4.5 or 450, and your diagram is just a bunch of mystery numbers.
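A back-to-back layout can be sketched the same way. This is only an illustration: `leaves_by_stem` and `back_to_back` are hypothetical helper names, the class scores are invented, and it assumes whole-number data with the last digit as the leaf. The left-hand leaves are printed in reverse so they grow outwards from the stem, and the key explains both sides.

```python
from collections import defaultdict


def leaves_by_stem(values):
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return rows


def back_to_back(left_values, right_values):
    """Lines of a back-to-back diagram: left leaves | stem | right leaves."""
    left, right = leaves_by_stem(left_values), leaves_by_stem(right_values)
    all_stems = set(left) | set(right)
    stems = range(min(all_stems), max(all_stems) + 1)
    # Left leaves are written in reverse so they increase away from the stem
    left_text = {s: " ".join(str(leaf) for leaf in reversed(left.get(s, []))) for s in stems}
    width = max(len(text) for text in left_text.values())
    lines = []
    for s in stems:
        right_text = " ".join(str(leaf) for leaf in right.get(s, []))
        lines.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return lines


class_a = [43, 45, 51, 58, 62]  # invented scores
class_b = [47, 53, 54, 66, 69]
for line in back_to_back(class_a, class_b):
    print(line)
print("Key: 3 | 4 | 7 means 43 for Class A and 47 for Class B")
```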

Quick Takeaway: Stem-and-leaf diagrams are great because they show every single piece of data and make it easy to find the median and mode.

2. Box-and-Whisker Plots

If you want a "summary" of your data rather than seeing every single point, use a box-and-whisker plot. The quartiles split your data into four parts, each containing roughly a quarter of the values (the parts are equal in how many values they hold, not in width).

The "Five-Number Summary":

To draw one, you need five things:
1. Lowest Value (The end of the left whisker)
2. Lower Quartile (\(Q_1\)) (The left side of the box)
3. Median (\(Q_2\)) (The line inside the box)
4. Upper Quartile (\(Q_3\)) (The right side of the box)
5. Highest Value (The end of the right whisker)
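The five numbers can be worked out with a short Python sketch. Textbooks and calculators use slightly different rules for quartiles; this one assumes the "median of each half" rule (leaving the overall median out of both halves when there is an odd number of values), so check which convention your course expects. The function names are invented for this example, and the sketch assumes at least two values.

```python
def median(sorted_values):
    """Middle value of a sorted list (mean of the two middle values if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """(lowest, Q1, median, Q3, highest) using the 'median of each half' rule."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]        # values below the median position
    upper_half = data[(n + 1) // 2 :]  # values above the median position
    return data[0], median(lower_half), median(data), median(upper_half), data[-1]


scores = [45, 52, 38, 47, 61, 55, 49, 58, 43, 67]  # invented test scores
low, q1, q2, q3, high = five_number_summary(scores)
print(low, q1, q2, q3, high, "IQR =", q3 - q1)
```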

Memory Aid: Think of the "box" as the "middle 50%" of your data. The "whiskers" show how far the extremes stretch out.

3. Histograms

Histograms look like bar charts, but they are different! In a bar chart, the height of each bar shows the frequency. In a histogram, the area of each bar is proportional to the frequency.

Important Point: We use histograms for continuous data (like height, weight, or time) where data is grouped into classes.

The Frequency Density Secret:

If the widths of your groups (class widths) are different, you cannot just plot frequency on the vertical axis. You must calculate Frequency Density (FD):
\(FD = \frac{\text{Frequency}}{\text{Class Width}}\)

Step-by-Step for Histograms:
1. Check if the classes have gaps (e.g., 10-14, 15-19). If they do, use boundaries (9.5-14.5, 14.5-19.5).
2. Calculate the Class Width for each group.
3. Calculate Frequency Density for each group.
4. Plot the FD on the y-axis against the class boundaries on the x-axis, with no gaps between adjacent bars.
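Steps 1 to 3 can be sketched in Python, assuming whole-number data so each stated class such as 10-14 widens by 0.5 at each end. `histogram_table` and its `gap` parameter are hypothetical names, and the frequencies are invented.

```python
def histogram_table(classes, freqs, gap=1):
    """Return (lower boundary, upper boundary, class width, frequency density) rows.

    classes: (lowest, highest) stated values, e.g. (10, 14).
    gap: distance between one class's end and the next class's start as written
         (assumed 1 for whole-number data such as 10-14, 15-19).
    """
    half = gap / 2
    rows = []
    for (lo, hi), f in zip(classes, freqs):
        lower_b, upper_b = lo - half, hi + half  # 10-14 -> 9.5-14.5
        width = upper_b - lower_b
        rows.append((lower_b, upper_b, width, f / width))  # FD = frequency / class width
    return rows


classes = [(10, 14), (15, 19), (20, 29), (30, 49)]
freqs = [6, 10, 12, 8]  # invented frequencies
table = histogram_table(classes, freqs)
for lower_b, upper_b, width, fd in table:
    # Bar area = FD x width, which gives back the frequency
    print(f"{lower_b}-{upper_b}: width {width}, FD {fd:.2f}, area {fd * width:.0f}")
```

Notice that the widest class (29.5-49.5) has a fairly large frequency but the smallest FD, so its bar is the shortest: exactly the "crowdedness" idea.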

Analogy: Think of Frequency Density like "crowdedness." If you have 10 people in a tiny room, it’s very dense. If you have 10 people in a giant hall, it’s not dense at all.

4. Cumulative Frequency Graphs

This is a "running total" graph. It never goes down (each point is at least as high as the one before) and usually forms a smooth 'S' shape.

How to use it:

1. Find the Median: Go to the halfway point on the y-axis (total frequency / 2), move across to the curve, and look down.
2. Find Quartiles: \(Q_1\) is at 25% of the total frequency; \(Q_3\) is at 75%.
3. Percentiles: You can find any percentage (like the 90th percentile) the same way.

Common Mistake: Always plot the cumulative frequency against the upper class boundary of each group, not the midpoint! Start the graph at the lower boundary of the first group, where the cumulative frequency is 0.
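Reading values off the graph can be imitated in Python. This sketch assumes the plotted points are joined by straight lines (linear interpolation), so a hand-drawn smooth curve may give slightly different readings. `cf_points` and `read_off` are invented names, and the grouped data are made up.

```python
from itertools import accumulate


def cf_points(first_lower_boundary, upper_boundaries, freqs):
    """Points to plot: (first lower boundary, 0), then (upper boundary, running total)."""
    running_totals = accumulate(freqs)
    return [(first_lower_boundary, 0)] + list(zip(upper_boundaries, running_totals))


def read_off(points, cf_value):
    """Estimate the x-value where the graph reaches cf_value.

    Assumes straight lines between plotted points (linear interpolation).
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= cf_value <= y1 and y1 > y0:
            return x0 + (cf_value - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("cf_value is outside the range of the graph")


# Invented data: classes 0-10, 10-20, 20-30, 30-40 with these frequencies
points = cf_points(0, [10, 20, 30, 40], [4, 10, 16, 10])
total = points[-1][1]
median = read_off(points, total * 0.5)  # halfway up the y-axis
q1 = read_off(points, total * 0.25)
q3 = read_off(points, total * 0.75)
p90 = read_off(points, total * 0.9)     # 90th percentile
print(median, q1, q3, p90, "IQR =", q3 - q1)
```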

5. Measures of Central Tendency (The "Middle")

These tell us where the "center" of the data is.

  • Mean (\(\bar{x}\)): The average. Add up all the values and divide by how many values there are. \(\bar{x} = \frac{\sum x}{n}\)
  • Median: The middle value when data is in order.
  • Mode: The most frequent value.
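All three measures take only a few lines of Python. This sketch uses the standard-library `statistics` module for the median and mode; the data are invented.

```python
import statistics

data = [3, 5, 5, 6, 8, 9, 13]  # invented values

mean = sum(data) / len(data)        # x̄ = Σx / n
median = statistics.median(data)    # middle value once the data are sorted
modes = statistics.multimode(data)  # most frequent value(s); a list in case of ties

print(mean, median, modes)
```

`multimode` returns a list because data can have more than one mode.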

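Did you know? The mean is sensitive to "outliers" (extreme values). If Bill Gates walks into a room of students, the "mean" wealth of the room skyrockets, but the "median" wealth hardly changes, because one extra value, however large, can only move the median a small step within the ordered data!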
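A quick Python illustration of the Bill Gates effect. Every figure here is made up, including the visitor's wealth. In this particular example the median stays at exactly 200 while the mean jumps into the billions; with other data the median could shift slightly.

```python
import statistics

# Invented wealth figures (in dollars) for a room of ten students
students = [100, 120, 150, 150, 200, 200, 250, 300, 400, 500]
# One hypothetical, extremely wealthy visitor joins the room
with_visitor = students + [100_000_000_000]

mean_before = sum(students) / len(students)
mean_after = sum(with_visitor) / len(with_visitor)
median_before = statistics.median(students)
median_after = statistics.median(with_visitor)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")
```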

6. Measures of Variation (The "Spread")

Knowing the middle isn't enough; we need to know how spread out the data is.

  • Range: Highest value minus Lowest value. (Simple, but affected by outliers).
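  • Interquartile Range (IQR): \(Q_3 - Q_1\). This tells you the spread of the middle 50% and is not affected by extreme values.
  • Standard Deviation (\(\sigma\)): This is the "gold standard" for spread. It measures a typical distance of the data points from the mean (strictly, it is the square root of the average of the squared distances from the mean, which is close to, but not the same as, the average distance).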

The Math bit (Don't panic!):

Standard Deviation formula:
\(\sigma = \sqrt{\frac{\sum x^2}{n} - (\frac{\sum x}{n})^2}\)
Or, using the mean: \(\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}\)
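A numerical check in Python that the formula above gives the same answer as the longer definition \(\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}}\). The data and function names are invented for this sketch. Python's `statistics.pstdev` also divides by \(n\), matching this formula, whereas `statistics.stdev` divides by \(n-1\) and gives a slightly larger value.

```python
import math
import statistics


def sd_from_sums(values):
    """sigma = sqrt(Σx²/n − (Σx/n)²), the formula above."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


def sd_from_deviations(values):
    """sigma = sqrt(Σ(x − x̄)²/n), averaging the squared distances from the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # invented values with mean 5
print(sd_from_sums(data), sd_from_deviations(data), statistics.pstdev(data))
```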

Quick Review Box:
- Large SD = Data is very spread out.
- Small SD = Data is consistent and close to the mean.

7. Coded Data

Sometimes the numbers are large and awkward (like 1001, 1005, 1008). To make calculations easier, we "code" them by subtracting a constant (e.g., subtract 1000 to get 1, 5, 8).

The Tricks:

1. If you add a number to every value (or subtract one from every value): the Mean goes up (or down) by that number, but the Standard Deviation stays exactly the same!
Analogy: If everyone in class stands on a 10 cm box, the average height goes up by 10 cm, but the difference between the tallest and shortest person is still the same.
2. If you multiply/divide every value by a positive number: both the Mean and the Standard Deviation are multiplied/divided by that number. (If the number is negative, the Standard Deviation is multiplied by its size only, because a spread can never be negative.)
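Both tricks can be checked in Python. In this sketch the raw values are invented and `mean_sd` is a helper written just for this example (it divides by \(n\), like the \(\sigma\) formula in this guide). The mean and SD are found from the easy coded numbers, then decoded back.

```python
import math


def mean_sd(values):
    """Return (mean, standard deviation), dividing by n."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in values) / n)


raw = [1001, 1005, 1008, 1002, 1004]  # invented, awkward-looking values
coded = [x - 1000 for x in raw]       # coded values: 1, 5, 8, 2, 4

m_coded, sd_coded = mean_sd(coded)
# Decode: subtracting 1000 lowered the mean by 1000 but did not change the SD
m_raw, sd_raw = m_coded + 1000, sd_coded

# Multiplying every value by 3 multiplies both the mean and the SD by 3
m_tripled, sd_tripled = mean_sd([3 * x for x in raw])
print(m_raw, sd_raw, m_tripled, sd_tripled)
```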

Key Takeaway: Coding is just a shortcut to make calculations easier without changing the "shape" of the data spread.

Final Tips for Success

- Read the scale: On histograms and cumulative frequency graphs, examiners love to use tricky scales. Check how much one small square is worth!
- Label everything: Axes, units, and keys are easy marks that students often lose.
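- Grouped Data: When estimating the mean for grouped data, use the midpoint of each group as your \(x\) value, so \(\bar{x} \approx \frac{\sum fx}{\sum f}\), where \(f\) is each group's frequency. It is only an estimate, because the exact values inside each group are unknown.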
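The grouped-data tip as a Python sketch. The class boundaries and frequencies are invented, and `grouped_mean` is a name made up for this example.

```python
def grouped_mean(boundaries, freqs):
    """Estimate the mean as Σfx / Σf, using each class midpoint as x.

    boundaries: (lower, upper) class boundaries for each group.
    """
    midpoints = [(lo + hi) / 2 for lo, hi in boundaries]
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)


boundaries = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [4, 10, 16, 10]  # invented frequencies
estimate = grouped_mean(boundaries, freqs)
print(estimate)
```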

You've got this! Data representation is all about patterns. Keep practicing the drawings, and the interpretations will become second nature.