Welcome to the World of Data!

Welcome to your study notes for Unit S1: Statistics 1! In this chapter, "Representation and Summary of Data," we are going to learn how to take a messy pile of numbers and turn it into a clear, meaningful story. Whether we are looking at exam results or sports stats, these tools help us understand what "normal" looks like and how much things vary.

Don't worry if you find some of the formulas a bit intimidating at first. We will break them down step-by-step and use plenty of everyday analogies to make them stick!

1. Measures of Location: Finding the "Center"

When we look at data, we usually want to know where the "middle" is. We use three main tools for this: the Mean, the Median, and the Mode.

The Mean (\(\bar{x}\))

This is what most people call the "average." You add everything up and divide by how many items you have.

Formula for individual data: \( \bar{x} = \frac{\sum x}{n} \)
Formula for frequency tables: \( \bar{x} = \frac{\sum fx}{\sum f} \)
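To see both formulas in action, here is a small Python sketch. The data (the list of scores and the little frequency table) is made up purely for illustration.

```python
# Mean of individual data: add everything up, divide by how many items.
scores = [4, 7, 7, 9, 13]  # made-up data
mean_raw = sum(scores) / len(scores)

# Mean from a frequency table: sum of f*x divided by sum of f.
values = [0, 1, 2, 3]        # x (made-up, e.g. number of siblings)
frequencies = [3, 8, 6, 3]   # f (how many people gave each answer)
sum_fx = sum(f * x for x, f in zip(values, frequencies))
sum_f = sum(frequencies)
mean_table = sum_fx / sum_f

print(mean_raw)    # 40 / 5
print(mean_table)  # 29 / 20
```

Notice that the frequency-table version is just a shortcut: it gives the same answer you would get by writing out every value individually and averaging.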

The Median and Mode

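The Median is the middle value when the numbers are put in order. If there is an even number of values, take the mean of the two middle ones. Think of the "median strip" in the middle of a road! The Mode is the most common value (the one with the highest frequency). A data set can have more than one mode.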
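A quick way to check your own working is Python's built-in statistics module. This is only a sketch with made-up lists; the variable names are ours.

```python
import statistics

odd_count = [3, 9, 5, 1, 7]        # 5 values: one clear middle value
even_count = [3, 9, 5, 1, 7, 11]   # 6 values: two middle values

# median() sorts the data for us before picking the middle.
median_odd = statistics.median(odd_count)    # sorted: 1 3 5 7 9
median_even = statistics.median(even_count)  # sorted: 1 3 5 7 9 11, averages 5 and 7

shoe_sizes = [5, 6, 6, 7, 6, 8, 7]
mode_size = statistics.mode(shoe_sizes)      # 6 appears most often

print(median_odd, median_even, mode_size)
```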

Coding: The Math "Short-cut"

Sometimes data values are huge (like 1,001, 1,005, 1,010). To make the math easier, we "code" them by subtracting a constant, dividing by a constant, or both.
Important Rule: If you add or subtract a number to every data point, the mean changes by that same amount. If you multiply or divide every data point by a number, the mean is multiplied or divided by that same number.
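Here is a short sketch of coding and "uncoding" a mean. The data and the coding y = (x - 1000) / 2 are our own choices for illustration, not a rule from the syllabus.

```python
data = [1001, 1005, 1010, 1012]  # made-up "huge" values

# Code the data: subtract 1000, then divide by 2.
coded = [(x - 1000) / 2 for x in data]
mean_coded = sum(coded) / len(coded)

# Undo the coding in reverse order: multiply by 2, then add 1000.
mean_recovered = mean_coded * 2 + 1000

# Compare with the mean worked out the long way.
mean_direct = sum(data) / len(data)

print(mean_coded, mean_recovered, mean_direct)
```

The recovered mean matches the direct one, which is exactly what the rule promises.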

Quick Review:
Mean: The "fair share" average.
Median: The exact middle.
Mode: The most popular choice.

2. Measures of Dispersion: How Spread Out is the Data?

Knowing the middle isn't enough. Imagine a climate where the average temperature is 20°C. That could mean it's 20°C every day, or it could mean it's 50°C in the day and -10°C at night! Dispersion tells us which one it is.

Range and Interquartile Range (IQR)

Range: Largest value minus smallest value. It's simple but easily ruined by one weirdly high or low number.
Interquartile Range (IQR): \( Q_3 - Q_1 \). This looks at the middle 50% of the data, so it ignores the "extremes" at the ends.
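The sketch below works out the range and IQR for a made-up list. Be careful: there are several conventions for finding quartiles, and this one (the median of the lower half and the median of the upper half) is just an assumption for the example. Always use the method your exam board teaches.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: for an odd number of values the middle value
    is left out of both halves. Needs at least 2 values.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)

data = [2, 4, 5, 7, 8, 10, 11, 40]  # made-up data with one extreme value

data_range = max(data) - min(data)
q1, q3 = quartiles(data)
iqr = q3 - q1

print(data_range)  # pulled up by the single value 40
print(q1, q3, iqr)
```

The one extreme value stretches the range a lot, but the IQR barely notices it.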

Variance and Standard Deviation (\(\sigma\))

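These measure how far the data points typically are from the mean. The variance is the mean of the squared distances from the mean, and the standard deviation is its square root, which puts it back in the same units as the data.
Variance = \( \sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2 \)
Standard Deviation = \( \sigma = \sqrt{\text{Variance}} \)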
Memory Trick: A low standard deviation means the data is bunched tight together. A high standard deviation means the data is spread wide apart.
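Here is a minimal sketch of the calculation, assuming the "divide by n" version of variance (the mean of the squared distances from the mean), which is what the symbol \(\sigma\) usually stands for. The data is made up, and the shortcut "mean of the squares minus the square of the mean" is included as a check.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
n = len(data)
mean = sum(data) / n

# Definition: mean of the squared distances from the mean.
variance = sum((x - mean) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the square of the mean.
variance_shortcut = sum(x * x for x in data) / n - mean ** 2

std_dev = math.sqrt(variance)

print(mean, variance, variance_shortcut, std_dev)
```

Both routes give the same variance, so in an exam you can use whichever is quicker with the totals you are given.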

Interpolation: Finding Medians in Groups

When data is grouped in a table (like "10-20 mins"), we don't know the exact values. We use Linear Interpolation to estimate where the median lies, assuming the values are spread evenly across each class.
Step-by-Step:
1. Find which class the median is in (the \(\frac{n}{2}\) position).
2. See how far into that class you need to go.
3. Use the class width to find the specific value.
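The three steps above can be sketched in Python as follows. The function name, the made-up table of times, and the choice to store each class as (lower boundary, upper boundary, frequency) are all our own assumptions for illustration.

```python
def interpolated_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    classes: list of (lower_boundary, upper_boundary, frequency),
    in increasing order. Assumes values are spread evenly in a class.
    """
    n = sum(freq for _, _, freq in classes)
    target = n / 2          # Step 1: the n/2 position
    cumulative = 0          # frequency total before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # Step 2: how far into this class we need to go (0 to 1).
            fraction = (target - cumulative) / freq
            # Step 3: scale by the class width.
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("no data in the table")

# Made-up table of journey times in minutes.
times = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]
median_estimate = interpolated_median(times)
print(round(median_estimate, 2))
```

With 30 values, the 15th lies in the 10-20 class. Five values come before that class, so we need to go 10 of the 12 values into it, which is 10/12 of the way across a width of 10.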

Key Takeaway: Dispersion measures "consistency." Small spread = high consistency!

3. Representing Data Visually

Graphs help us see patterns that numbers hide. While your exam won't usually ask you to draw these from scratch, you must know how to interpret them.

Stem and Leaf Diagrams

These show every single piece of data but group them by their "leading" digits. Back-to-back stem and leaf diagrams are great for comparing two groups (like Class A vs Class B).

Box Plots (Box and Whisker)

A box plot uses five key numbers: Minimum, \(Q_1\), Median, \(Q_3\), and Maximum.
• The "box" shows the middle 50% (the IQR).
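• The "whiskers" reach out to the minimum and maximum, so together they show the range (if outliers are marked separately, the whiskers stop before them).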

Histograms

Crucial Rule: In a histogram, the Area of each bar represents the frequency, not the height!
To find the height (Frequency Density), use: \( \text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}} \)
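As a quick illustration, this sketch works out the frequency density for a made-up table with unequal class widths, then checks that height times width gives the frequency back.

```python
# Each class: (lower boundary, upper boundary, frequency). Made-up data.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]

for lower, upper, freq in classes:
    width = upper - lower
    density = freq / width      # the height of the bar
    area = density * width      # should equal the frequency
    print(f"{lower}-{upper}: width {width}, density {density}, area {area}")

# Collect the bar heights in one list for later use.
densities = [freq / (upper - lower) for lower, upper, freq in classes]
```

The last class has 8 values but is twice as wide, so its bar is shorter than the first class's bar even though it holds more data.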

Did you know? Histograms are used for continuous data (things you measure, like height or time), which is why there are no gaps between the bars.

4. Skewness and Outliers

Sometimes data isn't symmetrical. It might "lean" one way.

Skewness

Positive Skew: The "tail" is on the right. Most data is on the left. (Typically Mean > Median > Mode.)
Negative Skew: The "tail" is on the left. Most data is on the right. (Typically Mode > Median > Mean.)
Symmetrical: The left and right look like mirror images.
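To see the positive-skew pattern in numbers, here is a small sketch with a made-up list that has a long tail on the right. The ordering of mean, median and mode is a rule of thumb rather than a guarantee, so treat this as one example, not a proof.

```python
import statistics

# Made-up data: most values are small, a few large ones form the tail.
data = [1, 2, 2, 2, 3, 4, 5, 6, 9, 16]

mean = sum(data) / len(data)
median = statistics.median(data)
mode = statistics.mode(data)

print(mean, median, mode)
print(mean > median > mode)  # the positive-skew pattern holds here
```

The large values in the tail drag the mean upward, while the median and mode stay near the bulk of the data.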

Outliers

An outlier is a "weird" data point that is much higher or lower than the rest.
How to spot them: The exam will give you a rule, usually something like:
Outlier > \( Q_3 + 1.5 \times \text{IQR} \)
OR
Outlier < \( Q_1 - 1.5 \times \text{IQR} \)
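Here is a sketch of the rule above in Python. The function names and the data are made up, and the quartiles are passed in as given values, just as they often are in an exam question. The multiplier 1.5 is a parameter because the question may give a different rule.

```python
def fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """Return the values lying outside the two limits."""
    lower_limit, upper_limit = fences(q1, q3, k)
    return [x for x in data if x < lower_limit or x > upper_limit]

data = [2, 4, 5, 7, 8, 10, 11, 40]  # made-up data
q1, q3 = 4.5, 10.5                  # quartiles taken as given

lower_limit, upper_limit = fences(q1, q3)
flagged = find_outliers(data, q1, q3)

print(lower_limit, upper_limit)
print(flagged)
```

Here the IQR is 6, so the limits sit 9 below Q1 and 9 above Q3. Only values outside those limits count as outliers.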

Common Mistake to Avoid: Don't just guess if a number is an outlier because it "looks big." Always use the specific math rule provided in the question!

Summary of Section 4:
Skewness describes the shape and "lean" of the data.
Outliers are the extreme values that don't fit the pattern.

Final Encouragement

Statistics is less about memorizing and more about analyzing. When you look at a graph or a mean, always ask yourself: "What is this actually telling me about the real world?" You've got this!