Welcome to Unit 1: Exploring One-Variable Data!

Welcome to the beginning of your AP Statistics journey! Statistics is the science of learning from data. In this first unit, we focus on One-Variable Data. This simply means we are looking at one thing at a time—like the height of students in your class, the color of cars in a parking lot, or how many hours of sleep you got last night. Our goal is to take a messy pile of numbers and turn them into a clear story. Don't worry if you aren't a "math person"—Unit 1 is all about patterns, pictures, and common sense!

1.1 Identifying Variables

Before we can analyze data, we need to know what kind of data we have. Data comes from individuals (the people, animals, or things we are studying) and is organized into variables (the characteristics we are measuring).

There are two main types of variables:

1. Categorical Variables: These place individuals into groups or categories. Think of these as "labels."
Examples: Hair color, zip code (even though it's a number, it's a label!), favorite music genre.

2. Quantitative Variables: These are numerical values for which it makes sense to find an average.
Examples: Height, GPA, temperature, age.

Quick Review: If you can't calculate a meaningful average (like the "average zip code"), it is likely categorical!

1.2 Representing Categorical Data

When we want to show categorical data, we use Frequency Tables or Relative Frequency Tables.

- Frequency: The count of how many individuals fall into a category.
- Relative Frequency: The percentage or proportion (Frequency divided by the total).
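
As a rough illustration of the two definitions above, here is a minimal Python sketch that builds a frequency table and a relative frequency table. The list of car colors is made up for this example, and the variable names are only illustrative.

```python
from collections import Counter

# Hypothetical sample: car colors observed in a parking lot.
colors = ["red", "blue", "red", "white", "blue",
          "red", "black", "white", "red", "blue"]

counts = Counter(colors)       # frequency: the count in each category
total = sum(counts.values())   # total number of individuals

# Relative frequency: frequency divided by the total.
relative = {color: count / total for color, count in counts.items()}

# Print one row per category, most common first.
for color, count in counts.most_common():
    print(f"{color:<6} {count:>3} {relative[color]:>6.0%}")
```

The relative frequencies should always add up to 1 (or 100%), which is a handy way to check your table.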

To visualize this, we use Bar Charts or Pie Charts.
Important Tip: Always look at the Area Principle. The bars in a chart should have the same width. If a bar is twice as wide as another, it might trick your brain into thinking the data is larger than it really is!

1.3 Representing Quantitative Data: The Visuals

Quantitative data is all about numbers, so we use different types of graphs to see the patterns:

Dotplots: Each data point is shown as a dot above a number line. This is great for small datasets.
Stemplots (Stem-and-Leaf): These keep the actual digits of the data. The "stem" is the first digit(s), and the "leaf" is the last digit.
Histograms: These look like bar charts, but the bars touch. Each bar represents a range of values (called a "bin" or "class").
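
To see how a stemplot keeps the actual digits, here is a small Python sketch that splits two-digit values into a stem (tens digit) and a leaf (ones digit). The quiz scores and the function name `stemplot` are hypothetical, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

# Hypothetical quiz scores (two-digit whole numbers).
scores = [72, 85, 91, 68, 77, 85, 93, 70, 88, 79]

def stemplot(values):
    """Return stemplot rows, using the tens as the stem and the ones as the leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting keeps the leaves in order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include every stem from smallest to largest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem]))
    return rows

for row in stemplot(scores):
    print(row)
print("Key: 7 | 2 means 72")
```

Listing empty stems matters because gaps in the data are part of the shape of the distribution.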

Did you know? On a histogram, if a data point falls exactly on the boundary between two bars, it usually goes into the bar on the right!
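
The boundary convention above can be sketched in Python by treating each bin as including its left edge but not its right edge. The sleep data, the bin settings, and the function name `bin_counts` are assumptions made for this example.

```python
# Hypothetical data: hours of sleep for 10 students.
hours = [5, 6, 6, 7, 7, 7, 8, 8, 9, 10]

def bin_counts(values, start, width, n_bins):
    """Count values in bins [start, start+width), [start+width, start+2*width), ..."""
    counts = [0] * n_bins
    for v in values:
        # Floor division puts a value that sits on a boundary into the bin on its right.
        index = int((v - start) // width)
        if 0 <= index < n_bins:       # values outside all bins are ignored
            counts[index] += 1
    return counts

counts = bin_counts(hours, start=4, width=2, n_bins=4)
for i, c in enumerate(counts):
    low = 4 + 2 * i
    print(f"[{low}, {low + 2}): {'*' * c}")
```

Here the two students who slept exactly 8 hours are counted in the 8-to-10 bar, not the 6-to-8 bar.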

1.4 Describing the Distribution (The "SOCS" Mnemonic)

When you are asked to "describe the distribution" on the AP exam, you must address these four things. Think of the word SOCS to help you remember:

1. Shape: Is it Symmetric (looks the same on both sides)? Is it Skewed Right (the "tail" of the graph points to the right/higher numbers)? Is it Skewed Left (the "tail" points to the left/lower numbers)?
2. Outliers: Are there any data points that look like they don't belong with the rest?
3. Center: Where is the middle of the data? (Usually the Mean or Median).
4. Spread: How much does the data vary? (The Range, Standard Deviation, or IQR).

Key Takeaway: Always describe SOCS in context. Don't just say "The mean is 10." Say "The mean number of apples eaten is 10."

1.5 Measuring Center: Mean vs. Median

How do we find the "middle"?

- Mean (\( \bar{x} \)): The average. Add up all the values and divide by the number of values: \( \bar{x} = \frac{\sum x_i}{n} \).
- Median: The exact middle value when data is lined up from smallest to largest.

Which one should you use?
The Mean is very sensitive to outliers. If you have one huge number, the mean will get pulled toward it. This means the Mean is not resistant (nonresistant).
The Median is resistant. It doesn't care if the highest number is 100 or 1,000,000; the middle stays the middle!

Quick Trick: In a Skewed Right distribution, the Mean is usually greater than the Median because the "tail" pulls the mean up!
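
To check the idea of resistance for yourself, here is a short Python sketch using the standard `statistics` module. The salary values are invented for illustration.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars.
salaries = [40, 45, 50, 55, 60]

# Same data, but the largest value is replaced by an extreme one.
with_outlier = [40, 45, 50, 55, 1000]

print(mean(salaries), median(salaries))          # mean 50, median 50
print(mean(with_outlier), median(with_outlier))  # mean 238, median 50
```

One extreme value drags the mean from 50 up to 238, while the median stays at 50.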

1.6 Measuring Spread: Range, IQR, and Standard Deviation

Spread tells us how consistent or varied our data is.

1. Range: \( Max - Min \). (Note: In Stats, the range is a single number, not "10 to 50").
2. Interquartile Range (IQR): The distance between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile). \( IQR = Q_3 - Q_1 \). This tells you how spread out the middle 50% of the data is.
3. Standard Deviation (\( s_x \)): Roughly the typical distance of the data points from the mean, calculated as \( s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \). If the standard deviation is 0, all the data points are exactly the same!
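
As a sketch of how the range and standard deviation are computed by hand, the Python below works through a tiny made-up dataset. It assumes the sample standard deviation, which divides by n - 1, and compares the result with `statistics.stdev`, which uses the same divisor.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: number of apples eaten by 5 people.
apples = [2, 4, 4, 6, 9]

# Range is a single number: max minus min.
data_range = max(apples) - min(apples)

x_bar = mean(apples)                                 # 25 / 5 = 5
squared_devs = [(x - x_bar) ** 2 for x in apples]    # squared distances from the mean
s_x = sqrt(sum(squared_devs) / (len(apples) - 1))    # divide by n - 1 for a sample

print(data_range)              # range
print(round(s_x, 3))           # hand-computed standard deviation
print(round(stdev(apples), 3)) # library result for comparison
```

The squared distances add up to 28, so the standard deviation is the square root of 28 / 4 = 7, or about 2.65 apples.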

1.7 The 1.5 x IQR Rule for Outliers

Don't worry if this math seems tricky—it's just a simple "fence" we build to catch outliers. A data point is a formal outlier if it is:

- Smaller than \( Q_1 - (1.5 \times IQR) \)
- Larger than \( Q_3 + (1.5 \times IQR) \)

Example: If \( Q_1 = 10, Q_3 = 20 \), then \( IQR = 10 \).
The distance from each quartile to its "fence" is \( 1.5 \times IQR = 1.5 \times 10 = 15 \).
Lower fence: \( 10 - 15 = -5 \). Upper fence: \( 20 + 15 = 35 \).
Any number below -5 or above 35 is an outlier!
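
The fence calculation above can be sketched as a pair of small Python functions. The function names and the sample values are hypothetical; the quartiles are the ones from the example.

```python
def outlier_fences(q1, q3):
    """Return (lower_fence, upper_fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    step = 1.5 * iqr
    return q1 - step, q3 + step

def find_outliers(values, q1, q3):
    """Return the values that fall strictly outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

low, high = outlier_fences(10, 20)   # same quartiles as the example above

# Hypothetical values chosen to land inside, on, and outside the fences.
sample = [-8, 3, 12, 15, 18, 35, 41]

print(low, high)
print(find_outliers(sample, 10, 20))
```

Notice that 35 sits exactly on the upper fence, so it is not flagged. Only values strictly beyond a fence count as outliers.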

1.8 The Five-Number Summary and Boxplots

A Boxplot (or Box-and-Whisker Plot) is a visual version of the Five-Number Summary:

1. Minimum
2. \( Q_1 \) (First Quartile)
3. Median
4. \( Q_3 \) (Third Quartile)
5. Maximum

Important: Boxplots are great for comparing two different groups, but they don't show individual data points or "peaks" in the data (modality) like histograms do.
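
Here is a minimal Python sketch of the Five-Number Summary. It assumes the common classroom method of finding each quartile as the median of the lower or upper half of the sorted data (leaving out the middle value when the count is odd); calculators and software sometimes use slightly different quartile rules. The sleep data and the function name are made up for this example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Needs at least two values so that each half is non-empty.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical data: hours of sleep for 9 students.
sleep = [5, 6, 6, 7, 7, 8, 8, 9, 12]

minimum, q1, med, q3, maximum = five_number_summary(sleep)
iqr = q3 - q1   # width of the box in the boxplot

print(minimum, q1, med, q3, maximum)
print(iqr)
```

For this data the box runs from 6 to 8.5 with the median line at 7, and the whiskers reach out to 5 and 12.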

Summary: Putting it all together

In Unit 1, you learned how to take a single variable, graph it, and describe its personality using SOCS. Remember: Categorical data uses bar charts; Quantitative data uses histograms or boxplots. When the data is Skewed, use the Median and IQR to describe it. When the data is Symmetric, use the Mean and Standard Deviation. You've got this!