Welcome to Unit 3: Collecting Data!

Ever heard the phrase "garbage in, garbage out"? In Statistics, it means that if you collect your data poorly, no amount of fancy math can save your results. This unit is all about how to gather high-quality data so we can trust our conclusions. We will learn how to pick the right people for a survey and how to design experiments that actually prove cause and effect. Don't worry if it feels like a lot of vocabulary at first—we'll break it down piece by piece!

Quick Review: Before we start, remember that a Population is the entire group we want to know about, while a Sample is the smaller group we actually collect data from. We use the sample to make a "best guess" about the population.


3.1 & 3.2: Observational Studies vs. Experiments

The first big question is: Are we just watching, or are we interfering?

Observational Study

In an observational study, researchers observe individuals and measure variables of interest but do not attempt to influence the responses. You are a fly on the wall.
Example: Watching how many students check their phones during lunch.

Experiment

In an experiment, researchers deliberately impose a treatment on individuals to measure their responses. You are actively changing something to see what happens.
Example: Giving one group of students caffeinated coffee and another group decaf to see who stays awake longer.

The Big Difference: Causation

This is a huge point for the AP Exam: Only a well-designed experiment can show cause-and-effect. Observational studies can only show an association or correlation. Why? Because of confounding variables—outside factors that are associated with the explanatory variable and also influence the response (like if the coffee drinkers happened to get more sleep).

Key Takeaway: If you want to say "A causes B," you must run an experiment!


3.3: Random Sampling – How to Pick Your Sample

If you want your sample to represent the whole population, you have to use randomness. Here are the four methods you need to know:

1. Simple Random Sample (SRS): Every group of size \(n\) has an equal chance of being chosen. Think of this as "putting all the names in a hat and mixing them well."
2. Stratified Random Sample: First, divide the population into groups of similar individuals (called strata). Then, take a separate SRS from each group.
Analogy: If you want to know how students feel about school lunch, you might sample 25 freshmen, 25 sophomores, 25 juniors, and 25 seniors. You are making sure every "layer" (stratum) is represented.
3. Cluster Sample: Divide the population into groups that are located near each other (called clusters). Then, randomly pick entire clusters and survey everyone inside them.
Analogy: If you want to survey a city, you randomly pick 5 apartment buildings and talk to everyone in those buildings.
4. Systematic Random Sample: Pick a starting point at random and then survey every \(k^{th}\) person (e.g., every 10th person walking into a stadium). (We use \(k\) here because \(n\) is already reserved for the sample size.)
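The four methods above are really just four different procedures, so they can be sketched in a few lines of code. Here is a minimal Python illustration using the standard `random` module—the student roster, grade labels, "homeroom" clusters, and the seed are all made up for demonstration:

```python
import random

random.seed(42)  # fixed seed so the demonstration is reproducible

# Hypothetical roster: 40 students, grade levels cycling through four values
population = [(f"student_{i}", grade)
              for i, grade in enumerate(["freshman", "sophomore",
                                         "junior", "senior"] * 10)]

# 1. Simple Random Sample: every group of size n is equally likely
srs = random.sample(population, 8)

# 2. Stratified: split into strata by grade, take a separate SRS from each
strata = {}
for student in population:
    strata.setdefault(student[1], []).append(student)
stratified = [s for group in strata.values()
              for s in random.sample(group, 2)]

# 3. Cluster: split into "homerooms" of 10, pick whole clusters at random
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [s for room in random.sample(clusters, 2) for s in room]

# 4. Systematic: random starting point, then every k-th person
k = 5
start = random.randrange(k)
systematic = population[start::k]
```

Notice the "some from all / all from some" takeaway in the code itself: the stratified sample draws a few students from every grade, while the cluster sample takes everyone from a few homerooms.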

Did you know? Stratified sampling is great because it reduces variability. By ensuring each group is represented, your results will likely be closer to the truth every time you repeat the process.

Key Takeaway: Stratified = "Some from all." Cluster = "All from some."


3.4: Potential Problems – What Could Go Wrong? (Bias)

Bias is a systematic error that results in a sample that does not represent the population. If your method is biased, your results will consistently overestimate or underestimate the truth.

Common Types of Bias:

Convenience Sampling: Only picking people who are easy to reach. (e.g., asking people at the mall about their income).
Voluntary Response Bias: People choose themselves to be in the sample. This tends to attract people with very strong (often negative) opinions.
Undercoverage: Some members of the population are left out of the process of choosing the sample (e.g., a phone survey misses people without phones).
Nonresponse: When a chosen individual can’t be contacted or refuses to participate. This is different from voluntary response because the researcher tried to include them.
Response Bias: When people lie, or the wording of the question influences the answer. (e.g., "Do you agree that our hard-working mayor deserves a raise?")

Common Mistake: Don't just say "bias is bad." On the exam, you must explain how it will affect the result. For example: "Since only people with strong opinions respond to the online poll, the results will likely overestimate the level of dissatisfaction in the population."


3.5 & 3.6: Designing Experiments

To have a "good" experiment, you need the four pillars of experimental design:

1. Comparison: Use two or more treatments to see a difference.
2. Random Assignment: Use chance to assign subjects to treatments. This helps "balance out" the effects of variables we can't control.
3. Control: Keep other variables the same for all groups so they don't affect the outcome.
4. Replication: Use enough subjects in each group so that any differences seen aren't just due to a lucky or unlucky fluke.
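Random assignment (pillar 2) is simple enough to sketch directly. Below is an illustrative Python snippet for the caffeine example from earlier—the subject names, group size, and seed are invented for the demonstration:

```python
import random

random.seed(7)  # fixed seed so the demonstration is reproducible

# Hypothetical subjects for the coffee experiment described above
subjects = [f"subject_{i}" for i in range(20)]

# Random assignment: shuffle the subjects, then split the list in half.
# Chance -- not the researcher -- decides who gets which treatment,
# which balances out uncontrolled variables on average.
random.shuffle(subjects)
caffeinated = subjects[:10]
decaf = subjects[10:]
```

The shuffle-and-split approach also satisfies the comparison pillar (two treatments) and makes replication easy to check: each group has 10 subjects, not 1 or 2.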

Experimental Vocabulary:

Experimental Units: The smallest entities to which treatments are applied (people, plants, etc.). If they are people, we call them subjects.
Explanatory Variable (Factor): What you are manipulating.
Treatments: The specific conditions applied to the units.
Response Variable: What you measure at the end.
Placebo Effect: When a subject reacts to a "fake" treatment simply because they think they are being treated.

Types of Designs:

Completely Randomized Design: All units are assigned to treatments completely by chance.
Randomized Block Design: Units are first sorted into groups (blocks) based on a similar characteristic (like age or gender) that is expected to affect the response. Then, they are randomly assigned to treatments within those blocks. (Blocks are like Strata for experiments!)
Matched Pairs Design: A special type of block design where you compare two very similar units (like twins) or give both treatments to the same person in a random order.
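A randomized block design is just "sort first, then shuffle within each group," which a short sketch makes concrete. In this purely illustrative Python example, we block on a made-up characteristic (age group) that we expect to affect the response; subject names, block labels, and the seed are all assumptions:

```python
import random

random.seed(1)  # fixed seed so the demonstration is reproducible

# Hypothetical subjects tagged with a blocking characteristic (age group)
subjects = [(f"subject_{i}", "under_30" if i % 2 == 0 else "over_30")
            for i in range(12)]

# Step 1: sort subjects into blocks by the shared characteristic
blocks = {}
for s in subjects:
    blocks.setdefault(s[1], []).append(s)

# Step 2: randomly assign to treatments WITHIN each block,
# half to each treatment, so every block is split evenly
assignments = {}
for block_name, members in blocks.items():
    random.shuffle(members)
    half = len(members) // 2
    for m in members[:half]:
        assignments[m[0]] = "treatment_A"
    for m in members[half:]:
        assignments[m[0]] = "treatment_B"
```

Because each block contributes equally to both treatments, any age-group effect is spread evenly across the groups instead of confounding the comparison—exactly what "blocks are like strata" is getting at.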

Key Takeaway: We block what we can't control and randomize the rest!


3.7: Inference and Scope – Who Can We Talk About?

This is the final "check" for any study. It depends on how you got your data:

1. Were individuals randomly selected from the population?
- If YES, you can generalize your results to the whole population.
- If NO, you can only talk about the people in your study.

2. Were individuals randomly assigned to groups?
- If YES, you can conclude cause-and-effect.
- If NO, you can only say there is an association.

The "Magic Grid" of Inference:

Random Selection + Random Assignment = Can generalize to population AND show cause-effect.
Random Selection ONLY = Can generalize to population, but NO cause-effect.
Random Assignment ONLY = CAN show cause-effect, but only for the people in the study (usually experiments use volunteers).
Neither = Very limited usefulness!
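The grid above is really a two-question decision table, so it can be captured in a tiny helper function. This is purely illustrative—the function name and wording are invented, not standard statistics software:

```python
def scope_of_inference(random_selection: bool, random_assignment: bool) -> str:
    """Map a study's design to the conclusions it supports,
    mirroring the two questions in the 'Magic Grid'."""
    # Question 1: random selection controls WHO we can talk about
    generalize = ("generalize to the population" if random_selection
                  else "describe only the individuals studied")
    # Question 2: random assignment controls WHAT we can claim
    causal = ("conclude cause-and-effect" if random_assignment
              else "claim only an association")
    return f"Can {generalize}; can {causal}."

# Example: a typical experiment on volunteers (assignment but no selection)
print(scope_of_inference(False, True))
```

Walking a study through these two questions in order is exactly the habit the exam rewards.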

Quick Review Box:
- Sampling: Picking people (Goal: Represent the population).
- Assignment: Putting people into groups (Goal: Create equal groups to see cause-effect).
- Blinding: When the subject (Single Blind) or both the subject and researcher (Double Blind) don't know which treatment is being given. This prevents bias!

Don't worry if this seems tricky at first! Unit 3 is more about logic and vocabulary than math. Just keep asking yourself: "Was it random?" and "Who is being studied?" and you'll do great!