Welcome to Unit 7: Inference for Quantitative Data: Means!
In the previous unit, we looked at proportions (categorical data like "Yes/No" or "Success/Failure"). Now, we are moving into the world of Quantitative Data—things we measure with numbers, like height, weight, test scores, or how many hours of sleep you get. Our goal in this unit is to estimate or test claims about the average, or Population Mean (\(\mu\)).
Don't worry if this seems a bit math-heavy at first! The logic is very similar to what you learned in Unit 6. The main difference is that we are switching from the letter \(z\) to the letter \(t\). Let's dive in!
7.1 & 7.2: Estimating a Population Mean
When we want to estimate the average of an entire population, we usually can't ask everyone. Instead, we take a sample mean (\(\bar{x}\)) and use it to build a Confidence Interval. This gives us a range of plausible values for the true population mean (\(\mu\)).
The Problem: We don't know \(\sigma\)
In a perfect world, we would know the population standard deviation (\(\sigma\)). But if we don't know the population mean, we almost certainly don't know the population standard deviation! Because we have to use the sample standard deviation (\(s\)) to estimate \(\sigma\), our calculations become a little less certain. To account for this extra "fuzziness," we use the t-distribution instead of the Normal (z) distribution.
Meet the t-Distribution
The \(t\)-distribution looks a lot like the Normal distribution (it's bell-shaped, symmetric, and centered at 0), but it has heavier tails. Compared with the standard Normal curve, it is shorter at the peak and thicker in the tails, so values far from the center are more likely.
- Degrees of Freedom (\(df\)): This determines the shape of the \(t\)-curve. For a single sample, \(df = n - 1\).
- Key Concept: As your sample size (\(n\)) gets bigger, \(df\) increases and the \(t\)-distribution gets closer and closer to the Normal distribution. It never matches it exactly, but for large \(n\) the two are nearly identical.
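To see the "heavier tails" idea with numbers, here is a small sketch that compares the height of the \(t\)-curve and the standard Normal curve at the center (0) and out in the tail (3). It uses the standard density formulas; the function names `t_pdf` and `normal_pdf` are just illustrative choices.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard Normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the peak (x = 0) and the tail (x = 3) as df grows.
for df in (2, 10, 100):
    print(f"t, df={df:>3}: peak={t_pdf(0, df):.4f}  tail at 3={t_pdf(3, df):.4f}")
print(f"Normal   : peak={normal_pdf(0):.4f}  tail at 3={normal_pdf(3):.4f}")
```

For small \(df\) the peak is lower and the tail is higher than the Normal curve, and the gap shrinks as \(df\) increases.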
Conditions for Inference
Before you calculate anything, you must check these three "Safety Rules":
1. Random: Data must come from a random sample or randomized experiment.
2. 10% Rule: If sampling without replacement, your sample size \(n\) must be no more than 10% of the population. This keeps the observations close enough to independent for the standard error formula to work.
3. Normal/Large Sample: The population must be Normal OR the sample size must be large (\(n \ge 30\)). If \(n < 30\), look at a graph of your data; as long as there is no strong skewness or outliers, you are good to go!
The Formula for a One-Sample t-Interval
\( \bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right) \)
Where:
- \(\bar{x}\) is your point estimate (sample mean).
- \(t^*\) is your critical value (found using a table or calculator with \(df = n - 1\)).
- \(\frac{s}{\sqrt{n}}\) is the Standard Error of the mean.
Quick Review: The Margin of Error is the entire part after the \(\pm\) symbol. To make it smaller (more precise), you can increase your sample size or decrease your confidence level.
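Here is a minimal sketch of the one-sample \(t\)-interval calculation. The sleep data are made up for illustration, and the critical value \(t^* = 2.262\) is the standard table value for 95% confidence with \(df = 9\); with a different sample size or confidence level you would look up a different \(t^*\).

```python
import math
import statistics

# Hypothetical sample: hours of sleep reported by 10 students (made-up data).
sleep_hours = [6.5, 7.0, 8.0, 7.5, 6.0, 7.0, 8.5, 6.5, 7.5, 7.0]

n = len(sleep_hours)
x_bar = statistics.mean(sleep_hours)   # point estimate (sample mean)
s = statistics.stdev(sleep_hours)      # sample standard deviation (divides by n - 1)
df = n - 1

# Critical value for 95% confidence with df = 9, read from a t-table.
t_star = 2.262

standard_error = s / math.sqrt(n)
margin_of_error = t_star * standard_error
lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}, df = {df}")
print(f"Margin of error = {margin_of_error:.3f}")
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

Remember that the code only does the arithmetic; you still need to check the conditions and interpret the interval in context.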
Key Takeaway: We use \(t\) instead of \(z\) for means because we are estimating the standard deviation from our sample. Always check your conditions first!
7.3: Significance Tests for a Population Mean
If someone makes a claim about a mean (like "The average teen sleeps 8 hours"), we use a Significance Test to see if our sample data provides strong evidence against that claim.
The Four-Step Process
1. State: Define your hypotheses.
- Null Hypothesis (\(H_0\)): \(\mu = \text{claimed value}\)
- Alternative Hypothesis (\(H_a\)): \(\mu <, >, \text{ or } \neq \text{claimed value}\)
Define what \(\mu\) stands for in context!
2. Plan: Name the procedure (One-sample t-test for \(\mu\)) and check conditions (Random, 10%, Normal/Large Sample).
3. Do: Calculate the t-test statistic and the P-value.
\( t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \)
4. Conclude: If the P-value is less than \(\alpha\) (usually 0.05), reject \(H_0\): there is convincing evidence for \(H_a\). If it is not, fail to reject \(H_0\): there is not convincing evidence for \(H_a\). Never say you "accept" \(H_0\).
Did you know? The P-value is the probability of getting a sample mean at least as extreme as ours (in the direction of \(H_a\)) if the Null Hypothesis is actually true. It's the "probability of surprise."
Key Takeaway: A Significance Test tells us whether our result is "statistically significant," meaning it would be unlikely to happen by chance alone if \(H_0\) were true.
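The "Do" step can be sketched in code. This example tests \(H_0: \mu = 8\) against \(H_a: \mu < 8\) using made-up sleep data. Since the standard library has no built-in \(t\)-distribution, the P-value is approximated by numerically integrating the \(t\) density with Simpson's rule; the helper names `t_pdf` and `upper_tail_area` are assumptions for this sketch, and a calculator or statistics package would normally do this step for you.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # half the curve lies to the right of 0

# Hypothetical sample: hours of sleep reported by 10 students (made-up data).
sleep_hours = [6.5, 7.0, 8.0, 7.5, 6.0, 7.0, 8.5, 6.5, 7.5, 7.0]
mu_0 = 8.0    # claimed value in H0
alpha = 0.05

n = len(sleep_hours)
x_bar = statistics.mean(sleep_hours)
s = statistics.stdev(sleep_hours)
df = n - 1
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Ha: mu < 8, so the P-value is the area to the LEFT of t_stat.
if t_stat < 0:
    p_value = upper_tail_area(-t_stat, df)      # symmetry of the t-curve
else:
    p_value = 1 - upper_tail_area(t_stat, df)

reject_h0 = p_value < alpha
print(f"t = {t_stat:.3f}, df = {df}, P-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

For a two-sided alternative (\(\neq\)), you would double the tail area beyond \(|t|\) instead.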
7.4: Inference for Paired Data
Sometimes we have two sets of numbers that are linked together—like a "Before" and "After" test score for the same student. This is called Paired Data.
The Trick: Don't treat them as two separate groups! Instead, subtract the pairs to get one list of differences. Then, perform a one-sample \(t\)-test or interval on those differences.
- Our parameter becomes \(\mu_d\) (the mean difference).
- Our null hypothesis is usually \(H_0: \mu_d = 0\).
Key Takeaway: Paired data is just a one-sample test in disguise! Focus only on the column of differences.
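As a quick sketch of the paired approach, the example below turns made-up "Before" and "After" scores into one list of differences and then runs the one-sample \(t\) calculation on that list. The critical value 1.895 is the table value for a one-sided test at \(\alpha = 0.05\) with \(df = 7\); the data and the choice of \(H_a: \mu_d > 0\) are assumptions for illustration.

```python
import math
import statistics

# Hypothetical before/after test scores for the same 8 students (made-up data).
before = [72, 85, 68, 90, 77, 81, 64, 88]
after = [75, 88, 70, 89, 82, 85, 70, 91]

# Step 1: collapse the two columns into ONE list of differences (after - before).
differences = [a - b for a, b in zip(after, before)]

# Step 2: treat the differences as a single sample.
n = len(differences)
d_bar = statistics.mean(differences)   # sample mean difference
s_d = statistics.stdev(differences)    # standard deviation of the differences
df = n - 1

# Step 3: one-sample t statistic for H0: mu_d = 0 versus Ha: mu_d > 0.
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

# One-sided critical value for alpha = 0.05 with df = 7, from a t-table.
t_crit = 1.895

print(f"differences = {differences}")
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, df = {df}")
print("Reject H0" if t_stat > t_crit else "Fail to reject H0")
```

Notice that `before` and `after` are never analyzed separately; everything after Step 1 uses only `differences`.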
7.5 & 7.6: Comparing Two Independent Means
Now we are comparing two separate groups, like "Do boys spend more time on homework than girls?" These are Independent Samples.
The Formula for Two-Sample t-Interval
\( (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \)
The Formula for Two-Sample t-Test Statistic
\( t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \)
Degrees of Freedom for Two Samples
Calculators use a very complex formula for \(df\). If you are doing it by hand, AP Statistics allows a "conservative" approach: use the smaller of \(n_1 - 1\) or \(n_2 - 1\).
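The sketch below computes the two-sample standard error, the \(t\) statistic, and both versions of \(df\) for made-up homework data. The "complex formula" is assumed here to be the Welch-Satterthwaite approximation, which is the usual choice for an unpooled two-sample \(t\) procedure; check your own calculator's documentation if you need to be sure.

```python
import math
import statistics

# Hypothetical weekly homework hours for two independent groups (made-up data).
group_1 = [5.0, 6.5, 4.0, 7.0, 5.5, 6.0, 4.5, 8.0]
group_2 = [4.0, 5.0, 3.5, 6.0, 4.5, 5.5]

n1, n2 = len(group_1), len(group_2)
x_bar_1, x_bar_2 = statistics.mean(group_1), statistics.mean(group_2)
var_1 = statistics.variance(group_1)   # s1 squared
var_2 = statistics.variance(group_2)   # s2 squared

# The standard error combines the variation from BOTH groups.
standard_error = math.sqrt(var_1 / n1 + var_2 / n2)

# Test statistic for H0: mu_1 - mu_2 = 0.
t_stat = ((x_bar_1 - x_bar_2) - 0) / standard_error

# Conservative df: the smaller of n1 - 1 and n2 - 1.
df_conservative = min(n1 - 1, n2 - 1)

# Welch-Satterthwaite approximation (assumed to be the calculator's formula).
a, b = var_1 / n1, var_2 / n2
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"difference in means = {x_bar_1 - x_bar_2:.3f}")
print(f"SE = {standard_error:.3f}, t = {t_stat:.3f}")
print(f"conservative df = {df_conservative}, Welch df = {df_welch:.2f}")
```

The conservative \(df\) is never larger than the Welch value, so it gives a slightly wider interval and a slightly larger P-value. That is why it counts as the "safe" choice.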
Common Mistake to Avoid: When checking the "Normal/Large Sample" condition for two groups, you must check it for BOTH samples. If one sample is large but the other is small and skewed, you cannot proceed!
Key Takeaway: When comparing two groups, we are looking at the difference between their means. The standard error formula gets a bit longer because there is variation in both groups.
Summary Checklist for Unit 7
1. Mean or Proportion? If the data is quantitative (averages), use \(t\). If it's categorical (percentages), use \(z\).
2. One sample or Two? Are you looking at one group, two paired measurements, or two totally independent groups?
3. Conditions: Always check Random, 10%, and Normal/Large Sample (\(n \ge 30\)).
4. Interpretation: Always write your final answer in the context of the problem (e.g., "We are 95% confident that the true mean weight of the apples is between...").
You've got this! Unit 7 is just about applying the same logic you already know to a new type of data. Keep practicing those t-tests!