Welcome to Hypothesis Testing for 1 and 2 Samples!

In your previous studies, you probably looked at hypothesis tests where everything was "perfect"—you knew the population variance, or you had massive sample sizes. In the real world, we rarely have all the facts. This chapter is all about being a statistical detective when information is missing or when you want to compare two different groups (like seeing if a new drug works better than an old one).

Don't worry if this seems a bit more complex than basic testing. We will break it down step-by-step, using the same logic you already know: Null Hypothesis, Test Statistic, and Conclusion.

1. Testing a Single Mean: The t-distribution

(Syllabus 16.1)

In earlier chapters, you used the Normal (\(z\)) distribution. But what happens if you don't know the population variance (\(\sigma^2\)) and your sample is small? This is where the t-distribution comes to the rescue.

When to use the t-test:

You use a one-sample t-test when:
1. The population variance is unknown.
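2. The sample size is small (as a rule of thumb, \(n < 30\)). With larger samples the \(t\)-test is still valid; it just gives almost the same answer as a \(z\)-test.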
3. The underlying population is normally distributed (this is a vital assumption!).

The "Safety Margin" Analogy

Think of the \(t\)-distribution as a "cautious" version of the Normal distribution. Because we are estimating the variance from a small sample, our estimate carries extra uncertainty. The \(t\)-distribution has "fatter tails" than the Normal distribution, so its critical values are larger and it is harder to reject the null hypothesis unless the evidence is really strong. As your sample size (\(n\)) gets bigger, the \(t\)-distribution gets closer and closer to the Normal distribution (the two only match exactly in the limit).
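To put numbers on those "fatter tails", here is a minimal Python sketch (standard library only). It estimates the probability of landing above 1.96 under the \(t\)-distribution for a few values of the degrees of freedom (\(n - 1\), introduced with the formula below) and compares it with the Normal distribution. The function names `t_pdf` and `t_upper_tail`, the cut-off 1.96, and the chosen degrees of freedom are illustrative assumptions, and the area is approximated with Simpson's rule rather than read from tables.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coeff = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(t * t / df))

def t_upper_tail(c, df, steps=2000):
    """Approximate P(T > c) for c >= 0 using Simpson's rule on [0, c]."""
    h = c / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_c = total * h / 3
    return 0.5 - area_0_to_c  # the density is symmetric about 0

normal_tail = 1 - NormalDist().cdf(1.96)
tails = {df: t_upper_tail(1.96, df) for df in (5, 10, 30, 100)}

for df, tail in tails.items():
    print(f"df = {df:>3}: P(T > 1.96) is about {tail:.4f}")
print(f"Normal : P(Z > 1.96) is about {normal_tail:.4f}")
```

The tail probability shrinks towards the Normal value as the degrees of freedom grow, which is the "cautious" behaviour described above.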

Key Formula:

The test statistic is calculated as:
\(t = \frac{\bar{x} - \mu}{s / \sqrt{n}}\)
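Here \(\mu\) is the population mean claimed by the null hypothesis and \(s\) is your sample standard deviation, calculated with the \(n - 1\) divisor (the unbiased estimate). You also need to know the degrees of freedom (\(v\)), which is simply \(v = n - 1\).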
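The calculation above can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function name `one_sample_t` is hypothetical, and the critical value 2.262 is the two-tailed 5% value for 9 degrees of freedom taken from standard \(t\)-tables.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Made-up data: 10 measurements, testing H0: mu = 12.0 against H1: mu != 12.0
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
t_stat, df = one_sample_t(sample, mu0=12.0)

# Assumed critical value: two-tailed 5% level, 9 degrees of freedom (from tables)
critical = 2.262
reject_h0 = abs(t_stat) > critical
print(f"t = {t_stat:.3f}, v = {df}, reject H0: {reject_h0}")
```

Note that `statistics.stdev` uses the \(n - 1\) divisor, which is the version of \(s\) this test needs.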

Quick Review Box:
Small sample + Unknown variance + Normal population = Use t-test!

2. Comparing Two Means (Independent Samples)

(Syllabus 16.2 & 16.3)

Sometimes we want to know if there is a difference between two groups. For example: "Do students at School A get higher marks than students at School B?"

Scenario A: Known Variances (The z-test)

If you actually know the variances of both populations (\(\sigma_1^2\) and \(\sigma_2^2\)), you use the \(z\)-distribution.
The Test Statistic:
\(z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\)
Usually, our null hypothesis is that there is no difference, so \((\mu_1 - \mu_2) = 0\).
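As a rough illustration, the known-variance test can be coded directly from the formula. The numbers below are invented, and `two_sample_z` and its parameter names are hypothetical; the p-value uses `NormalDist` from Python's standard library.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, var1, var2, n1, n2, diff0=0.0):
    """z statistic for H0: mu1 - mu2 = diff0 with known population variances."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - diff0) / standard_error

# Invented summary data for School A and School B
z = two_sample_z(mean1=75, mean2=72, var1=36, var2=49, n1=40, n2=50)

# Two-tailed p-value: probability of a result at least this extreme under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Leaving `diff0` at 0 corresponds to the usual null hypothesis of no difference.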

Scenario B: Unknown but Equal Variances (The Pooled t-test)

This is much more common. If we don't know the variances but we assume they are the same for both groups, we "pool" our data together to get a better estimate of that shared variance.

Did you know? This test is only valid if both populations are normal and have equal variances. You don't need to test for equal variances in the exam, but you must state it as an assumption!

The "Pooling" Step:

First, calculate the pooled estimate of variance (\(s_p^2\)):
\(s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\)

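Then, find your test statistic (with the usual null hypothesis \(\mu_1 = \mu_2\), the hypothesised difference in the numerator is 0):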
\(t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\)
For this test, the degrees of freedom are \(v = n_1 + n_2 - 2\).
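The two steps (pool, then test) might look like this in Python. The summary figures are made up, `pooled_t` is a hypothetical helper name, and the critical value 2.101 is the two-tailed 5% value for 18 degrees of freedom from standard \(t\)-tables.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (pooled variance, t, degrees of freedom) for H0: mu1 = mu2.

    var1 and var2 are the sample variances s1^2 and s2^2 (divisor n - 1).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / standard_error
    return pooled_var, t, df

# Made-up summary data for two independent groups of 10
sp2, t_stat, df = pooled_t(mean1=53, var1=16, n1=10, mean2=48, var2=20, n2=10)

# Assumed critical value: two-tailed 5% level, 18 degrees of freedom (from tables)
critical = 2.101
reject_h0 = abs(t_stat) > critical
print(f"s_p^2 = {sp2}, t = {t_stat:.3f}, v = {df}, reject H0: {reject_h0}")
```

With equal sample sizes the pooled variance is just the average of the two sample variances; with unequal sizes the larger sample gets more weight.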

Memory Aid: Think of "Pooling" like a potluck dinner. Instead of everyone eating their own food (separate variances), you put it all in one big bowl (pooled variance) to share!

Key Takeaway: If you see "assume equal variances" in a question about two means, look for the pooled variance formula in your booklet.

3. Difference Between Two Binomial Proportions

(Syllabus 16.4)

What if the data isn't a "mean" but a "proportion"? For example: "Is the proportion of people who support a law in Town A different from Town B?"

The Method:

Just as with means, we use a pooled estimate of the common proportion, because the null hypothesis says the two population proportions are equal.
Pooled Proportion (\(\hat{p}\)):
\(\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\) (Total successes divided by total trials)

The Test Statistic:

\(z = \frac{p_1 - p_2}{\sqrt{\hat{p}(1 - \hat{p})(\frac{1}{n_1} + \frac{1}{n_2})}}\)
Here \(p_1 = \frac{x_1}{n_1}\) and \(p_2 = \frac{x_2}{n_2}\) are the two sample proportions.

Common Mistake to Avoid: Don't forget that this is a \(z\)-test, not a \(t\)-test! We use the Normal approximation for proportions, which relies on both samples being reasonably large.
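Here is a small Python sketch of the pooled two-proportion calculation. The counts are invented and `two_proportion_z` is a hypothetical function name; it assumes both samples are large enough for the Normal approximation to be reasonable.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (pooled proportion, z) for H0: the two population proportions are equal."""
    p1 = x1 / n1                  # sample proportion in group 1
    p2 = x2 / n2                  # sample proportion in group 2
    pooled = (x1 + x2) / (n1 + n2)  # total successes / total trials
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, (p1 - p2) / standard_error

# Invented survey counts: 45 of 100 support the law in Town A, 30 of 100 in Town B
pooled, z = two_proportion_z(x1=45, n1=100, x2=30, n2=100)
print(f"pooled proportion = {pooled}, z = {z:.3f}")
```

The resulting \(z\) is compared with Normal critical values (for example 1.96 for a two-tailed test at the 5% level), not with \(t\)-tables.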

4. Interpreting the Results in Context

(Syllabus 16.5)

The most important part of Statistics is explaining what your numbers actually mean. If you do all the math correctly but don't explain the answer, you'll lose easy marks!

How to write your conclusion:

1. Compare: State if your test statistic is in the critical region (e.g., "Since 2.45 > 1.96...") or if your p-value is less than the significance level.
2. Decision: Say whether you "Reject \(H_0\)" or "Fail to reject \(H_0\)".
3. Context: Use the words from the question. Avoid being "definite".
Instead of: "This proves the drug works."
Use: "There is significant evidence at the 5% level to suggest that the mean recovery time has decreased."
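The three steps can even be written as a tiny template. This is just a sketch for a two-tailed test: `write_conclusion` and its parameters are hypothetical names, and the wording is one acceptable style, not the only one.

```python
def write_conclusion(test_stat, critical_value, claim, level="5%"):
    """Two-tailed decision: reject H0 when |test statistic| exceeds the critical value."""
    if abs(test_stat) > critical_value:
        return (f"Since |{test_stat}| > {critical_value}, reject H0. "
                f"There is significant evidence at the {level} level "
                f"to suggest that {claim}.")
    return (f"Since |{test_stat}| <= {critical_value}, fail to reject H0. "
            f"There is insufficient evidence at the {level} level "
            f"to suggest that {claim}.")

print(write_conclusion(2.45, 1.96, "the mean recovery time has changed"))
print(write_conclusion(1.20, 1.96, "the mean recovery time has changed"))
```

Notice that both branches talk about evidence "to suggest", and neither claims that anything has been proved.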

Key Point: Statistics is about evidence, not proof. Always use phrases like "suggests that" or "evidence to indicate".

Summary Checklist

- Single Mean (Small sample, \(\sigma\) unknown): Use \(t\)-test with \(v = n - 1\).
- Two Means (Known \(\sigma\)): Use \(z\)-test.
- Two Means (Unknown but equal \(\sigma\)): Use Pooled \(t\)-test with \(v = n_1 + n_2 - 2\).
- Two Proportions: Use Pooled \(z\)-test.
- Validity: For \(t\)-tests, always state the assumption that the population (or both populations) is Normal, plus equal variances for the pooled test!