Introduction: The Power of the \(t\)-Distribution

Welcome to one of the most practical chapters in Further Statistics 2! So far in your maths journey, you've likely used the Normal distribution to test hypotheses about means via a \(z\)-test. But there is a catch: to use a \(z\)-test, you need to know the population variance (\(\sigma^2\)).

In the real world, we almost never know the true variance of an entire population. Instead, we have to estimate it using our sample. This is where the \(t\)-distribution (also known as Student's \(t\)) saves the day! It allows us to make accurate predictions even when our information is limited. Think of the \(t\)-distribution as the "cautious" version of the Normal distribution—it accounts for the extra uncertainty of not knowing \(\sigma^2\).

1. Testing a Single Mean (\(\mu\))

When we want to test whether a sample comes from a population with a specific mean \(\mu\), but we don't know the population variance \(\sigma^2\), we use the one-sample \(t\)-test.

The Setup

We use the unbiased estimate of the population variance (\(s^2\)) calculated from our sample. Because we are using an estimate, the "shape" of our probability curve changes based on how much data we have. This is measured by Degrees of Freedom (\(v\)).

Quick Review: For a single sample of size \(n\), the degrees of freedom are:
\(v = n - 1\)

The Test Statistic

To see how far our sample mean \(\bar{x}\) is from the hypothesized mean \(\mu\), we calculate the \(t\)-statistic:

\(t = \frac{\bar{x} - \mu}{s / \sqrt{n}}\)

where \(s\) is the square root of the unbiased estimate \(s^2\) of the population variance (calculated with an \(n - 1\) divisor), not the raw sample standard deviation. This value is then compared against critical values from the \(t\)-distribution tables using \(n-1\) degrees of freedom.
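As a minimal sketch of the calculation, here is the \(t\)-statistic computed in Python using only the standard library. Note that `statistics.stdev` already uses the \(n-1\) divisor, so it gives exactly the \(s\) the formula requires. The sample data and hypothesized mean are made up for illustration.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """One-sample t-statistic for H0: mu = mu0.

    statistics.stdev divides by n - 1, i.e. it is the unbiased
    estimate s required by the t-test (not the n-divisor version).
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)           # n - 1 divisor
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical sample of n = 8 measurements, testing H0: mu = 50
sample = [52.1, 48.9, 51.3, 50.7, 49.5, 53.0, 51.8, 50.2]
t = one_sample_t(sample, 50)
v = len(sample) - 1                      # degrees of freedom = 7
```

The resulting \(t \approx 1.92\) would then be compared with the critical value for \(v = 7\) from the tables.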

Memory Aid: "Freedom is losing one." To find your degrees of freedom for one sample, just subtract 1 from your sample size!

Did you know? The \(t\)-distribution was developed by William Sealy Gosset, who worked for the Guinness brewery! He published under the pen name "Student" because his employer didn't want competitors to know they were using statistics to improve beer quality.

Confidence Intervals for the Mean

Instead of just testing a hypothesis, we can estimate where the true population mean lies. The formula for a confidence interval is:

\(\bar{x} \pm t_{v} \times \frac{s}{\sqrt{n}}\)

where \(t_v\) is the critical value from the \(t\)-distribution tables at the required percentage level, with \(v = n - 1\) degrees of freedom.

Example: If a 95% confidence interval for the weight of a cereal box is \([495g, 505g]\), it means that if we repeated the sampling process many times, about 95% of the intervals constructed this way would contain the true average weight of all boxes produced.
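The interval formula can be sketched directly in Python. The box weights below are invented for illustration, and the critical value \(t_9 = 2.262\) (for 95% confidence with \(v = 9\)) is read from the \(t\)-distribution tables rather than computed.

```python
import math
import statistics

# Hypothetical weights (grams) of n = 10 cereal boxes
weights = [498, 502, 499, 501, 497, 503, 500, 496, 504, 500]

n = len(weights)
x_bar = statistics.mean(weights)
s = statistics.stdev(weights)            # unbiased estimate, n - 1 divisor

# Critical value t_9 at the 97.5% point (2.5% in each tail),
# from the t-distribution tables
t_crit = 2.262
half_width = t_crit * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
```

For this made-up data the interval comes out at roughly \([498.2g, 501.8g]\), centred on the sample mean of \(500g\).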

Key Takeaway: Use the \(t\)-distribution when the population variance is unknown and the underlying population is Normally distributed.

2. The Paired \(t\)-test

Sometimes, data comes in pairs. This usually happens in "Before and After" scenarios or "Matched Pairs" (like testing the left foot vs. the right foot of the same person).

How it works

Don't let the two columns of data fool you! In a paired \(t\)-test, we aren't interested in the raw scores. We are only interested in the differences (\(d\)) between the pairs.

Step-by-Step Process:
1. Calculate the difference \(d = x_1 - x_2\) for every pair.
2. Treat these differences as a single sample of data.
3. Test the hypothesis that the mean of these differences is zero (\(H_0: \mu_d = 0\)).
4. Use the same formula as the single mean test: \(t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}\), where \(n\) is the number of pairs.

Common Mistake: Students often use \(2n - 1\) for degrees of freedom here. Remember, because we've turned two lists into one list of differences, the degrees of freedom is just \((\text{number of pairs}) - 1\).
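The four steps above can be sketched as a short Python calculation. The before/after reaction times are hypothetical; the key point is that once the differences are formed, the rest is identical to the one-sample test.

```python
import math
import statistics

# Hypothetical "before and after" reaction times for 6 subjects
before = [0.42, 0.51, 0.38, 0.47, 0.55, 0.44]
after  = [0.39, 0.48, 0.38, 0.42, 0.50, 0.43]

# Step 1: reduce the two columns to a single list of differences
d = [b - a for b, a in zip(before, after)]
n = len(d)                               # number of pairs

# Steps 2-4: one-sample t-test on d against H0: mu_d = 0
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)                # n - 1 divisor
t = d_bar / (s_d / math.sqrt(n))
v = n - 1                                # (number of pairs) - 1, NOT 2n - 1
```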

Key Takeaway: Paired tests reduce "background noise" by comparing the same subject to itself, making the test much more sensitive to changes.

3. Comparing Two Independent Means

What if you want to compare two totally different groups? For example, "Do students at School A score higher than students at School B?" This is an independent samples \(t\)-test.

The Condition: Equal Variances

In this specific part of the Edexcel syllabus (Section 7.3), we assume the two populations have equal but unknown variances. Because we assume the variances are the same, we "pool" our sample data together to get a better estimate of that shared variance.

The Pooled Estimate of Variance (\(s^2\))

This is a weighted average of the two sample variances:

\(s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\)

Analogy: Imagine two chefs making soup. If Chef A makes a huge pot and Chef B makes a tiny bowl, the big pot's flavor should count for more when you mix them. The pooled variance gives more "weight" to the larger sample.
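The weighting in the soup analogy is easy to check numerically. In this sketch (with made-up summary statistics), the larger sample pulls the pooled estimate towards its own variance:

```python
def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Weighted average of two unbiased variance estimates,
    weighted by each sample's degrees of freedom (n - 1)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Hypothetical summaries: n1 = 10 with s1^2 = 4.0; n2 = 6 with s2^2 = 2.5
s_sq = pooled_variance(10, 4.0, 6, 2.5)
```

The result, \(s^2 = 48.5 / 14 \approx 3.46\), sits closer to \(4.0\) than the simple average \(3.25\) would, because the larger sample carries more weight.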

The Test Statistic

To test if the means are different (\(H_0: \mu_1 = \mu_2\)):

\(t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\)

The Degrees of Freedom for this test is:
\(v = n_1 + n_2 - 2\)

Quick Review: Why \(n_1 + n_2 - 2\)?
Each sample "loses" one degree of freedom when we calculate its mean. Since we have two samples, we lose two!
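Putting the pooled variance and the test statistic together, here is a minimal sketch for the two-school scenario. All the summary figures (means, variances, sample sizes) are invented for illustration.

```python
import math

def pooled_t(x1_bar, s1_sq, n1, x2_bar, s2_sq, n2):
    """Pooled two-sample t-statistic for H0: mu1 = mu2
    (so the mu1 - mu2 term in the numerator is zero)."""
    s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    s = math.sqrt(s_sq)
    return (x1_bar - x2_bar) / (s * math.sqrt(1 / n1 + 1 / n2))

# Hypothetical scores: School A has mean 68, s^2 = 25, n = 12;
# School B has mean 64, s^2 = 30, n = 10
t = pooled_t(68, 25, 12, 64, 30, 10)
v = 12 + 10 - 2                          # n1 + n2 - 2 = 20 degrees of freedom
```

Here \(t \approx 1.79\), which would be compared against the tabulated critical value for \(v = 20\).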

Key Takeaway: Use the pooled \(t\)-test only when you are told (or can assume) the variances of the two independent groups are equal.

Summary Checklist for Success

Don't worry if these formulas look intimidating at first. Most of the work is just identifying which "story" the question is telling you. Ask yourself:

1. Do I know the population variance \(\sigma^2\)?
- Yes \(\rightarrow\) Use \(z\)-test (Normal).
- No \(\rightarrow\) Use \(t\)-test.

2. Is there one group or two?
- One group \(\rightarrow\) One-sample \(t\)-test (\(df = n-1\)).
- Two groups (Matched/Before & After) \(\rightarrow\) Paired \(t\)-test on differences (\(df = \text{pairs} - 1\)).
- Two groups (Independent) \(\rightarrow\) Pooled \(t\)-test (\(df = n_1 + n_2 - 2\)).

3. What are my assumptions?
- For any \(t\)-test, the population must be Normally distributed. For independent tests, we also assume equal variances.

Final Tip: When using your calculator, always check if it requires you to input the "Sample Standard Deviation" (\(s_x\)) or the "Population Standard Deviation" (\(\sigma_x\)). For \(t\)-tests, we always use the version calculated with \(n-1\) in the denominator!