Welcome to the World of the t-distribution!

In your previous statistics work, you likely used the Normal distribution (the \(z\)-test) to test hypotheses about a population mean. But there was a catch: you had to know the population variance (\(\sigma^2\)).

In the real world, we rarely know the exact variance of an entire population. If we only have a small sample and we don't know the population variance, the Normal distribution isn't quite accurate enough. That’s where the t-distribution (also known as Student’s t-distribution) comes to the rescue! This chapter will show you how to perform hypothesis tests and find confidence intervals when information is limited.

1. Why use the t-distribution?

Think of the \(t\)-distribution as the "cautious cousin" of the Normal distribution. Because we are estimating the variance from a small sample, we are less certain about our results. To account for this uncertainty, the \(t\)-distribution has "fatter tails" than the Normal distribution.

When do we use it?
You use the \(t\)-distribution when:
1. The population is normally distributed (or approximately so).
2. The population variance (\(\sigma^2\)) is unknown.
3. The sample size (\(n\)) is small (usually \(n < 30\)), though it works for larger samples too!

Analogy: Imagine you are trying to guess the average height of students in a school. If you measure 1,000 students, you can be very confident (Normal distribution). If you only measure 5 students, you need to allow for more "error room" in your guess—that extra room is what the fatter tails of the \(t\)-distribution provide.

Quick Review: If you know \(\sigma^2\), use \(z\). If you don't know \(\sigma^2\), use \(t\)!

2. Key Concepts: Degrees of Freedom and Variance

Unbiased Estimate of Variance (\(s^2\))

Since we don't know the population variance (\(\sigma^2\)), we must calculate an unbiased estimate from our sample data. We use the symbol \(s^2\).
The formula is:
\( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \) or \( s^2 = \frac{1}{n-1} (\sum x^2 - \frac{(\sum x)^2}{n}) \)

Common Mistake: Don't divide by \(n\)! We divide by \(n - 1\) because the deviations are measured from the sample mean \(\bar{x}\) rather than the true mean \(\mu\), which makes \(\sum (x - \bar{x})^2\) slightly too small on average. Dividing by \(n - 1\) instead of \(n\) corrects this bias.
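If you like to check your arithmetic with a computer, here is a small Python sketch (the apple weights are made up purely for illustration) showing that both versions of the \(s^2\) formula give the same answer:

```python
# Hypothetical sample: weights (in grams) of n = 6 apples
data = [102.0, 98.5, 101.2, 99.8, 103.1, 97.4]
n = len(data)

x_bar = sum(data) / n  # sample mean

# Formula 1: sum of squared deviations from the mean, divided by (n - 1)
s2_dev = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Formula 2: the computational shortcut; algebraically identical
s2_short = (sum(x ** 2 for x in data) - sum(data) ** 2 / n) / (n - 1)

print(round(s2_dev, 4), round(s2_short, 4))  # the two agree
```

Notice that if you had divided by \(n\) instead, you would get a smaller (biased) answer.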

Degrees of Freedom (\(\nu\))

The shape of the \(t\)-distribution changes depending on how much data you have. This is defined by the degrees of freedom, represented by the Greek letter nu (\(\nu\)).
For a one-sample test:
\( \nu = n - 1 \)

Did you know? As the degrees of freedom increase (as your sample gets bigger), the \(t\)-distribution looks more and more like the standard Normal distribution!
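You can watch this convergence happen numerically. The sketch below (assuming the scipy library is available) compares upper 2.5% critical values of the \(t\)-distribution with the Normal value of about 1.96 as \(\nu\) grows:

```python
from scipy.stats import norm, t

# Upper 2.5% critical value of the standard Normal (about 1.96)
z_crit = norm.ppf(0.975)

# The matching t critical value shrinks toward z_crit as nu grows
for nu in (4, 9, 29, 999):
    print(nu, round(t.ppf(0.975, df=nu), 3))
print("Normal:", round(z_crit, 3))
```

This is why, for very large samples, the \(z\)-test and the \(t\)-test give almost identical answers.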

Key Takeaway: Always calculate \(s^2\) and \(\nu = n-1\) before looking at your probability tables.

3. Performing the Hypothesis Test

The goal is to test if the population mean (\(\mu\)) is equal to a specific value (\(\mu_0\)).

Step-by-Step Process:

1. State your Hypotheses:
\(H_0: \mu = \mu_0\)
\(H_1: \mu \neq \mu_0\) (two-tailed) or \(\mu > \mu_0\) / \(\mu < \mu_0\) (one-tailed).

2. Calculate the Test Statistic (\(t\)):
Use the formula:
\( t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \)
Where \(\bar{x}\) is the sample mean, \(\mu_0\) is the hypothesized mean, and \(s\) is the square root of your unbiased variance estimate.

3. Find the Critical Value:
Look in your statistical tables for the \(t\)-distribution using your degrees of freedom (\(\nu\)) and the significance level (\(\alpha\)).

4. Compare and Conclude:
If your calculated \(t\) is further from zero than the critical value, reject \(H_0\). Otherwise, do not reject \(H_0\).

Example phrase for your conclusion: "There is significant evidence at the 5% level to suggest that the mean weight of the apples has changed."
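Here is the whole four-step process in one short Python sketch. The data and the hypothesized mean are made up for illustration, and scipy is assumed to be available for the critical value:

```python
from math import sqrt
from scipy.stats import t

# Hypothetical sample: weights (g) of 8 apples; test H0: mu = 100 vs H1: mu != 100
data = [103.2, 99.1, 105.4, 101.8, 98.7, 104.0, 102.5, 100.9]
mu_0 = 100.0
alpha = 0.05

# Step 2: calculate the test statistic
n = len(data)
x_bar = sum(data) / n
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # unbiased variance estimate
s = sqrt(s2)
t_stat = (x_bar - mu_0) / (s / sqrt(n))

# Step 3: find the two-tailed critical value for nu = n - 1
nu = n - 1
t_crit = t.ppf(1 - alpha / 2, df=nu)

# Step 4: compare and conclude
reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, critical value = +/-{t_crit:.3f}, reject H0: {reject}")
```

Note how close the test statistic is to the critical value here: borderline cases like this are exactly why writing down \(\nu\) and looking up the correct table value matters.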

4. Confidence Intervals for the Mean

Sometimes, instead of testing a specific value, we want to find a range of values that we can be 95% (or 99%) confident contains the true population mean.

The formula for a Symmetric Confidence Interval is:
\( \bar{x} \pm t_{\nu}(\alpha) \times \frac{s}{\sqrt{n}} \)

Breakdown of the formula:
- \(\bar{x}\): Your sample mean (the center of your interval).
- \(t_{\nu}(\alpha)\): The value from the \(t\)-table for \(\nu = n-1\). For a 95% interval, you look up the 2.5% point (0.025) because the remaining 5% is split equally between the two tails.
- \(\frac{s}{\sqrt{n}}\): This is called the Standard Error.

Memory Aid: Think of the Confidence Interval as: Result \(\pm\) (Safety Factor \(\times\) Precision).
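As a quick sketch (assuming scipy, and using made-up summary statistics), the interval can be computed like this:

```python
from math import sqrt
from scipy.stats import t

# Hypothetical summary statistics for a sample of n = 10 measurements
n = 10
x_bar = 52.3   # sample mean
s = 4.1        # unbiased standard deviation (the square root of s^2)

nu = n - 1
# For a 95% interval, put 2.5% in each tail: look up t at 0.975
t_val = t.ppf(0.975, df=nu)

half_width = t_val * s / sqrt(n)   # Safety Factor x Precision
lower, upper = x_bar - half_width, x_bar + half_width
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Because the \(t\) critical value is larger than the Normal one, this interval is a little wider than the \(z\)-based interval would be: that extra width is the "error room" from Section 1.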

5. Summary and Tips for Success

Quick Review Box:
- Standard Deviation: Use \(s\) (the version with \(n-1\)).
- Degrees of Freedom: \(\nu = n-1\).
- Test Statistic: \( t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \).
- Condition: The underlying population must be Normal.

Common Pitfalls to Avoid:

1. The \(n-1\) slip-up: Students often forget to use \(n-1\) when looking up values in the table. Always write down \(\nu = \dots\) first!
2. One-tailed vs. Two-tailed: Read the question carefully. If it says "has the mean changed," it's two-tailed. If it says "is the mean greater than," it's one-tailed.
3. Variance vs. Standard Deviation: Make sure you know whether the question gave you \(s^2\) or \(s\). If you were given the variance, you must take its square root before using it in the formula!

Don't worry if this seems tricky at first! The logic is exactly the same as the \(z\)-tests you've done before; we are just using a slightly different table and a slightly different way of calculating the spread. Practice a few questions, and the pattern will become clear!