Welcome to the World of the t-distribution!
In your previous statistics work, you likely used the Normal distribution (the \(z\)-test) to test hypotheses about a population mean. But there was a catch: you had to know the population variance (\(\sigma^2\)).
In the real world, we rarely know the exact variance of an entire population. If we only have a small sample and we don't know the population variance, the Normal distribution isn't quite accurate enough. That’s where the t-distribution (also known as Student’s t-distribution) comes to the rescue! This chapter will show you how to perform hypothesis tests and find confidence intervals when information is limited.
1. Why use the t-distribution?
Think of the \(t\)-distribution as the "cautious cousin" of the Normal distribution. Because we are estimating the variance from a small sample, we are less certain about our results. To account for this uncertainty, the \(t\)-distribution has "fatter tails" than the Normal distribution.
When do we use it?
You use the \(t\)-distribution when:
1. The population is normally distributed (or approximately so).
2. The population variance (\(\sigma^2\)) is unknown.
3. The sample size (\(n\)) is small (usually \(n < 30\)), though it works for larger samples too!
Analogy: Imagine you are trying to guess the average height of students in a school. If you measure 1,000 students, you can be very confident (Normal distribution). If you only measure 5 students, you need to allow for more "error room" in your guess—that extra room is what the fatter tails of the \(t\)-distribution provide.
Quick Review: If you know \(\sigma^2\), use \(z\). If you don't know \(\sigma^2\), use \(t\)!
2. Key Concepts: Degrees of Freedom and Variance
Unbiased Estimate of Variance (\(s^2\))
Since we don't know the population variance (\(\sigma^2\)), we must calculate an unbiased estimate from our sample data. We use the symbol \(s^2\).
The formula is:
\( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \) or \( s^2 = \frac{1}{n-1} (\sum x^2 - \frac{(\sum x)^2}{n}) \)
Common Mistake: Don't divide by \(n\)! For the unbiased estimate we always divide by \(n - 1\). Because the deviations are measured from the sample mean \(\bar{x}\) rather than the true mean \(\mu\), dividing by \(n\) would systematically underestimate the population spread.
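To see this in code, here is a minimal sketch (the sample data is made up for illustration) comparing the biased divide-by-\(n\) estimate with the unbiased divide-by-\((n-1)\) estimate. NumPy's `ddof` parameter controls exactly this divisor:

```python
import numpy as np

# Hypothetical small sample (made-up data for illustration)
x = np.array([4.2, 5.1, 4.8, 5.5, 4.4])
n = len(x)

# Sum of squared deviations from the sample mean
ss = np.sum((x - x.mean()) ** 2)

biased = ss / n          # dividing by n: too small on average
unbiased = ss / (n - 1)  # dividing by n - 1: the unbiased estimate s^2

# NumPy's ddof ("delta degrees of freedom") gives the same results:
assert np.isclose(biased, np.var(x, ddof=0))
assert np.isclose(unbiased, np.var(x, ddof=1))
```

Note that the unbiased estimate is always a little larger, which is exactly the extra caution the \(t\)-distribution is built around.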
Degrees of Freedom (\(\nu\))
The shape of the \(t\)-distribution changes depending on how much data you have. This is defined by the degrees of freedom, represented by the Greek letter nu (\(\nu\)).
For a one-sample test:
\( \nu = n - 1 \)
Did you know? As the degrees of freedom increase (as your sample gets bigger), the \(t\)-distribution looks more and more like the standard Normal distribution!
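You can watch this convergence happen numerically. The sketch below (using SciPy; the 2.5% tail is chosen purely for illustration) prints the upper-tail critical value of the \(t\)-distribution for increasing degrees of freedom, alongside the Normal value of about 1.96:

```python
from scipy.stats import norm, t

# Upper 2.5% critical value of t for increasing degrees of freedom
for nu in [2, 5, 10, 30, 100]:
    print(f"nu = {nu:3d}:  t = {t.ppf(0.975, df=nu):.3f}")

# The Normal (z) critical value for comparison
print(f"Normal:    z = {norm.ppf(0.975):.3f}")
```

The critical values shrink steadily towards the Normal one: the bigger the sample, the less extra "error room" the fatter tails need to provide.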
Key Takeaway: Always calculate \(s^2\) and \(\nu = n-1\) before looking at your probability tables.
3. Performing the Hypothesis Test
The goal is to test if the population mean (\(\mu\)) is equal to a specific value (\(\mu_0\)).
Step-by-Step Process:
1. State your Hypotheses:
\(H_0: \mu = \mu_0\)
\(H_1: \mu \neq \mu_0\) (two-tailed) or \(\mu > \mu_0\) / \(\mu < \mu_0\) (one-tailed).
2. Calculate the Test Statistic (\(t\)):
Use the formula:
\( t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \)
Where \(\bar{x}\) is the sample mean, \(\mu_0\) is the hypothesized mean, and \(s\) is the square root of your unbiased variance estimate.
3. Find the Critical Value:
Look in your statistical tables for the \(t\)-distribution using your degrees of freedom (\(\nu\)) and the significance level (\(\alpha\)).
4. Compare and Conclude:
If your calculated \(t\) is further from zero than the critical value, reject \(H_0\). Otherwise, do not reject \(H_0\).
Example phrase for your conclusion: "There is significant evidence at the 5% level to suggest that the mean weight of the apples has changed."
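The four steps above can be sketched end-to-end in code. This is only an illustration with made-up apple-weight data and a hypothesized mean of 100 g; it also cross-checks the hand calculation against SciPy's built-in `ttest_1samp`:

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Step 0: hypothetical sample of apple weights in grams (made-up data)
x = np.array([96.1, 103.2, 98.4, 101.5, 97.0, 99.8])
mu0 = 100.0    # H0: mu = 100, H1: mu != 100 (two-tailed)
alpha = 0.05   # 5% significance level

# Step 2: calculate the test statistic
n = len(x)
nu = n - 1                     # degrees of freedom
s = np.std(x, ddof=1)          # unbiased standard deviation
t_stat = (x.mean() - mu0) / (s / np.sqrt(n))

# Step 3: find the two-tailed critical value
t_crit = t.ppf(1 - alpha / 2, df=nu)

# Step 4: compare and conclude
if abs(t_stat) > t_crit:
    print("Reject H0: significant evidence the mean has changed.")
else:
    print("Do not reject H0.")

# Cross-check against SciPy's built-in one-sample t-test
res = ttest_1samp(x, popmean=mu0)
print(round(t_stat, 4), round(res.statistic, 4))  # the two should agree
```

Doing it both ways is a good habit: the manual version is what you write in an exam, and the library call catches arithmetic slips.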
4. Confidence Intervals for the Mean
Sometimes, instead of testing a specific value, we want a range of values that we can be, say, 95% (or 99%) confident contains the true population mean.
The formula for a Symmetric Confidence Interval is:
\( \bar{x} \pm t_{\nu}(\alpha) \times \frac{s}{\sqrt{n}} \)
Breakdown of the formula:
- \(\bar{x}\): Your sample mean (the center of your interval).
- \(t_{\nu}(\alpha)\): The value from the \(t\)-table for \(\nu = n-1\). For a 95% interval, you look for the 2.5% tail (0.025) because 5% is split between both ends.
- \(\frac{s}{\sqrt{n}}\): This is called the Standard Error.
Memory Aid: Think of the Confidence Interval as: Result \(\pm\) (Safety Factor \(\times\) Precision).
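As a short sketch (the sample data is made up, and the 95% level is chosen for illustration), the interval can be computed directly from the formula above:

```python
import numpy as np
from scipy.stats import t

# Hypothetical sample (made-up data for illustration)
x = np.array([12.1, 11.4, 12.8, 11.9, 12.5, 11.7, 12.2, 12.0])
n = len(x)
nu = n - 1

s = np.std(x, ddof=1)         # unbiased standard deviation
se = s / np.sqrt(n)           # standard error
t_crit = t.ppf(0.975, df=nu)  # 2.5% in each tail gives a 95% interval

lower = x.mean() - t_crit * se
upper = x.mean() + t_crit * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Notice the `0.975` in `t.ppf`: that is the "Result \(\pm\) (Safety Factor \(\times\) Precision)" idea made concrete, with the safety factor coming from the 2.5% tail of the \(t\)-table.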
5. Summary and Tips for Success
Quick Review Box:
- Standard Deviation: Use \(s\) (the version with \(n-1\)).
- Degrees of Freedom: \(\nu = n-1\).
- Test Statistic: \( t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \).
- Condition: The underlying population must be Normal.
Common Pitfalls to Avoid:
1. The \(n-1\) slip-up: Students often forget to use \(n-1\) when looking up values in the table. Always write down \(\nu = \dots\) first!
2. One-tailed vs. Two-tailed: Read the question carefully. If it says "has the mean changed," it's two-tailed. If it says "is the mean greater than," it's one-tailed.
3. Variance vs. Standard Deviation: Make sure you know if the question gave you \(s^2\) or \(s\). If they gave you the variance, you must square root it for the formula!
Don't worry if this seems tricky at first! The logic is exactly the same as the \(z\)-tests you've done before; we are just using a slightly different table and a slightly different way of calculating the spread. Practice a few questions, and the pattern will become clear!