Introduction: Welcome to Further Hypothesis Testing!
Hello! In your previous studies, you learned the basics of hypothesis testing—using data to see if a claim about a population is likely to be true. In this chapter, we are going to take those skills to the next level. We will explore what happens when we make mistakes in our conclusions (Errors), how to handle situations where we don't know the population variance (using the t-distribution), and how to compare two different groups to see if there is a real difference between them.
Hypothesis testing is vital in the real world. From testing if a new medicine works better than an old one to checking if a factory's machine is calibrated correctly, these methods help us make scientific decisions even when we aren't 100% certain. Let's dive in!
1. Type I and Type II Errors
Hypothesis testing isn't perfect. Because we are using a sample to guess the truth about a population, there is always a small chance we might get it wrong. We categorize these mistakes into two types.
What is a Type I Error?
A Type I Error occurs when we reject the null hypothesis (\(H_0\)) even though it is actually true.
Think of this as a "False Alarm."
Analogy: Imagine a smoke alarm goes off when there is no fire. The "Null Hypothesis" was "everything is fine," but the alarm rejected that and screamed "Fire!" when it shouldn't have.
Key Point: The probability of making a Type I error equals the significance level (\(\alpha\)) of the test. If you test at the 5% level, then, assuming \(H_0\) really is true, there is a 5% chance of wrongly rejecting it.
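You can see this in action with a small simulation (a sketch using Python's standard library; the population values below are made up). We repeatedly sample from a population where \(H_0\) is genuinely true and count how often a 5% two-tailed test raises a false alarm:

```python
import math
import random

random.seed(42)                      # reproducible runs

MU_0 = 0.0        # true population mean, so H0 ("mean is 0") is actually true
SIGMA = 1.0       # population standard deviation (known, so we may use z)
N = 20            # sample size per experiment
TRIALS = 10_000   # number of repeated experiments
Z_CRIT = 1.96     # two-tailed critical value at the 5% level

false_alarms = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    z = (x_bar - MU_0) / (SIGMA / math.sqrt(N))
    if abs(z) > Z_CRIT:              # rejecting H0 even though it is true
        false_alarms += 1

type_i_rate = false_alarms / TRIALS
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

Run it with a few different seeds: the observed rate hovers around 0.05, matching the significance level, just as the Key Point says.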
What is a Type II Error?
A Type II Error occurs when we fail to reject the null hypothesis (\(H_0\)) even though it is actually false.
Think of this as "Missing the Truth."
Analogy: Imagine there is a real fire in the kitchen, but the smoke alarm stays silent. It failed to reject the "everything is fine" hypothesis, even though things were definitely not fine!
Quick Review Box:
• Type I Error: Reject \(H_0\) when \(H_0\) is true. (False Alarm)
• Type II Error: Fail to reject \(H_0\) when \(H_1\) is true. (Missed detection)
2. The Student's t-distribution
In your basic Statistics (S1), you likely used the Normal Distribution (\(Z\)) for testing means. However, the Normal Distribution requires you to know the population variance (\(\sigma^2\)). In the real world, we rarely know this!
When to use the t-distribution?
You use the t-distribution when:
1. The population is Normally distributed.
2. The population variance (\(\sigma^2\)) is unknown.
3. The sample size (\(n\)) is small (though it works for large samples too!).
Degrees of Freedom (\(\nu\))
The shape of the t-distribution depends on a value called degrees of freedom, represented by the Greek letter nu (\(\nu\)). For a single sample test:
\( \nu = n - 1 \)
Why \(n-1\)? Imagine you have 3 numbers that must add up to 10. You can pick any two numbers you want (they are "free"), but the 3rd number is then forced to be a specific value to make the sum 10. You have 2 "degrees of freedom."
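The counting argument above can be written out directly (a toy illustration with made-up numbers):

```python
# Three numbers must add up to 10: two are free choices, the third is forced.
total = 10.0
a = 2.0              # free choice 1
b = 4.5              # free choice 2
c = total - (a + b)  # forced by the constraint: no freedom left

# 3 numbers, 1 constraint  ->  3 - 1 = 2 degrees of freedom
print(a, b, c)       # 2.0 4.5 3.5
```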
The Test Statistic
To find our test value (\(t\)), we use this formula:
\( t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \)
Where:
• \( \bar{x} \) is the sample mean.
• \( \mu \) is the population mean from \(H_0\).
• \( s \) is the square root of \(s^2\), the unbiased estimate of the population variance (calculated with divisor \(n-1\)).
• \( n \) is the sample size.
Don't worry if this seems tricky! The main difference from the Normal test is that we use \(s\) instead of \(\sigma\) and we look up our critical values in the t-table using the correct degrees of freedom.
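Putting the formula into code (a sketch with a made-up sample; conveniently, Python's `statistics.stdev` already uses the \(n-1\) divisor, so it gives exactly the \(s\) we need):

```python
import math
import statistics

# Hypothetical sample: weights (kg) of six bags claimed to average mu = 5.0
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2]
mu_0 = 5.0                          # population mean under H0

n = len(sample)
x_bar = statistics.mean(sample)     # sample mean
s = statistics.stdev(sample)        # based on the unbiased variance (n - 1 divisor)

t = (x_bar - mu_0) / (s / math.sqrt(n))
nu = n - 1                          # degrees of freedom for the t-table

print(f"t = {t:.3f} with nu = {nu} degrees of freedom")
```

Here \(t \approx 0.655\) with \(\nu = 5\); you would compare that against the critical value read from the t-table at the chosen significance level.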
3. Testing the Difference between Two Means
Sometimes we want to compare two different groups. For example: "Do students who study with music score higher than students who study in silence?"
Independent Samples (Normal Distribution)
If we have two independent groups and we know the population variances, we test the difference between the means (\(\mu_1 - \mu_2\)).
The Hypotheses:
\( H_0: \mu_1 = \mu_2 \) (There is no difference)
\( H_1: \mu_1 \neq \mu_2 \) (or \( > \) or \( < \))
With known variances, the test statistic is:
\( z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \)
where \(\mu_1 - \mu_2\) is the difference claimed by \(H_0\) (usually 0).
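As a quick numerical sketch of the known-variance case (all summary values below are invented for illustration):

```python
import math

# Hypothetical summary statistics for two independent groups
x_bar_1, sigma_sq_1, n_1 = 20.5, 4.0, 50   # group 1: mean, known variance, size
x_bar_2, sigma_sq_2, n_2 = 19.2, 9.0, 40   # group 2: mean, known variance, size
diff_h0 = 0.0                              # H0 claims mu_1 - mu_2 = 0

standard_error = math.sqrt(sigma_sq_1 / n_1 + sigma_sq_2 / n_2)
z = ((x_bar_1 - x_bar_2) - diff_h0) / standard_error
print(f"z = {z:.3f}")   # compare with +/-1.96 at the 5% two-tailed level
```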
Using the Pooled Estimate of Variance (\(s_p^2\))
If we don't know the population variances but we assume they are equal, we combine (pool) the data from both samples to get a better estimate of the variance.
The "Pooled" Formula:
\( s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \)
Once you have \(s_p^2\), your test statistic for the t-distribution is:
\( t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 (\frac{1}{n_1} + \frac{1}{n_2})}} \)
The degrees of freedom for this test is: \( \nu = n_1 + n_2 - 2 \).
Key Takeaway: When comparing two groups with unknown but equal variances, "pool" them together to find a shared variance, then use the t-distribution with \(n_1 + n_2 - 2\) degrees of freedom.
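The two formulas above combine into a short routine (a sketch; both samples are invented for illustration, and `statistics.variance` uses the \(n-1\) divisor, matching the unbiased \(s^2\)):

```python
import math
import statistics

# Hypothetical test scores for two independent groups
group_1 = [12, 15, 14, 13, 16]       # e.g. studied with music
group_2 = [10, 12, 11, 13, 9, 11]    # e.g. studied in silence

n_1, n_2 = len(group_1), len(group_2)
s_sq_1 = statistics.variance(group_1)   # unbiased estimate, n - 1 divisor
s_sq_2 = statistics.variance(group_2)

# Pooled estimate of the (assumed common) variance
s_p_sq = ((n_1 - 1) * s_sq_1 + (n_2 - 1) * s_sq_2) / (n_1 + n_2 - 2)

# Test statistic under H0: mu_1 - mu_2 = 0
t = (statistics.mean(group_1) - statistics.mean(group_2)) / math.sqrt(
    s_p_sq * (1 / n_1 + 1 / n_2)
)
nu = n_1 + n_2 - 2                   # degrees of freedom for the t-table

print(f"s_p^2 = {s_p_sq:.3f}, t = {t:.3f}, nu = {nu}")
```

With these made-up numbers, \(s_p^2 \approx 2.222\) and \(t \approx 3.323\) with \(\nu = 9\); the final step is to compare \(t\) against the critical value from the t-table.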
4. Common Mistakes to Avoid
1. Confusion between \(z\) and \(t\): Always check if you know the population variance (\(\sigma^2\)). If you only have the sample variance (\(s^2\)), use the t-distribution!
2. Wrong Degrees of Freedom: For one sample, it's \(n-1\). For two samples (pooled), it's \(n_1 + n_2 - 2\). Double-check this before looking at the tables.
3. Significance Level: Read the question carefully to see if it's a one-tailed or two-tailed test. In a two-tailed test, you must split the significance level (e.g., 5% becomes 2.5% at each end).
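For the Normal case, you can check the one-tailed versus two-tailed split with Python's built-in `statistics.NormalDist`, with no tables needed (a sketch):

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()            # mean 0, standard deviation 1

# One-tailed test: the whole 5% sits in a single tail
z_one_tailed = std_normal.inv_cdf(1 - ALPHA)        # ~1.645

# Two-tailed test: the 5% is split into 2.5% at each end
z_two_tailed = std_normal.inv_cdf(1 - ALPHA / 2)    # ~1.960

print(f"one-tailed: {z_one_tailed:.3f}, two-tailed: {z_two_tailed:.3f}")
```

The t-distribution behaves the same way; its critical values just come from the t-table using the correct degrees of freedom.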
Summary Checklist
Before you sit your exam, make sure you can:
• Define Type I and Type II errors in words and identify them in scenarios.
• State the conditions for using a t-distribution.
• Calculate the unbiased estimate of variance \(s^2\).
• Calculate a pooled variance \(s_p^2\) for two-sample tests.
• Use the t-tables correctly to find critical values using degrees of freedom (\(\nu\)).
• Conclude your test by comparing your test statistic to the critical value and writing a sentence in the context of the problem.
You've got this! Statistics is just about telling a story with numbers. Keep practicing those calculations, and the patterns will become clear!