Introduction: Becoming a Statistical Detective
Welcome! Today, we are diving into the world of Hypothesis Testing. If you’ve ever wondered whether a coin is biased, or whether a new "lucky" die actually rolls more sixes, you are already thinking like a statistician.
Hypothesis testing is basically "Statistical Detective Work." We start with an assumption, look at the evidence (the data), and decide if that evidence is strong enough to change our minds. Don't worry if it feels like a lot of new words at first—we will break it down step-by-step!
1. The Language of Hypothesis Testing
To be a good detective, you need to know the lingo. Here are the key terms you’ll need for the OCR MEI syllabus:
The Null Hypothesis (\(H_0\)): This is the "status quo" or the "boring" claim. It assumes that nothing has changed and everything is normal. For a binomial probability, we always write it as:
\(H_0: p = \text{something}\)
The Alternative Hypothesis (\(H_1\)): This is the "exciting" claim. It’s what you are actually trying to find evidence for. It will look like one of these:
\(H_1: p < \text{something}\) (You think the probability has decreased)
\(H_1: p > \text{something}\) (You think the probability has increased)
\(H_1: p \neq \text{something}\) (You just think the probability has changed)
The Test Statistic: This is the number you observe in your experiment. For example, if you flip a coin 20 times, the number of heads you get is your test statistic.
The Significance Level (\(\alpha\)): This is the "threshold" for our evidence. It's usually 5% (\(0.05\)) or 10% (\(0.1\)). It’s the probability of wrongly rejecting \(H_0\) that we are willing to accept.
Quick Review Box:
- \(H_0\) is always "="
- \(H_1\) is always "<", ">", or "\(\neq\)"
- Hypotheses are always about the population probability \(p\), never about the sample results!
2. One-Tail vs. Two-Tail Tests
How do we know which way to look for evidence? It depends on what we suspect is happening.
One-Tail Tests
Use a 1-tail test if the question specifies a direction.
Example: "A gardener suspects the germination rate of seeds has increased."
Here, \(H_1: p > \text{old value}\). We only care about the "high" end of the results.
Two-Tail Tests
Use a 2-tail test if the question says "the probability has changed" or "is different."
Example: "A scientist wants to check if a machine is still calibrated correctly."
Here, \(H_1: p \neq \text{old value}\). We care about the results being either too high OR too low.
Important Trick: In a 2-tail test, we split the significance level in half. If the total level is 5%, we look for 2.5% at the bottom and 2.5% at the top.
Key Takeaway: Read the question carefully! Words like "increase," "decrease," "better," or "worse" mean 1-tail. Words like "change," "different," or "affected" mean 2-tail.
3. The Critical Region and \(p\)-values
Once we have our data, how do we decide if \(H_0\) is "guilty" (rejected) or "not guilty" (not rejected)? There are two main ways to do this.
The Critical Region Method
The Critical Region (or Rejection Region) is the set of values that are so unlikely to happen by chance that we reject \(H_0\) if our test statistic falls inside it.
The Critical Value is the "borderline" number that starts this region.
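The critical region can be found by working along the tail until the probability drops below the significance level. Here is a minimal Python sketch (not part of the OCR MEI course, which uses calculator tables, but it shows exactly what the search is doing) for the upper tail of \(X \sim B(20, 0.5)\) at the 5% level:

```python
from math import comb

n, p, alpha = 20, 0.5, 0.05  # B(20, 0.5), 5% significance level

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k, n + 1))

# The critical value is the smallest c with P(X >= c) <= alpha.
c = next(c for c in range(n + 1) if prob_at_least(c, n, p) <= alpha)
print(c)                       # critical value: 15
print(prob_at_least(c, n, p))  # size of critical region: about 0.0207
```

Here \(P(X \geq 14) \approx 0.0577 > 0.05\) but \(P(X \geq 15) \approx 0.0207 \leq 0.05\), so the critical region is \(X \geq 15\) and the critical value is 15.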
The \(p\)-value Method
The \(p\)-value is the probability of getting a result at least as extreme as the one we actually observed, assuming \(H_0\) is true.
- If \(p\)-value \(\leq\) Significance Level \(\rightarrow\) Reject \(H_0\) (The result is very rare!)
- If \(p\)-value \( > \) Significance Level \(\rightarrow\) Do not reject \(H_0\) (The result could just be luck.)
Did you know? The significance level is actually the probability of making a "Type I error"—which means rejecting the null hypothesis when it was actually true! We keep this level small to avoid being wrong.
4. Step-by-Step Guide to Conducting a Test
Don't worry if this seems tricky at first; just follow these five steps every time:
Step 1: State your hypotheses. Define \(p\) first (e.g., "Let \(p\) be the probability of a seed germinating"), then write \(H_0\) and \(H_1\).
Step 2: State the distribution. Under the null hypothesis, our variable \(X\) follows a binomial distribution: \(X \sim B(n, p)\).
Step 3: Calculate the probability. Use your calculator's binomial cumulative distribution function (BCD) to find the probability of the result being "at least as extreme."
- For a "greater than" test, find \(P(X \geq x) = 1 - P(X \leq x-1)\).
- For a "less than" test, find \(P(X \leq x)\).
Step 4: Compare. Compare your \(p\)-value to the significance level.
Step 5: Conclude in context. This is where students often lose marks! You must mention the specific situation.
Common Mistake to Avoid: Never say "H0 is definitely true" or "H1 is definitely false." Statistics is about evidence, not absolute certainty. Use non-assertive language like "There is insufficient evidence to suggest..."
5. Real-World Example
The Situation: A coin is flipped 20 times and lands on heads 15 times. Is the coin biased towards heads at a 5% significance level?
1. Hypotheses: \(p\) is the prob of heads. \(H_0: p = 0.5\); \(H_1: p > 0.5\) (1-tail).
2. Distribution: Under \(H_0\), \(X \sim B(20, 0.5)\).
3. Calculation: We observed 15 heads. We need the probability of getting 15 or more.
\(P(X \geq 15) = 1 - P(X \leq 14) \approx 0.0207\).
4. Compare: \(0.0207\) (2.07%) is less than \(0.05\) (5%).
5. Conclusion: Since \(0.0207 < 0.05\), we reject \(H_0\). There is sufficient evidence at the 5% level to suggest the coin is biased towards heads.
Key Takeaway: If the probability of your result is smaller than the significance level, it's too weird to be a coincidence. Reject the "boring" \(H_0\)!
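The whole worked example can be checked with a short Python sketch (again, just an illustration of the five steps, not something the exam requires):

```python
from math import comb

# Steps 1-2: H0: p = 0.5, H1: p > 0.5, and under H0, X ~ B(20, 0.5).
n, p0, observed, alpha = 20, 0.5, 15, 0.05

# Step 3: p-value = P(X >= 15) = 1 - P(X <= 14) under H0.
p_value = 1 - sum(comb(n, r) * p0**r * (1 - p0)**(n - r)
                  for r in range(observed))
print(round(p_value, 4))  # 0.0207

# Steps 4-5: compare to the significance level and conclude in context.
if p_value <= alpha:
    print("Reject H0: sufficient evidence the coin is biased towards heads.")
else:
    print("Do not reject H0: insufficient evidence of bias towards heads.")
```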
Summary of Common Mistakes
- Using the sample proportion: Don't write \(H_0: p = 15/20\). The hypothesis is always about the theoretical probability (like \(0.5\)).
- Wrong direction: In a "greater than" test, make sure you calculate the upper tail (\(P(X \geq x)\)).
- Forgetting the context: Always mention the coins, seeds, or whatever the question is about in your final sentence.
- Two-tail confusion: Remember to compare your \(p\)-value against half the significance level (or double your \(p\)-value) for a 2-tail test.
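On the last point, comparing the tail probability against half the significance level and comparing the doubled probability against the full level always give the same decision, as this quick sketch (reusing the coin figures from the example above) confirms:

```python
from math import comb

n, p0, observed, alpha = 20, 0.5, 15, 0.05  # coin example, now treated 2-tail

# One-tail probability P(X >= 15) under H0.
tail = 1 - sum(comb(n, r) * p0**r * (1 - p0)**(n - r)
               for r in range(observed))

print(tail <= alpha / 2)  # compare 0.0207 against 0.025
print(2 * tail <= alpha)  # compare 0.0414 against 0.05 -- same decision
```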