Welcome to Statistical Inference!

Ever wondered how scientists "prove" a new medicine works, or how a factory knows if its machines are calibrated correctly? They don't just guess; they use Hypothesis Testing. Think of a hypothesis test like a courtroom trial: we assume someone is innocent until we have enough evidence to convict. In statistics, we assume the "status quo" is true until our data provides strong evidence otherwise.
Don't worry if this seems a bit abstract at first—we’re going to break it down into simple, logical steps!

1. The Basics: Parameters vs. Statistics

Before we start testing, we need to know what we are talking about. In statistics, we distinguish between the "whole group" and the "small group" we actually measure.

Parameter: A numerical property of a population. It is the "true" value we usually don't know.
Example: The average height of every adult in the UK (\(\mu\)).

Statistic: A numerical property of a sample. It is a value we calculate from the data we have collected.
Example: The average height of 100 people you measured (\(\bar{x}\)).

Memory Aid:
Parameter = Population
Statistic = Sample

Standard Error: This is just a fancy name for the standard deviation of a statistic. It tells us how much we expect our sample results to vary from the true population value. For the sample mean, the standard error is \(\frac{\sigma}{\sqrt{n}}\).

Key Takeaway: We use Statistics (from our sample) to make an "inference" (a calculated guess) about the Parameter (the population).
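To make this concrete, here is a minimal sketch in Python using invented data (the population and sample sizes are purely illustrative): we simulate a population, take a sample, and compare the parameter with the statistic and its standard error.

```python
import math
import random
import statistics

# Hypothetical population: heights (cm) of 10,000 adults.
random.seed(1)
population = [random.gauss(175, 8) for _ in range(10_000)]
mu = statistics.mean(population)          # parameter: the "true" mean

# A sample of 100 people we actually measured.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)           # statistic: the sample mean

# Standard error of the mean: sample standard deviation / sqrt(n)
se = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"parameter mu = {mu:.1f}, statistic x_bar = {x_bar:.1f}, SE = {se:.2f}")
```

Notice that \(\bar{x}\) lands close to \(\mu\), and the standard error tells you roughly how close to expect it to be.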

2. The Language of Hypothesis Testing

To pass your exam, you need to speak the language! Here are the core terms you'll use in every question:

The Hypotheses

Null Hypothesis (\(H_0\)): This is the "boring" version. It assumes nothing has changed or there is no effect. We always write this with an equals sign (e.g., \(H_0: p = 0.5\)).

Alternative Hypothesis (\(H_1\)): This is what you are trying to find evidence for. It’s the "exciting" version (e.g., \(H_1: p > 0.5\)).

The Decision Makers

Significance Level (\(\alpha\)): This is the "threshold" for evidence. Usually, it is 5% (\(0.05\)). If the probability of our result happening by chance is less than this level, we reject \(H_0\).

Test Statistic: The value calculated from your sample data (like a z-score or the number of successes in a Binomial trial) that you use to make your decision.

p-value: The probability of getting a result at least as extreme as ours, assuming \(H_0\) is true.
Mnemonic: If the p is low, the null must go! (If p < significance level, reject \(H_0\)).

The "Regions"

Critical Region: The "rejection zone." If your test statistic falls in here, you reject \(H_0\).
Critical Value: The "borderline" number that starts the critical region.
Acceptance Region: The "safe zone." If your statistic falls here, you don't have enough evidence to change your mind, so you stick with \(H_0\).

Quick Review:
1. State \(H_0\) and \(H_1\).
2. Pick a significance level (usually 5%).
3. Calculate your test statistic.
4. Check if it's in the Critical Region.
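Step 4 above can equivalently be done with a p-value instead of a critical region. A minimal sketch of that decision rule (the function name and messages are just for illustration):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """p-value decision rule: if the p is low, the null must go."""
    if p_value < alpha:
        return "Reject H0: significant evidence for H1"
    return "Do not reject H0: insufficient evidence"

print(decide(0.03))   # below the 5% level -> reject H0
print(decide(0.20))   # above the 5% level -> do not reject H0
```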

3. One-Tail vs. Two-Tail Tests

How do you know which way to look? It depends on what the question asks.

1-Tail Test: You are looking for a change in one specific direction.
Keywords: "increased", "decreased", "better", "slower".
Example: \(H_1: \mu > 100\)

2-Tail Test: You are looking for any change, regardless of direction.
Keywords: "has changed", "is different", "is not equal to".
Example: \(H_1: \mu \neq 100\)

Common Mistake: In a 2-tail test, you must split your significance level in half! For a 5% test, you look for 2.5% at the top end and 2.5% at the bottom end.
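You can see this split in the critical values themselves. Using Python's built-in `statistics.NormalDist` (available from Python 3.8), a 5% one-tail test uses the upper 5% point of \(N(0,1)\), while a 5% two-tail test uses the upper 2.5% point:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal N(0, 1)

alpha = 0.05
one_tail_crit = z.inv_cdf(1 - alpha)        # upper 5% point
two_tail_crit = z.inv_cdf(1 - alpha / 2)    # upper 2.5% point

print(f"1-tail critical value: {one_tail_crit:.4f}")    # ~ 1.6449
print(f"2-tail critical values: +/-{two_tail_crit:.4f}")  # ~ +/-1.9600
```

The two-tail boundary (about 1.96) is further out than the one-tail boundary (about 1.64), because each tail only gets half of the 5%.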

4. Testing a Proportion (Binomial Distribution)

We use this when we are counting "successes" vs "failures."
\(X \sim B(n, p)\)

Steps:
1. Define the parameter: Let \(p\) be the probability of...
2. Write \(H_0: p = \text{old value}\) and \(H_1: p < \text{old value}\), \(p > \text{old value}\), or \(p \neq \text{old value}\).
3. Assuming \(H_0\) is true, find the probability of getting a result at least as extreme as the one observed.
4. Compare this probability to the significance level.
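Here is a sketch of these steps on an invented example (the numbers are hypothetical): we test \(H_0: p = 0.5\) against \(H_1: p > 0.5\) after observing 15 successes in 20 trials, computing the exact Binomial tail probability with `math.comb`.

```python
from math import comb

def binom_upper_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# Hypothetical example: H0: p = 0.5, H1: p > 0.5.
# Observed: 15 successes in 20 trials.
p_value = binom_upper_tail(n=20, p=0.5, k=15)
print(f"P(X >= 15) = {p_value:.4f}")   # ~ 0.0207

alpha = 0.05
print("Reject H0" if p_value < alpha else "Do not reject H0")  # 0.0207 < 0.05
```

Since \(P(X \geq 15) \approx 0.0207 < 0.05\), we would reject \(H_0\) at the 5% level.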

Did you know? If your sample size (\(n\)) is large and \(p\) is not too close to 0 or 1, you can use a Normal Approximation to solve Binomial problems. This makes the calculations much faster for big numbers!

5. Testing a Mean (Normal Distribution)

We use this when we are looking at the average of a sample.
If the population \(X \sim N(\mu, \sigma^2)\), then the sample mean \(\bar{X}\) follows:
\(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\)

Important: As the sample size (\(n\)) gets bigger, the "spread" (\(\frac{\sigma^2}{n}\)) of the sample mean gets smaller. This is why larger samples give more reliable results!

The "Large Sample" Rule: If your sample is large (usually \(n \geq 30\)), the Central Limit Theorem allows us to assume the sample mean follows a Normal distribution, even if the original population wasn't normal.
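Putting the formula to work, here is a minimal sketch of a z-test for a mean with known \(\sigma\), again on invented numbers: \(H_0: \mu = 100\) against \(H_1: \mu > 100\), with \(\sigma = 15\), \(n = 36\), and \(\bar{x} = 104.5\).

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical example: H0: mu = 100, H1: mu > 100, known sigma = 15.
mu0, sigma, n, x_bar = 100, 15, 36, 104.5

# Under H0, the sample mean ~ N(mu0, sigma^2 / n), so standardise:
z = (x_bar - mu0) / (sigma / sqrt(n))     # test statistic: 4.5 / 2.5 = 1.8
p_value = 1 - NormalDist().cdf(z)         # 1-tail (upper) p-value, ~ 0.0359

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
alpha = 0.05
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Here \(z = 1.8\) exceeds the one-tail critical value of about 1.645, so the p-value (about 0.036) falls below 5% and we reject \(H_0\).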

6. Interpreting the Results

When you finish a test, your conclusion must be cautious and in context.

Good phrasing: "There is sufficient evidence at the 5% level to suggest that the mean height has increased."
Bad phrasing: "This proves the mean height is definitely 180cm."

Why? Because we are only using a sample. There is always a tiny chance we got a weird sample by luck! This is why we never say we have "proven" it—we just have "evidence."

7. Summary Checklist for Success

- Random Sampling: Ensure the sample is random; otherwise, the test is biased.
- State Assumptions: Mention if you are assuming a Normal distribution or using the Central Limit Theorem.
- Check your tails: Is it \(>\) or \(\neq\)?
- Context is King: Always write your final sentence about the actual thing in the question (e.g., seeds, lightbulbs, test scores), not just about "p-values."

Don't worry if this seems tricky at first! Hypothesis testing is a process. Once you've practiced the steps for a few different scenarios, you'll start to see that the "logic" is the same every time.