Introduction to Paired Tests

Welcome to the world of paired tests! If you’ve ever wondered if a new energy drink actually makes people run faster, or if a revision course really improves student grades, you’re in the right place. These tests are part of Paper 2: Statistical Inference.

In Statistics, we often want to compare two sets of data. Sometimes these sets are totally independent (like comparing heights of students in London vs. heights of students in Manchester). But often, the data comes in pairs. This happens when we measure the same person twice (e.g., "Before" and "After") or use "matched pairs" (like identical twins).

Why do we use paired tests? They are incredibly powerful because they "cancel out" individual differences. If we test a drug on you, we compare your results to your own baseline, rather than comparing you to a stranger who might have a completely different metabolism!


1. The Core Concept: Working with Differences

The "secret sauce" of every paired test is that we don't actually look at the two raw datasets separately. Instead, we create a new single dataset called the Differences (\(d\)).

Example: If a student scored 60 before a course and 65 after, their difference (after minus before) is \(65 - 60 = 5\). We do this for every pair in the study, always subtracting in the same direction, and then perform our test on those differences.

Quick Review:
1. Start with two related columns of data.
2. Calculate the difference for each pair, always subtracting in the same direction: \(d = x_1 - x_2\).
3. Perform the hypothesis test on these differences (\(d\)).
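The three steps above can be sketched in a few lines of Python. The scores are made up purely for illustration, and the names `before` and `after` are assumptions, not part of any exam formula.

```python
# Hypothetical scores for five students (illustrative data only).
before = [60, 72, 55, 81, 64]
after = [65, 70, 55, 90, 66]

# One difference per pair, always subtracting in the same direction (after - before).
differences = [a - b for a, b in zip(after, before)]

print(differences)  # [5, -2, 0, 9, 2]
```

Every test in these notes is then carried out on `differences` alone, not on the two original columns.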


2. The Sign Test (Non-Parametric)

The Sign Test is the simplest paired test. It doesn’t care how much someone improved; it only cares about the direction of the change.

When to use it: Use this when you have paired data but you don't want to make any assumptions about the shape of the distribution. It's great for small samples or "messy" data.

How it works (Step-by-Step):

1. Calculate the difference for each pair.
2. Assign a plus (+) if the difference is positive and a minus (-) if it's negative.
3. Ignore any pairs where the difference is zero (they are removed from the sample size \(n\)).
4. Your test statistic is usually the number of times the less frequent sign occurs.
5. Compare this to critical values from a Binomial Distribution table: if \(H_0\) is true, each sign is equally likely, so the count follows \(B(n, 0.5)\), where \(n\) is your number of non-zero pairs.
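As a rough sketch of the steps above, the snippet below counts the signs and finds the lower-tail Binomial probability directly instead of reading it from a table. The function name `sign_test_p_value` and the data are hypothetical. It assumes a one-tailed test in which the less frequent sign is the one \(H_1\) expects to be rare; for a two-tailed test the probability would be doubled.

```python
from math import comb


def sign_test_p_value(differences):
    """Return (n, k, p) where p = P(X <= k) for X ~ B(n, 0.5)."""
    # Step 3: zero differences are dropped and do not count towards n.
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)

    # Step 2: count the plus and minus signs.
    plus = sum(1 for d in nonzero if d > 0)
    minus = n - plus

    # Step 4: the test statistic is the count of the less frequent sign.
    k = min(plus, minus)

    # Step 5: lower-tail probability under B(n, 0.5).
    p = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n, k, p


# Hypothetical differences: eight plus signs, one minus sign, one zero.
diffs = [5, -2, 0, 9, 2, 4, 3, 1, 6, 7]
n, k, p = sign_test_p_value(diffs)
print(n, k, round(p, 4))  # 9 1 0.0195
```

Here \(p = 10/512 \approx 0.0195\), which is below 0.05, so this made-up sample would give a significant result in a one-tailed test at the 5% level.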

Analogy: Imagine a "Thumbs Up / Thumbs Down" system. We don't care if someone liked the movie a little or a lot; we just count how many people gave it a "Thumbs Up."

Key Takeaway: The Sign Test is "weak" (less likely to find a significant result when a real effect exists) because it throws away information about the size of the difference, but it is very robust because it makes no assumption about the shape of the distribution of the differences.


3. Wilcoxon Signed-Rank Test (Non-Parametric)

The Wilcoxon Signed-Rank Test is a step up from the Sign Test. It looks at the direction (+) or (-) AND the relative size of the differences by ranking them.

The Assumption: For this test to be valid, you must assume that the distribution of differences is symmetrical.

How it works (Step-by-Step):

1. Calculate the differences for all pairs and discard any that are zero.
2. Ignore the signs for a moment and rank the differences from smallest to largest (1 is the smallest difference).
3. If differences are tied, give them the average of the ranks they would have taken.
4. Now, put the signs back on the ranks (e.g., if a difference of +5 was rank 3, it stays 3; if -5 was rank 3, it becomes a "negative rank").
5. Sum the positive ranks (\(W_+\)) and, separately, the sizes of the negative ranks (\(W_-\)), so that both totals are positive numbers.
6. Your test statistic \(T\) is the smaller of these two sums.
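The ranking procedure above, including average ranks for ties, might be sketched as follows. The function name `wilcoxon_signed_rank` and the data are hypothetical, and the sketch stops at the test statistic \(T\); comparing \(T\) with a critical value from tables is left to the reader.

```python
def wilcoxon_signed_rank(differences):
    """Return (W_plus, W_minus, T) for a list of paired differences."""
    # Zero differences are removed before any ranking takes place.
    nonzero = [d for d in differences if d != 0]

    # Rank by size, ignoring sign (rank 1 = smallest absolute difference).
    ordered = sorted(nonzero, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of tied absolute values starting at position i.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 hold ranks i+1..j, so their average is (i + 1 + j) / 2.
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2
        i = j

    # Put the signs back and total each group separately.
    w_plus = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical differences: one zero, and a tie between -2 and +2.
w_plus, w_minus, t_stat = wilcoxon_signed_rank([5, -2, 0, 9, 2, -1, 4])
print(w_plus, w_minus, t_stat)  # 17.5 3.5 3.5
```

A useful check: with \(n\) non-zero differences, \(W_+ + W_-\) should always equal \(n(n+1)/2\). Here \(17.5 + 3.5 = 21 = \frac{6 \times 7}{2}\).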

Common Mistake: Students often forget to ignore the zero differences. If the "before" and "after" are the same, that pair must be removed before you start ranking!

Did you know? This test is generally more powerful than the Sign Test because it recognizes that a massive improvement should count for more than a tiny one.


4. Paired t-test (Parametric)

The Paired t-test is the "gold standard" when your data is "well-behaved." It uses the actual values of the differences, not just ranks.

The Assumption: This test is only valid if the population of differences is Normally Distributed: \(d \sim N(\mu_d, \sigma_d^2)\).

The Hypothesis:

Usually, we are testing if there is any change at all.
\(H_0: \mu_d = 0\) (The mean difference is zero)
\(H_1: \mu_d \neq 0\) (Two-tailed), or \(H_1: \mu_d > 0\) or \(H_1: \mu_d < 0\) (One-tailed)

The Formula:

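The test statistic is calculated as: \(t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}\), which under \(H_0: \mu_d = 0\) simplifies to \(t = \frac{\bar{d}}{s_d / \sqrt{n}}\).
Where:
- \(\bar{d}\) is the mean of your sample differences.
- \(s_d\) is the sample standard deviation of those differences (calculated with divisor \(n - 1\)).
- \(n\) is the number of pairs.
Compare \(t\) with critical values from the t-distribution with \(n - 1\) degrees of freedom.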
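To see the formula in action, here is a minimal sketch using Python's standard library. The differences are made up for illustration, and the null value \(\mu_d = 0\) is assumed. Note that `statistics.stdev` uses the \(n - 1\) divisor, which is the one needed for \(s_d\).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired differences (zeros are kept in a t-test).
differences = [5, -2, 0, 9, 2]

n = len(differences)
d_bar = mean(differences)   # sample mean of the differences
s_d = stdev(differences)    # sample standard deviation (divisor n - 1)
mu_d = 0                    # value of the mean difference under H0

t = (d_bar - mu_d) / (s_d / sqrt(n))
degrees_of_freedom = n - 1

print(round(d_bar, 2), round(s_d, 3), round(t, 3), degrees_of_freedom)
```

Unlike the Sign Test and the Wilcoxon test, a zero difference is a legitimate data value here and stays in the sample.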

Memory Aid: Think of the "t" in t-test as standing for "Typical" or "Top-tier". It is the most powerful of the three tests when its assumption holds, but it requires the differences (not the two original datasets) to be Normally distributed.


5. Choosing the Right Test

One of the biggest challenges in the exam is deciding which test to use. Don't worry if this seems tricky; just follow this logical path:

Step 1: Is it Paired?

Are there two measurements for one person? Or matched pairs (like twins)? If yes, it's a paired test.

Step 2: Check the Distribution of Differences
  • Is it Normal? Use the Paired t-test.
  • Is it NOT Normal, but Symmetrical? Use the Wilcoxon Signed-Rank Test.
  • Is it neither, or do you have very little info? Use the Sign Test.
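The decision path above can be written as a tiny helper. This is only a sketch: the function name `choose_paired_test` and its two yes/no inputs are hypothetical, and in an exam the answers come from the wording of the question rather than from code.

```python
def choose_paired_test(differences_normal, differences_symmetric):
    """Pick a paired test from what is known about the differences."""
    if differences_normal:
        # Normality can be assumed: use the actual values.
        return "paired t-test"
    if differences_symmetric:
        # Not Normal, but symmetrical: use signed ranks.
        return "Wilcoxon signed-rank test"
    # Neither, or too little information: use signs only.
    return "sign test"


print(choose_paired_test(False, True))  # Wilcoxon signed-rank test
```

The checks are ordered from the strongest assumption to the weakest, so the function always returns the most powerful test whose assumption is met.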

Quick Review Box:
Sign Test: No assumptions about the shape of the distribution. Uses signs (+/-) only. Low power.
Wilcoxon: Assumption of symmetry. Uses ranks. Medium-high power.
Paired t-test: Assumption of normality. Uses actual values. Highest power when that assumption holds.


6. Interpreting the Results in Context

Once you get your \(p\)-value or compare your test statistic to a critical value, you must write a conclusion. Pearson Edexcel examiners love context!

Example of a good conclusion:
"There is sufficient evidence at the 5% significance level to suggest that the mean heart rate of athletes decreased after the meditation session (\(p = 0.032 < 0.05\)). We reject the null hypothesis."

Example of a poor conclusion:
"Reject H0. The result is significant." (This will lose you marks for lacking context!)

Key Points to Remember:
- Always state the significance level you used.
- Always refer back to the real-world objects in the question (e.g., "the drug," "the students," "the reaction times").
- Never say the result is "certain" or "proven". Use phrases like "suggests that" or "there is evidence that."


Summary Checklist

Before you move on, make sure you can:
1. Calculate a set of differences from two columns of data.
2. Identify when data is "paired" vs. "independent."
3. State the specific assumptions for the Wilcoxon (symmetry) and Paired t-test (normality).
4. Rank data correctly, handling ties by averaging.
5. Use a Binomial table for the Sign Test.