Introduction to Non-Parametric Tests

Welcome to one of the most practical chapters in your Statistics A Level! So far, you have likely spent a lot of time working with the Normal distribution. It is a powerful model, but the tests built on it come with strict "rules": they assume your data comes from a population that is symmetrical and bell-shaped.

But what happens when your data is messy, skewed, or contains wild outliers? That’s where non-parametric tests come in. Think of these as the "rugged mountain bikes" of statistics. They don't care if the road is bumpy or the shape of the data is weird; they still get the job done. In this chapter, we will focus on testing the median of a population rather than the mean.

Quick Review: Remember that the median is the middle value when data is put in order. It is much more robust than the mean because it isn't pulled away by extreme outliers.
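
To see this robustness in action, here is a minimal Python sketch. The numbers are made up purely for illustration (imagine salaries in thousands of pounds); the point is how far one extreme value drags the mean compared with the median.

```python
from statistics import mean, median

# Hypothetical data: five typical values plus one extreme outlier.
typical = [21, 23, 24, 26, 28]
with_outlier = typical + [250]

# Without the outlier the mean and median are close together.
print("No outlier:   mean =", mean(typical), " median =", median(typical))

# One extreme value drags the mean a long way; the median barely moves.
print("With outlier: mean =", mean(with_outlier), " median =", median(with_outlier))
```

In this made-up example the mean jumps from 24.4 to 62, while the median only moves from 24 to 25.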


1. When to Go "Non-Parametric"?

We use non-parametric tests when the assumptions for a standard test (like a z-test or t-test) are not met.

Key differences to remember:

  • Parametric Tests: Assume the data follows a specific distribution (usually Normal). They usually test the mean (\(\mu\)).
  • Non-Parametric Tests: Do not assume a specific distribution shape. The ones in this chapter test the population median.

Did you know? Non-parametric tests are often called "distribution-free" tests because they don't rely on the data fitting a specific curve.


2. The Sign Test (One Sample and Paired)

The Sign Test is the simplest non-parametric test. It is so simple that it doesn't even look at the actual numbers—it only looks at whether a value is above or below a specific point.

Testing a Single Population Median

Imagine a shoe company claims the median size of their customers' feet is 8. You suspect it’s higher. To test this, you don't need to know exactly how much bigger the feet are; you just count how many people have feet larger than size 8 (+ signs) and how many have feet smaller than size 8 (- signs).

Testing Paired Data

For paired data (like "before and after" measurements on the same people), we look at the difference between the two values. If the "after" score is higher, we give it a (+). If it's lower, we give it a (-).
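
As a rough illustration, the sketch below turns paired scores into signs. The scores and the variable names (before, after) are invented for this example, and a pair with no change is left out of the count.

```python
# Hypothetical before/after scores; position i in each list is the same person.
before = [52, 60, 47, 71, 65, 58]
after = [55, 58, 50, 71, 70, 61]

# Difference for each person: after minus before.
differences = [a - b for a, b in zip(after, before)]

# Keep only the direction of each non-zero difference.
signs = ["+" if d > 0 else "-" for d in differences if d != 0]

plus = signs.count("+")
minus = signs.count("-")
n = len(signs)  # effective sample size once zero differences are dropped

print(differences)
print(signs, "plus =", plus, "minus =", minus, "n =", n)
```

Here one person's score did not change, so six pairs give only five signs: four (+) and one (-).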

Step-by-Step Process:

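  1. State your hypotheses. \(H_0\): Median = value. \(H_1\): Median is greater than, less than, or not equal to that value, depending on what you suspect.
  2. Subtract the hypothesized median from each data point.
  3. Record the sign of the result (+ or -). If a value equals the median (a difference of zero), discard it and reduce your sample size \(n\).
  4. Count the plus signs and the minus signs. Your test statistic \(X\) is the number of times the less frequent sign occurs.
  5. Under \(H_0\), each remaining value is equally likely to fall above or below the median, so the count of either sign follows a Binomial Distribution: \(X \sim B(n, 0.5)\). Use this to find the probability of a result at least as extreme as yours and compare it with the significance level (for a two-tailed test, double the one-tail probability).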
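
The steps above can be sketched in Python for the shoe-size claim. This is only an illustration under stated assumptions: the shoe sizes are invented, the function names (binomial_cdf, sign_test_upper) are made up for this example, and the test is one-tailed (\(H_1\): Median > 8), so it counts plus signs and looks at the upper tail of \(B(n, 0.5)\).

```python
from math import comb

def binomial_cdf(k, n, p=0.5):
    """Return P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def sign_test_upper(data, hypothesised_median):
    """One-tailed sign test of H0: median = m against H1: median > m.

    Returns (n, number of plus signs, p-value).
    """
    # Steps 2-3: take differences and discard any that are exactly zero.
    diffs = [x - hypothesised_median for x in data if x != hypothesised_median]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    # Step 5: under H0 the number of plus signs is B(n, 0.5).
    # P(X >= plus) = 1 - P(X <= plus - 1)
    p_value = 1 - binomial_cdf(plus - 1, n)
    return n, plus, p_value

# Hypothetical shoe sizes from 12 customers (one is exactly 8 and gets discarded).
sizes = [9, 8.5, 10, 7, 9.5, 8, 11, 9, 7.5, 10, 9, 8.5]
n, plus, p_value = sign_test_upper(sizes, 8)
print(f"n = {n}, plus signs = {plus}, p-value = {p_value:.4f}")
```

With this invented sample, \(n = 11\) and there are 9 plus signs, giving \(P(X \ge 9) = 67/2048 \approx 0.0327\). That is below 0.05, so at the 5% level there would be evidence that the median is higher than 8.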

Validity Condition: The Sign Test makes no assumptions about the shape of the population distribution. This makes it very "safe" to use, but it is less powerful than other tests because it throws away a lot of information (the actual sizes of the differences).

Key Takeaway: Use the Sign Test when you have no idea about the shape of the data and just want to see if there's a general trend "up" or "down."


3. Wilcoxon Signed-Rank Test

Don't worry if this name sounds intimidating! The Wilcoxon Signed-Rank Test is just a more "intelligent" version of the Sign Test. While the Sign Test only cares about the direction of a difference (+ or -), the Wilcoxon test also cares about the magnitude (how big that difference is).

How it works (The Ranking Trick)

Instead of just looking at signs, we rank the differences from smallest to largest, ignoring whether they are positive or negative at first.

Step-by-Step Process:

  1. Calculate the difference between each observation and the hypothesized median.
  2. Discard any differences that are zero.
  3. Rank the absolute differences (ignore the signs). The smallest difference gets rank 1, the next smallest rank 2, etc.
  4. If there are tied values, give them the average of the ranks they would have taken (e.g., if the 3rd and 4th values are tied, give them both 3.5).
  5. Put the original signs (+ or -) back onto the ranks.
  6. Calculate \(W_+\) (sum of positive ranks) and \(W_-\) (sum of negative ranks).
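  7. The test statistic \(T\) is usually the smaller of these two sums. Compare this to critical values in your formula booklet: a small \(T\) is evidence against \(H_0\), so you reject \(H_0\) if \(T\) is less than or equal to the critical value.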
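
Here is a minimal Python sketch of the ranking trick, including average ranks for ties. The sample and the hypothesized median of 50 are invented, and the helper names (average_ranks, signed_rank_sums) are made up for this illustration. It stops at the test statistic; the critical value still comes from your tables.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of values tied with the one at position pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Average of the ranks pos+1, ..., end+1 that the tied values would take.
        shared = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def signed_rank_sums(data, hypothesised_median):
    """Return (W_plus, W_minus, T) for a single sample."""
    # Steps 1-2: differences from the hypothesized median, zeros discarded.
    diffs = [x - hypothesised_median for x in data if x != hypothesised_median]
    # Steps 3-4: rank the absolute differences, averaging ties.
    ranks = average_ranks([abs(d) for d in diffs])
    # Steps 5-6: put the signs back and add up each side.
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    # Step 7: T is the smaller of the two sums.
    return w_plus, w_minus, min(w_plus, w_minus)

# Hypothetical sample, testing H0: median = 50.
sample = [54, 47, 58, 50, 61, 49, 53, 44, 57]
w_plus, w_minus, t = signed_rank_sums(sample, 50)
print("W+ =", w_plus, " W- =", w_minus, " T =", t)
```

For this made-up sample, the differences of -3 and +3 tie for ranks 2 and 3, so each gets 2.5. The result is \(W_+ = 27.5\), \(W_- = 8.5\) and \(T = 8.5\). A handy check: \(W_+ + W_-\) should always equal \(n(n+1)/2\), which is 36 here because \(n = 8\) after the zero difference is discarded.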

Critical Assumption: For the Wilcoxon Signed-Rank test to be valid, we must assume the distribution of the population is symmetrical (or the distribution of differences is symmetrical for paired data).

Analogy: If the Sign Test is a light switch (On/Off), the Wilcoxon Signed-Rank test is a dimmer switch (it tells you how much light there is!).

Key Takeaway: The Wilcoxon Signed-Rank test is more powerful than the Sign Test, but it requires the population distribution (or the distribution of differences) to be symmetrical.


4. Wilcoxon Rank-Sum Test (Two Independent Samples)

This test is used when you want to compare two different, independent groups (e.g., comparing the heights of students in School A vs. School B). It is the non-parametric alternative to the independent samples t-test. It is equivalent to the Mann-Whitney U test, which uses a slightly different statistic but always reaches the same conclusion.

Step-by-Step Process:

  1. Combine both groups into one big list.
  2. Rank all the values from smallest to largest across the whole combined list, giving tied values the average of the ranks they would have taken.
  3. Sum the ranks for the group with fewer observations (let's call this sum \(R_1\)).
  4. Use the formula for the test statistic \(W\) (provided in your booklet) and compare it with the critical value in your tables to see if the two groups are significantly different.
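
The combining and ranking steps can be sketched in Python as follows. The heights are invented, and the function name rank_sums is made up for this illustration. Each value carries an "A" or "B" label so it never loses track of its group. The sketch stops at the rank sums, because the formula for \(W\) and the critical values should come from your own booklet.

```python
def rank_sums(group_a, group_b):
    """Rank the combined data and return (rank sum for A, rank sum for B)."""
    # Step 1: combine both groups, labelling every value with its group.
    labelled = [(v, "A") for v in group_a] + [(v, "B") for v in group_b]
    # Step 2: order the combined list from smallest to largest.
    labelled.sort(key=lambda pair: pair[0])
    sums = {"A": 0.0, "B": 0.0}
    pos = 0
    while pos < len(labelled):
        # Find the run of values tied with the one at position pos.
        end = pos
        while end + 1 < len(labelled) and labelled[end + 1][0] == labelled[pos][0]:
            end += 1
        # Tied values share the average of the ranks they would have taken.
        shared = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            sums[labelled[k][1]] += shared
        pos = end + 1
    return sums["A"], sums["B"]

# Hypothetical heights in cm; School A is the smaller group here.
school_a = [152, 160, 158, 171]
school_b = [165, 149, 172, 168, 175]
r_a, r_b = rank_sums(school_a, school_b)
print("Rank sum A =", r_a, " Rank sum B =", r_b)
```

With these made-up heights, School A takes ranks 2, 3, 4 and 7, so \(R_1 = 16\), and School B's ranks sum to 29. As a check, the two rank sums together should equal \(N(N+1)/2\) where \(N\) is the combined sample size: here \(9 \times 10 / 2 = 45\).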

Common Mistake to Avoid: When ranking combined data, students often forget to keep track of which data point belonged to which group. Use different colored pens or labels like "A" and "B" to stay organized!

Key Takeaway: Use the Rank-Sum test for two unrelated groups to see if one population generally has higher values than the other.


5. Choosing the Right Test: A Quick Guide

Selecting the right test is a major part of the exam. Use this checklist to help you decide:

Q1: Are the samples paired (related) or independent?

  • Independent: Use Wilcoxon Rank-Sum Test.
  • Paired or Single Sample: Go to Q2.

Q2: Is the distribution symmetrical?

  • Yes: Use Wilcoxon Signed-Rank Test (it's more powerful).
  • No (or we don't know): Use the Sign Test (it's the safest).
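
If it helps to see the checklist as a decision rule, here is a small Python sketch. The function name choose_test and its parameters are invented for this illustration; it simply encodes Q1 and Q2 above, treating "we don't know" the same as "no".

```python
def choose_test(independent_samples, symmetrical=None):
    """Pick a test from the checklist.

    independent_samples: True for two unrelated groups,
                         False for paired data or a single sample.
    symmetrical: True, False, or None when the shape is unknown.
    """
    # Q1: independent groups go straight to the Rank-Sum test.
    if independent_samples:
        return "Wilcoxon Rank-Sum Test"
    # Q2: only use Signed-Rank when symmetry can be assumed.
    if symmetrical is True:
        return "Wilcoxon Signed-Rank Test"
    # Otherwise fall back on the safest option.
    return "Sign Test"

print(choose_test(independent_samples=True))
print(choose_test(independent_samples=False, symmetrical=True))
print(choose_test(independent_samples=False))
```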

Quick Review Box:

  • Sign Test: No assumptions about shape, uses Binomial \(B(n, 0.5)\).
  • Wilcoxon Signed-Rank: Needs symmetry, uses ranks of differences.
  • Wilcoxon Rank-Sum: Compares two separate groups by ranking combined data.


Summary Checklist

To master this chapter, make sure you can:

  • Explain why a non-parametric test is more appropriate than a parametric one for skewed data.
  • Conduct a Sign Test using the Binomial distribution.
  • Rank data correctly, including how to handle ties.
  • State the assumptions for each test clearly (especially the symmetry for Wilcoxon).
  • Use your statistical tables to find critical values for \(W\) and \(T\).

Don't worry if the ranking feels slow at first! With a little practice, it becomes a very mechanical and reliable way to pick up marks in your Paper 2 exam.