Welcome to Non-Parametric Tests!
In your previous statistics studies, you might have used tests like the t-test, which assume your data follows a specific pattern (usually the Normal distribution). But what happens if your data is messy, skewed, or you simply don’t know its distribution? That’s where non-parametric tests come in! Think of them as the "flexible" version of hypothesis testing. They make few or no assumptions about the shape of the population distribution; instead of the mean, they work with the median, using the signs and ranks of the data.
Quick Review: Before we start, remember that a Hypothesis Test always starts with a Null Hypothesis (\(H_0\)) and an Alternative Hypothesis (\(H_1\)). In this chapter, we are usually testing the population median (\(M\)).
1. Why Go Non-Parametric?
Non-parametric tests are often called distribution-free tests. You should choose them when:
- Your sample is small, so you cannot check for Normality or rely on large-sample results.
- The data is skewed (not symmetrical).
- The data is ordinal (you can rank it, like "1st, 2nd, 3rd," but the gaps between them aren't necessarily equal).
- You cannot assume the population follows a Normal distribution.
The Big Difference: While parametric tests use the actual values (for example, to calculate the mean), non-parametric tests use only the sign or rank of each value (its position when the data are ordered from smallest to largest).
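To see what "using the rank" means in practice, here is a minimal sketch with made-up numbers. Notice that the extreme value 1000.0 simply becomes the largest rank; its size no longer matters.

```python
# Made-up data with one extreme value (no repeated values, so no tied ranks)
values = [12.5, 3.1, 47.0, 8.8, 1000.0]

# Order the data from smallest to largest
ordered = sorted(values)

# The rank of a value is its position in the ordered list (starting at 1)
ranks = [ordered.index(v) + 1 for v in values]

print(ranks)  # [3, 1, 4, 2, 5]
```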
Key Takeaway: Non-parametric tests are robust rebels—they work even when the standard rules of "Normal distributions" are broken!
2. The Single-Sample Sign Test
This is the simplest test. It only cares about whether an observation is above (+) or below (-) a hypothesized median.
How it works (Step-by-Step):
- State \(H_0\) (e.g., \(M = 50\)) and \(H_1\) (e.g., \(M > 50\)).
- For every data point, record a + if it’s greater than the hypothesized median, and a - if it’s smaller.
- Ignore any values that are exactly equal to the hypothesized median.
- Let \(X\) be the number of plus signs (or the smaller of the plus/minus counts for a two-tailed test).
- Under \(H_0\), the number of plus signs follows a Binomial Distribution: \(X \sim B(n, 0.5)\), where \(n\) is the total number of pluses and minuses.
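- Calculate the probability of a result at least as extreme as the one observed (the p-value; double the tail probability for a two-tailed test) using your calculator and compare it to the significance level.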
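The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `sign_test_upper` and the scores are made up, and it handles the one-tailed case \(H_1: M > M_0\) only.

```python
from math import comb

def sign_test_upper(data, m0):
    """One-tailed sign test of H0: M = m0 against H1: M > m0."""
    plus = sum(1 for x in data if x > m0)   # observations above m0
    minus = sum(1 for x in data if x < m0)  # observations below m0
    n = plus + minus                        # values equal to m0 are ignored
    # p-value = P(X >= plus) where X ~ B(n, 0.5)
    p_value = sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n
    return plus, n, p_value

# Hypothetical test scores; H0: M = 50, H1: M > 50
scores = [52, 55, 48, 61, 50, 57, 53, 49, 58, 60, 54, 56]
plus, n, p = sign_test_upper(scores, 50)
print(plus, n, round(p, 4))  # 9 11 0.0327
```

Here the score of exactly 50 is dropped, leaving 9 pluses out of \(n = 11\). Since 0.0327 is below 0.05, we would reject \(H_0\) at the 5% level for this made-up data.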
Analogy: Imagine a seesaw balanced at the median. If the true median is 50, you’d expect a 50/50 split of people sitting on the "higher" side and the "lower" side. If way more people are on the "higher" side, the seesaw tips, and we reject the idea that the balance point is 50!
3. Single-Sample Wilcoxon Signed-Rank Test
The Sign Test is easy, but it throws away information (it doesn't care how much bigger a number is). The Wilcoxon Signed-Rank Test is more powerful because it looks at the magnitude of the differences. The price is one extra assumption: the population distribution must be symmetric about its median.
The Process:
- Calculate the difference between each observation and the hypothesized median: \(d_i = x_i - M_0\).
- Rank these differences from smallest to largest, ignoring the plus/minus signs for now. (The smallest absolute difference gets Rank 1).
- Label each rank with the sign of its original difference (e.g., if the difference was -2 and it got Rank 3, it becomes a negative rank).
- Calculate the sum of positive ranks (\(W_+\)) and the sum of negative ranks (\(W_-\)).
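- Your test statistic \(T\) is usually the smaller of \(W_+\) and \(W_-\). (For a one-tailed test, use the rank sum that \(H_1\) predicts will be small.)
- Compare \(T\) to the critical value in the Wilcoxon Single-Sample table: reject \(H_0\) if \(T\) is less than or equal to the critical value.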
Common Mistake: Don't forget to ignore differences of zero! If an observation equals the hypothesized median, discard it and reduce your \(n\) accordingly.
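Here is a minimal sketch of the signed-rank calculation, assuming no two differences have the same absolute size (no tied ranks). The function name `signed_rank` and the data are made up for illustration.

```python
def signed_rank(data, m0):
    """Return (W_plus, W_minus, T, n) for a hypothesized median m0."""
    # Differences from the hypothesized median, discarding zeros
    diffs = [x - m0 for x in data if x != m0]
    # Order by absolute size: the smallest |d| gets rank 1 (assumes no ties)
    ordered = sorted(diffs, key=abs)
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus), len(diffs)

# Hypothetical observations; H0: M = 50
times = [53, 46, 57, 49, 62, 50, 58, 44, 65]
print(signed_rank(times, 50))  # (28, 8, 8, 8)
```

A useful check: \(W_+ + W_-\) should always equal \(\frac{1}{2}n(n+1)\). Here \(28 + 8 = 36 = \frac{1}{2}(8)(9)\), with \(n = 8\) because the observation equal to 50 was discarded.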
Key Takeaway: Use the Sign Test for very messy data, but use Wilcoxon Signed-Rank if you want a more "sensitive" test that considers the size of the gaps.
4. Comparing Two Samples: Paired vs. Unpaired
Before testing, you must decide if your data sets are "linked" or independent.
Paired-Sample (Matched Pairs)
Example: Measuring the pulse of 10 students before and after exercise. The "before" and "after" numbers belong to the same person. We test the differences using a Sign Test or Wilcoxon Signed-Rank test (just like the single-sample tests above, but testing if the median difference is 0).
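As a rough sketch of how paired data reduces to a single sample of differences, the pulse readings below are made up, and the test shown is a one-tailed Sign Test of \(H_1\): median difference \(> 0\).

```python
from math import comb

# Hypothetical pulse readings for 10 students (made-up numbers)
before = [68, 72, 65, 70, 74, 66, 71, 69, 73, 67]
after = [75, 80, 64, 78, 79, 72, 70, 77, 82, 71]

# One difference per student: after minus before
diffs = [a - b for b, a in zip(before, after)]

# Sign test on the differences against a median difference of 0
plus = sum(1 for d in diffs if d > 0)
minus = sum(1 for d in diffs if d < 0)
n = plus + minus  # zero differences would be ignored

# One-tailed p-value: P(X >= plus) where X ~ B(n, 0.5)
p_value = sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n
print(plus, minus, round(p_value, 4))  # 8 2 0.0547
```

With this made-up data the p-value is just above 0.05, so there would not be quite enough evidence at the 5% level that the median pulse has increased.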
Two-Sample (Unpaired)
Example: Comparing the heights of 10 boys and 12 girls. These are two independent groups. We use the Wilcoxon Rank-Sum Test (equivalent to the Mann-Whitney U Test, which always leads to the same conclusion).
5. Wilcoxon Rank-Sum Test (Mann-Whitney U)
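This is used to test whether two independent populations are identical (\(H_0\)) or whether one tends to give larger values than the other, so that the medians differ (\(H_1\)).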
Step-by-Step:
- Combine both samples into one big list of size \(N = m + n\).
- Rank all values from 1 to \(N\).
- Find the sum of the ranks for the smaller group, the sample of size \(m\) (let's call this sum \(R_m\)).
- The test statistic is simply this rank sum: \(W = R_m\).
- Compare \(W\) to the Wilcoxon Rank-Sum tables.
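The steps above can be sketched as follows, assuming every value in the combined list is different (no tied ranks). The function name `rank_sum` and the heights are made up for illustration.

```python
def rank_sum(sample_a, sample_b):
    """Return (W, m, n), where W is the rank sum of the smaller sample."""
    # Put the smaller sample first (size m); the other has size n
    small, large = sorted((sample_a, sample_b), key=len)
    # Combine both samples and rank from 1 to N = m + n (assumes no ties)
    combined = sorted(small + large)
    rank_of = {value: rank for rank, value in enumerate(combined, start=1)}
    # W is the sum of the ranks belonging to the smaller sample
    w = sum(rank_of[x] for x in small)
    return w, len(small), len(large)

# Hypothetical heights in cm
boys = [162, 171, 168, 175, 180]
girls = [158, 165, 160, 170, 163, 155]
print(rank_sum(boys, girls))  # (41, 5, 6)
```

The boys' heights take ranks 4, 7, 9, 10 and 11 in the combined list, giving \(W = 41\) with \(m = 5\) and \(n = 6\).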
Did you know? The tables are usually indexed with \(m\) as the smaller sample size, so \(W\) should be the rank sum of the smaller sample. If your groups are different sizes, always check the table instructions to see which group they expect!
6. Normal Approximations for Large Samples
When the sample size \(n\) gets large (usually \(n > 20\)), the Wilcoxon tables run out of space. Luckily, the test statistics start to follow a Normal Distribution!
For Wilcoxon Signed-Rank Test (\(T\)):
Mean \( \mu = \frac{1}{4}n(n+1) \)
Variance \( \sigma^2 = \frac{1}{24}n(n+1)(2n+1) \)
\( T \sim N(\mu, \sigma^2) \)
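As a rough sketch of how these formulas are used, the code below finds a lower-tail z-score and p-value for the signed-rank statistic, including a continuity correction of 0.5. The function name and the values \(T = 90\), \(n = 25\) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_normal(t, n):
    """Lower-tail z-score and p-value for the signed-rank statistic T."""
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    # T is the smaller rank sum, so we want P(T <= t).
    # Continuity correction: use t + 0.5 in the Normal approximation.
    z = (t + 0.5 - mu) / sigma
    return z, NormalDist().cdf(z)

# Hypothetical result: T = 90 from n = 25 non-zero differences
z, p = signed_rank_normal(90, 25)
print(round(z, 2), round(p, 3))
```

For these made-up values, \(\mu = 162.5\) and \(\sigma^2 = 1381.25\), so \(z\) comes out at roughly \(-1.94\).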
For Wilcoxon Rank-Sum Test (\(W\)):
Mean \( \mu = \frac{1}{2}m(m+n+1) \)
Variance \( \sigma^2 = \frac{1}{12}mn(m+n+1) \)
\( W \sim N(\mu, \sigma^2) \)
Don't worry if this seems tricky! These formulas are provided in your formula booklet. Just remember to use a continuity correction of 0.5 when calculating your z-score, because you are approximating a discrete rank sum with a continuous Normal curve: move the observed value 0.5 towards the mean (for example, use \(T + 0.5\) when testing the lower tail).
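Here is a minimal sketch of the rank-sum version, with the continuity correction moving \(W\) by 0.5 towards the mean. The function name and the values \(W = 130\), \(m = 12\), \(n = 15\) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_normal(w, m, n):
    """Continuity-corrected z-score and two-tailed p-value for W."""
    mu = m * (m + n + 1) / 2
    sigma = sqrt(m * n * (m + n + 1) / 12)
    # Continuity correction: shift w by 0.5 towards the mean
    if w < mu:
        corrected = w + 0.5
    elif w > mu:
        corrected = w - 0.5
    else:
        corrected = w
    z = (corrected - mu) / sigma
    # Two-tailed p-value: double the probability in the tail beyond z
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed

# Hypothetical result: W = 130 for the smaller sample, m = 12, n = 15
z, p = rank_sum_normal(130, 12, 15)
print(round(z, 2), round(p, 3))
```

For these made-up values, \(\mu = 168\) and \(\sigma^2 = 420\), so \(z\) is about \(-1.83\), which would not be significant in a two-tailed test at the 5% level.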
Quick Review Box
Which test should I use?
- One Sample (Median): Sign Test or Wilcoxon Signed-Rank.
- Paired Data (Before/After): Use the differences in a Sign Test or Wilcoxon Signed-Rank.
- Two Independent Groups: Wilcoxon Rank-Sum (Mann-Whitney U).
- Large Samples: Use the Normal Approximation formulas.
Memory Aid: Signed-Rank is for a Single sample or the Same people (paired). Rank-Sum is for Separate groups.
Summary Checklist
1. Ranks: Are you ranking correctly? Smallest absolute difference = Rank 1.
2. Hypotheses: Are your hypotheses about the Median (\(M\)) and not the mean?
3. Ties: Remember, the syllabus for this course excludes problems with tied ranks or observations coinciding with the median, making your life a bit easier!
4. Conclusions: Always write your final conclusion in context. "There is significant evidence at the 5% level to suggest the median score has increased."