Welcome to Non-parametric Tests!
In your H2 Mathematics journey, you have likely spent a lot of time working with the "Normal Distribution." It is a beautiful, bell-shaped curve, but real-world data isn't always that neat! Sometimes, data is skewed, or we have very small samples where we can't be sure about the underlying distribution.
That is where Non-parametric tests come in. Think of them as the "all-terrain vehicles" of statistics. While parametric tests (like the t-test) need a smooth "paved road" (the Normal distribution) to work properly, non-parametric tests can handle "bumpy terrain" (data of any shape). In this chapter, we will learn how to test hypotheses without assuming our data follows a specific distribution.
1. Understanding Non-parametric Tests
Non-parametric tests are often called "distribution-free" tests because they do not rely on the assumption that the data comes from a specific probability distribution (like the Normal distribution).
Advantages of Non-parametric Tests:
• Flexibility: They can be used on data that is not Normally distributed.
• Simplicity: They are often based on the ranks of the data or simple signs (+ or -) rather than complex parameters like mean and variance.
• Robustness: They are less affected by extreme values (outliers) because they focus on the order of data rather than the exact values.
Disadvantages of Non-parametric Tests:
• Lower Power: If the data actually is Normally distributed, non-parametric tests are less "powerful" than parametric tests. This means they are less likely to reject a null hypothesis that is actually false.
• Less Information: Because they use ranks or signs, they "throw away" some of the specific information contained in the exact numerical values.
Quick Review Box:
Use Parametric tests (like t-tests) if you are sure the data is Normal.
Use Non-parametric tests if the distribution is unknown or clearly not Normal.
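To illustrate the Robustness point above, here is a minimal Python sketch. The sample values and the reference value 8 are made up for illustration only. It shows that replacing one value with an extreme outlier changes the mean a lot, but leaves the count of + and - signs unchanged.

```python
from statistics import mean

# Hypothetical data: the second list replaces 15 with an extreme outlier.
sample = [4, 7, 9, 12, 15]
with_outlier = [4, 7, 9, 12, 150]

def count_signs(values, m0):
    """Count how many values lie above (+) and below (-) the value m0."""
    plus = sum(1 for v in values if v > m0)
    minus = sum(1 for v in values if v < m0)
    return plus, minus

# The mean is pulled strongly towards the outlier...
print(mean(sample), mean(with_outlier))

# ...but the signs relative to 8 are the same for both lists.
print(count_signs(sample, 8))        # (3, 2)
print(count_signs(with_outlier, 8))  # (3, 2)
```

Because a sign-based test only sees the pair (3, 2), the outlier has no effect on its conclusion.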
2. The Sign Test
The Sign Test is the simplest non-parametric test. We use it to test hypotheses about a population median (denoted as \( m \)). It is called the Sign Test because we only care about whether a data point is above (+) or below (-) the hypothesized median.
Formulating Hypotheses
Null Hypothesis \( H_0 \): The population median is equal to a specific value \( m_0 \).
\( H_0: m = m_0 \)
Alternative Hypothesis \( H_1 \):
• \( H_1: m \neq m_0 \) (Two-tailed)
• \( H_1: m > m_0 \) (Upper-tailed)
• \( H_1: m < m_0 \) (Lower-tailed)
How the Test Works (Step-by-Step)
1. Compare each data value in your sample to the hypothesized median \( m_0 \).
2. If the value is greater than \( m_0 \), record a plus sign (+).
3. If the value is less than \( m_0 \), record a minus sign (-).
4. If the value is exactly equal to \( m_0 \), we ignore it and reduce our sample size \( n \) accordingly.
5. Let \( X \) be the number of plus signs. Under \( H_0 \), each value is equally likely to be above or below the median. Therefore, the test statistic \( X \) follows a Binomial Distribution:
\( X \sim B(n, 0.5) \)
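Example: Suppose you believe the median wait time at a cafe is 10 minutes, but you suspect it is actually longer, so \( H_1: m > 10 \). You observe 10 people: 8 wait longer than 10 minutes (+) and 2 wait less (-). Under \( H_0 \), \( X \sim B(10, 0.5) \), so \( P(X \geq 8) = \frac{45 + 10 + 1}{1024} \approx 0.0547 \). Although 8 looks much higher than the expected 5, this probability is just above 0.05, so there is insufficient evidence at the 5% level to reject \( H_0 \).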
Don't worry if this seems tricky at first! Just remember: we are just counting how many people fall on one side of the fence. If the fence is really the median, it should be a 50/50 split.
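The counting procedure for the Sign Test can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sign_test_upper_tail` and the individual wait times are hypothetical (chosen so that 8 values lie above 10 and 2 lie below), and the sketch covers the upper-tailed case \( H_1: m > m_0 \) only.

```python
from math import comb

def sign_test_upper_tail(values, m0):
    """Return (n, x, p) where x is the number of + signs and
    p = P(X >= x) for X ~ B(n, 0.5)."""
    plus = sum(1 for v in values if v > m0)
    minus = sum(1 for v in values if v < m0)
    n = plus + minus  # values exactly equal to m0 are ignored
    # Each outcome of B(n, 0.5) has probability C(n, k) / 2^n.
    p = sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n
    return n, plus, p

# Hypothetical wait times in minutes (8 above 10, 2 below 10).
wait_times = [12, 15, 11, 9, 14, 13, 8, 16, 18, 11.5]

n, x, p = sign_test_upper_tail(wait_times, 10)
print(n, x, round(p, 4))  # 10 8 0.0547
```

For a two-tailed test, one common approach is to double the smaller tail probability; here that would give about 0.109.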
3. Wilcoxon Matched-Pair Signed Rank Test
The Wilcoxon Matched-Pair Signed Rank Test is used for paired data (like "before and after" measurements on the same people). It is more powerful than the Sign Test because it looks at both the direction (+ or -) and the magnitude (the size) of the differences.
Assumptions
For this test, we assume that the distributions of the two populations are identical in shape, differing only by a shift in location. This implies the distribution of the differences is symmetric.
Formulating Hypotheses
\( H_0 \): There is no difference between the two treatments (The population median of the differences is zero).
\( H_1 \): There is a difference (The population median of the differences is not zero).
Step-by-Step Process
1. Calculate the difference \( d_i \) for each pair (e.g., After minus Before).
2. Ignore any pairs where the difference is zero.
3. Take the absolute values of the differences \( |d_i| \) and rank them from smallest to largest (1, 2, 3...).
Syllabus Note: We will exclude cases where ranks are tied for this course.
4. Re-assign the original sign (+ or -) to each rank.
5. Calculate \( T^+ \) (the sum of ranks with positive signs) and \( T^- \) (the sum of ranks with negative signs).
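6. For a two-tailed test, the test statistic \( T \) is the smaller of \( T^+ \) and \( T^- \). As a check, \( T^+ + T^- = \frac{n(n+1)}{2} \), where \( n \) is the number of non-zero differences. Compare \( T \) to the critical value from the Wilcoxon table and reject \( H_0 \) if \( T \) is less than or equal to the critical value.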
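The six steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names and the before/after scores are hypothetical, and tied absolute differences are rejected, in line with the syllabus note. The second function enumerates every way of assigning signs to the ranks 1 to \( n \), which is one way to see where the tabulated critical values come from.

```python
from itertools import combinations

def wilcoxon_signed_rank(before, after):
    """Return (n, T_plus, T_minus) for paired data, assuming no tied |d|."""
    # Steps 1-2: differences (after minus before), ignoring zeros.
    diffs = [a - b for a, b in zip(after, before) if a != b]
    # Step 3: rank the absolute differences, smallest gets rank 1.
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied absolute differences are outside this sketch")
    rank_of = {value: rank for rank, value in enumerate(abs_sorted, start=1)}
    # Steps 4-5: put the signs back and sum the ranks on each side.
    t_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    t_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return len(diffs), t_plus, t_minus

def lower_tail_probability(n, t):
    """P(rank sum <= t) under H0, where each of the ranks 1..n is
    equally likely to carry a + or a - sign."""
    ranks = range(1, n + 1)
    count = sum(
        1
        for size in range(n + 1)
        for subset in combinations(ranks, size)
        if sum(subset) <= t
    )
    return count / 2 ** n

# Hypothetical before/after scores for 8 people (one pair is unchanged).
before = [72, 65, 80, 58, 77, 69, 74, 61]
after = [75, 64, 86, 58, 82, 67, 81, 65]

n, t_plus, t_minus = wilcoxon_signed_rank(before, after)
print(n, t_plus, t_minus)                       # 7 25 3
print(t_plus + t_minus == n * (n + 1) // 2)     # True
print(lower_tail_probability(n, min(t_plus, t_minus)))  # 0.0390625
```

In an exam you would compare \( T = 3 \) with the tabulated critical value for \( n = 7 \) rather than enumerate; the enumeration is only practical for small \( n \).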
Analogy: Imagine two athletes. The Sign Test only asks "Who won more races?" The Wilcoxon Test asks "Who won more races AND by how much distance did they win?" This makes it a much smarter judge!
Key Takeaway: The Wilcoxon test uses the rank of the differences, not just the direction. This allows it to capture more information than the simple Sign Test.
4. Summary of Choosing Your Test
Choosing the right test is half the battle. Here is a quick guide:
1. Testing a single population median?
Use the Sign Test.
2. Testing differences in paired samples?
Use the Wilcoxon Matched-Pair Signed Rank Test.
3. Data is Normal and sample size is small?
Use a t-test (Parametric - covered in section 3.3).
4. Data is NOT Normal, or the distribution is unknown?
Use the Non-parametric options: the Sign Test for a single sample, or the Wilcoxon test for paired data.
Did you know? Non-parametric tests are widely used in psychology and medicine where researchers often use "Likert scales" (like 1 to 5 stars). Since the "gap" between 1 and 2 stars might not be the same as the "gap" between 4 and 5 stars, these tests are perfect for analyzing such data!
Final Checklist for Success
• Always state your hypotheses \( H_0 \) and \( H_1 \) clearly using the word median.
• For the Sign Test, identify your \( n \) (after discarding any values equal to \( m_0 \)) and state the distribution of your test statistic, \( X \sim B(n, 0.5) \).
• For the Wilcoxon Test, remember to rank the absolute differences first before putting the signs back.
• Always conclude in the context of the question (e.g., "There is sufficient evidence at the 5% level to suggest the median weight has increased").
Common Mistake to Avoid: Don't forget to discard "zero differences" in both the Sign and Wilcoxon tests! If a data point equals the hypothesized median or a pair has no change, it doesn't help us decide which way the data is leaning, so we leave it out and reduce \( n \).