Welcome to the World of Non-Parametric Tests!

In your previous statistics studies, you probably spent a lot of time talking about the Normal Distribution. But what happens if your data doesn't look like a neat, symmetrical bell curve? Or what if you have a very small sample and you just don't know what the underlying distribution looks like?

That is where non-parametric tests come to the rescue! Often called "distribution-free" tests, these methods are the "Swiss Army Knives" of statistics. They do not require your data to be Normal, so they still work when the data is skewed or oddly shaped. In this chapter, we will learn how to test hypotheses about the median of a population rather than the mean.

1. The Big Idea: Parametric vs. Non-Parametric

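Parametric tests (like the z-test or t-test) assume your data comes from a specific family of distributions, usually the Normal distribution, which is described by a few parameters (such as the mean and variance).
Non-parametric tests do not assume a specific distribution shape, so they are much more flexible! They still rely on some assumptions, such as the observations being independent.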

Analogy: Imagine you are buying a tailored suit. A parametric suit is made for a specific body type—if you don't fit that shape, the suit looks terrible. A non-parametric suit is like a "one-size-fits-all" poncho. It might not be as precise as a tailored suit, but it fits everyone!

Key Takeaway:

We use non-parametric tests when we cannot assume the population is Normally distributed or when we are dealing with ordinal data (data that can be ranked but not measured exactly).

2. The Single-Sample Sign Test

The Sign Test is the simplest non-parametric test. It ignores the actual values of the data and only looks at whether a value is higher (+) or lower (-) than a hypothesized median.

When to use it:

When you want to test the median (\(m\)) of a single population.

Step-by-Step Process:

1. State Hypotheses:
\(H_0: m = m_0\) (The median is some specific value)
\(H_1: m \neq m_0\) (or \(>\) or \(<\))

2. Calculate the Signs: For each data point, subtract the hypothesized median.
- If the result is positive, mark it as +.
- If the result is negative, mark it as -.
- If the result is exactly zero, discard it and reduce your sample size \(n\).

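3. Find the Test Statistic (\(X\)): For a two-tailed test, this is the number of times the less frequent sign occurs. For example, if you have 8 "+" and 2 "-", then \(X = 2\). For a one-tailed test, count the sign that \(H_1\) predicts will be rare (the "-" signs if \(H_1: m > m_0\)).

4. Distribution: Under \(H_0\), each remaining value is equally likely to be "+" or "-", so the count of either sign follows a Binomial Distribution \(B(n, 0.5)\), where \(n\) is the sample size after discarding zeros.

5. Find the p-value: Calculate \(P(Y \le X)\) for \(Y \sim B(n, 0.5)\) using the Binomial formula or tables, and double it for a two-tailed test. In the example above, \(P(Y \le 2) = \frac{56}{1024} \approx 0.055\), so the two-tailed p-value is about 0.109. Reject \(H_0\) if the p-value is below your significance level.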

Quick Review: Why 0.5? Because if the median is truly \(m_0\), there should be a 50% chance of a value being above it and a 50% chance of being below it!
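
The sign test steps can be sketched in a few lines of Python. This is only an illustration: the function name `sign_test` and the data in `sample` are made up for this example, and the p-value is computed for the two-tailed case by doubling the lower-tail Binomial probability.

```python
from math import comb

def sign_test(data, m0):
    """Two-tailed sign test for H0: median == m0 (illustrative sketch)."""
    plus = sum(1 for x in data if x > m0)    # values above the hypothesized median
    minus = sum(1 for x in data if x < m0)   # values below it
    n = plus + minus                         # values equal to m0 are discarded
    x = min(plus, minus)                     # count of the less frequent sign
    # P(Y <= x) for Y ~ B(n, 0.5)
    tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)             # double the tail for a two-tailed test
    return x, n, p_value

# Hypothetical data: 8 values above 11, 2 below, and one equal to 11
sample = [12, 15, 9, 14, 16, 13, 17, 10, 18, 14, 11]
x, n, p = sign_test(sample, m0=11)
print(x, n, p)
```

With this made-up sample the value equal to 11 is dropped, leaving \(n = 10\) and \(X = 2\), the same counts as the 8 "+" and 2 "-" example.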

3. Single-Sample Wilcoxon Signed-Rank Test

The Sign Test is easy, but it's a bit "wasteful" because it throws away the actual size of the differences. The Wilcoxon Signed-Rank Test is more powerful because it considers both the direction and the magnitude of the differences from the median.

Important Requirement: This test assumes the population distribution is symmetrical (even if it's not Normal).

How to do it:

1. Calculate the difference \(d_i = x_i - m_0\) for each value.
2. Rank the absolute differences \(|d_i|\) from smallest to largest (1 is smallest). Ignore zeros.
3. Assign the original signs (+ or -) back to the ranks.
4. Calculate:
- \(W_+\) = Sum of ranks with positive signs.
- \(W_-\) = Sum of ranks with negative signs.
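5. Your test statistic \(T\) is the smaller of \(W_+\) and \(W_-\) (for a two-tailed test). As a quick check, \(W_+ + W_- = \frac{n(n+1)}{2}\), where \(n\) is the number of non-zero differences.
6. Compare \(T\) with the critical value from the Wilcoxon signed-rank tables for your \(n\): reject \(H_0\) if \(T\) is less than or equal to the critical value.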

Don't worry if this seems tricky! Just remember: you are ranking how "far away" each point is from the median and then checking if the "far away" points are mostly on one side or evenly split.

Common Mistake:

Forgetting to handle ties! If two absolute differences are the same (a tie), give them the average of the ranks they would have taken. For example, if the 3rd and 4th smallest absolute differences are equal, both get rank 3.5.
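
Here is a minimal Python sketch of the signed-rank calculation, including average ranks for ties. The helper names `average_ranks` and `signed_rank_statistic` and the data in `sample` are assumptions made up for this illustration, not part of any library.

```python
def average_ranks(values):
    """Rank values from smallest (1) upward; tied values share their average rank."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)            # first position this value occupies
        count[v] = count.get(v, 0) + 1      # how many times the value appears
    # a value occupying positions first..first+count-1 gets the middle of that range
    return [first[v] + (count[v] - 1) / 2 for v in values]

def signed_rank_statistic(data, m0):
    """Return (W_plus, W_minus, T) for H0: median == m0."""
    diffs = [x - m0 for x in data if x != m0]        # ignore zero differences
    ranks = average_ranks([abs(d) for d in diffs])   # rank the absolute differences
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)

# Hypothetical data: 18 and 22 are both 2 away from 20, so they tie
sample = [23, 18, 27, 20, 25, 16, 31, 22]
w_plus, w_minus, t = signed_rank_statistic(sample, m0=20)
print(w_plus, w_minus, t)
```

In this made-up sample the two tied differences would have taken ranks 1 and 2, so each gets 1.5, and the two rank sums add up to \(\frac{7 \times 8}{2} = 28\) for the 7 non-zero differences.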

4. Wilcoxon Rank-Sum Test (Two Samples)

This test is used to see if two independent samples come from populations with the same median, assuming the two populations have the same shape and spread. It's the non-parametric alternative to the independent two-sample t-test.

Step-by-Step:

1. Combine both samples into one big list of size \(n = n_1 + n_2\).
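2. Rank all the values from 1 to \(n\) (1 is smallest), giving tied values the average of the ranks they would have taken.
3. Find the sum of the ranks for the first sample, \(R_1\).
4. Use the Wilcoxon Rank-Sum tables to find critical values based on \(n_1\) and \(n_2\), and reject \(H_0\) if \(R_1\) falls in the critical region.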

Did you know? This test is equivalent to the Mann-Whitney U Test. The statistic there is \(U_1 = R_1 - \frac{n_1(n_1+1)}{2}\), which just shifts the rank sum by a constant, so the underlying logic of ranking is exactly the same and both tests lead to the same conclusion!
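
The rank-sum steps can be sketched in Python as follows. The function names and the two groups of data are hypothetical, chosen only for illustration. The sketch also reports the Mann-Whitney statistic using the standard relation \(U_1 = R_1 - \frac{n_1(n_1+1)}{2}\).

```python
def average_ranks(values):
    """Rank values from smallest (1) upward; tied values share their average rank."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]

def rank_sum(sample1, sample2):
    """Return (R1, U1): rank sum of sample1 in the combined data, and its U statistic."""
    combined = list(sample1) + list(sample2)   # step 1: pool both samples
    ranks = average_ranks(combined)            # step 2: rank the pooled values
    n1 = len(sample1)
    r1 = sum(ranks[:n1])                       # step 3: ranks belonging to sample1
    u1 = r1 - n1 * (n1 + 1) / 2                # shift by the smallest possible rank sum
    return r1, u1

# Hypothetical data for two independent groups
group_a = [12, 15, 11, 18, 14]
group_b = [16, 20, 19, 17, 22, 21]
r1, u1 = rank_sum(group_a, group_b)
print(r1, u1)
```

For this made-up data \(U_1\) counts the pairs where a value from `group_a` beats a value from `group_b` (here only 18 beats 16 and 17), which is one way to see why the two tests carry the same information.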

5. Large Sample Approximations

When the sample size \(n\) gets large (a common rule of thumb is \(n > 20\), though your tables or syllabus may use a different cut-off), the distributions of these test statistics are approximately Normal. This makes our lives easier because we can use \(z\)-scores!

For Wilcoxon Signed-Rank (Single Sample), where \(W\) is either \(W_+\) or \(W_-\) and \(n\) is the number of non-zero differences:

Mean \(E(W) = \frac{n(n+1)}{4}\)
Variance \(Var(W) = \frac{n(n+1)(2n+1)}{24}\)

For Wilcoxon Rank-Sum (Two Samples):

Mean \(E(R_1) = \frac{n_1(n_1 + n_2 + 1)}{2}\)
Variance \(Var(R_1) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}\)
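
If you want to convince yourself that these formulas are right, one option is to list every equally likely outcome under \(H_0\) for a small sample and compute the mean and variance directly. The sketch below does this in Python, assuming no ties; the function names are made up for the illustration, and brute force like this is only practical for small \(n\).

```python
from itertools import combinations, product

def moments(values):
    """Mean and variance of a list of equally likely outcomes."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

def signed_rank_moments(n):
    # Under H0 each rank 1..n is "+" or "-" with probability 0.5 (no ties assumed),
    # so every pattern of signs is equally likely. W is the sum of the "+" ranks.
    outcomes = [sum(rank for rank, plus in zip(range(1, n + 1), signs) if plus)
                for signs in product([False, True], repeat=n)]
    return moments(outcomes)

def rank_sum_moments(n1, n2):
    # Under H0 every choice of n1 ranks out of 1..n1+n2 is equally likely (no ties).
    outcomes = [sum(chosen) for chosen in combinations(range(1, n1 + n2 + 1), n1)]
    return moments(outcomes)

n = 6
print(signed_rank_moments(n), (n * (n + 1) / 4, n * (n + 1) * (2 * n + 1) / 24))

n1, n2 = 3, 4
print(rank_sum_moments(n1, n2), (n1 * (n1 + n2 + 1) / 2, n1 * n2 * (n1 + n2 + 1) / 12))
```

Each printed line shows the brute-force result next to the formula value, and the two should agree.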

Step-by-step for Large Samples:
1. Calculate the mean and variance using the formulas above.
2. Calculate the \(z\)-score: \(z = \frac{R_1 - E(R_1)}{\sqrt{Var(R_1)}}\) (for the signed-rank test, use \(W\), \(E(W)\) and \(Var(W)\) instead).
3. Compare to the standard Normal distribution critical values (e.g., 1.96 for 5% two-tailed).

Pro-Tip: Always remember to use the continuity correction of 0.5 when moving from a discrete rank sum to a continuous Normal distribution! Move the statistic 0.5 towards its mean before standardising, i.e. use \(|R_1 - E(R_1)| - 0.5\) in the numerator of \(z\).
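
As a rough sketch of the large-sample procedure for the rank-sum test, the Python below applies the mean and variance formulas, the continuity correction, and a two-tailed Normal p-value. The function name `rank_sum_z` and the numbers passed to it are hypothetical, and the sketch assumes there are no ties (ties would need an adjusted variance).

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_z(r1, n1, n2):
    """Normal approximation for the rank sum R1, with continuity correction."""
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    # continuity correction: move |R1 - E(R1)| by 0.5 towards zero, never past it
    corrected = max(abs(r1 - mean) - 0.5, 0.0)
    z = corrected / sqrt(var)                   # size of the z-score (sign dropped)
    p_two_tailed = 2 * (1 - NormalDist().cdf(z))
    return z, p_two_tailed

# Hypothetical values: samples of size 21 and 24, with R1 = 400
z, p = rank_sum_z(r1=400, n1=21, n2=24)
print(round(z, 3), round(p, 3))
```

For these made-up numbers \(E(R_1) = 483\) and \(Var(R_1) = 1932\), and the resulting \(z\) is below 1.96, so \(H_0\) would not be rejected at the 5% two-tailed level.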

Chapter Summary & Key Takeaways

1. Sign Test: Uses only the direction (+/-) of data compared to the median. Uses Binomial \(B(n, 0.5)\).
2. Wilcoxon Signed-Rank (Single Sample): Uses ranks of differences. Requires a symmetrical distribution.
3. Wilcoxon Rank-Sum (Two Samples): Compares two independent groups by ranking combined data.
4. Large Samples: When \(n\) is large, we use the Normal Approximation with specific formulas for mean and variance.
5. Why do we rank? Ranking limits the influence of outliers (extreme values), because an extreme value only gets the highest or lowest rank no matter how extreme it is. This makes the test more "robust" than a t-test.

Final encouraging thought: Non-parametric tests might seem like they have a lot of steps, but once you master the art of ranking, you've conquered half the battle! Keep practicing those rank-sum calculations, and you'll do great.