Introduction to Chi-Squared (\(\chi^2\)) Tests
Welcome to one of the most practical chapters in Further Statistics! Have you ever wondered if a coin is truly fair, or if a company’s claim about the "random" colors in a bag of sweets is actually true? Chi-squared tests are the tools mathematicians use to answer these questions.
In this chapter, we are essentially comparing what we see (Observed values) with what we expected to see (Expected values) based on a mathematical model. If the difference is huge, we might conclude our model is wrong!
1. The Goodness of Fit Test
A "Goodness of Fit" test checks how well a specific probability distribution (like the Discrete Uniform, Binomial, or Poisson) fits a set of real-world data.
The Null and Alternative Hypotheses
Every test starts with two statements:
\(H_0\) (The Null Hypothesis): The data fits the specified distribution (e.g., "The data follows a Poisson distribution").
\(H_1\) (The Alternative Hypothesis): The data does not fit the specified distribution.
The Test Statistic
To measure the "gap" between our observations and our expectations, we use the Chi-squared statistic:
\(\chi^2_{calc} = \sum \frac{(O_i - E_i)^2}{E_i}\)
Where:
- \(O_i\) = The Observed frequency (what actually happened).
- \(E_i\) = The Expected frequency (what the math says should happen).
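Analogy: Imagine you expect to get 10 texts a day. If you get 9, it's no big deal. If you get 200, something has changed! The Chi-squared statistic turns that "weirdness" into a single number: each squared difference is divided by the Expected value, so a gap only counts as large when it is large relative to what you expected.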
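The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `chi_squared_statistic` and the die-rolling data are hypothetical, chosen only to show the arithmetic.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times.
# Under a Discrete Uniform model each face is expected 60 / 6 = 10 times.
observed = [8, 12, 9, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]

stat = chi_squared_statistic(observed, expected)
print(round(stat, 2))  # 4.2
```

Each cell contributes \((O_i - E_i)^2 / E_i\); here the contributions are 0.4, 0.4, 0.1, 0.1, 1.6 and 1.6, which add up to 4.2.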
The Golden Rule: The "Rule of 5"
The Chi-squared distribution is only an approximation to the true distribution of the test statistic, and the approximation is reliable only when the Expected frequency (\(E_i\)) is at least 5 for every cell.
Common Mistake: Students often look at the Observed (\(O_i\)) values. Don't do that! Always check the Expected (\(E_i\)) values. If an Expected value is less than 5, you must combine that cell with an adjacent one, adding together both the Observed and the Expected frequencies (and remember to adjust your degrees of freedom later!).
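Merging cells can be sketched as follows. This is one reasonable convention (sweep from the first cell, and fold any small leftover into the last group), not the only valid one; the function name `merge_small_cells` and the frequencies are hypothetical.

```python
def merge_small_cells(observed, expected, minimum=5):
    """Combine adjacent cells until every Expected frequency is >= minimum.

    Cells are accumulated left to right. Any cells left over at the end
    that are still too small are added to the last completed group.
    """
    merged_o, merged_e = [], []
    run_o, run_e, pending = 0, 0.0, False
    for o, e in zip(observed, expected):
        run_o += o
        run_e += e
        pending = True
        if run_e >= minimum:
            merged_o.append(run_o)
            merged_e.append(run_e)
            run_o, run_e, pending = 0, 0.0, False
    if pending:
        if merged_e:
            merged_o[-1] += run_o
            merged_e[-1] += run_e
        else:
            # Every cell together is still below the minimum: return one cell.
            merged_o.append(run_o)
            merged_e.append(run_e)
    return merged_o, merged_e


# Hypothetical frequencies: the last two Expected values are below 5.
observed = [12, 18, 11, 6, 2, 1]
expected = [10.2, 17.4, 12.6, 6.1, 2.6, 1.1]

merged_o, merged_e = merge_small_cells(observed, expected)
print(merged_o)       # [12, 18, 11, 9]
print(len(merged_e))  # 4 cells remain after merging
```

Note that the check uses the Expected values only. The last two cells together give \(E = 3.7\), which is still below 5, so they are folded into the cell before them, leaving 4 cells to use when working out the degrees of freedom.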
Quick Review Box:
1. State \(H_0\) and \(H_1\).
2. Calculate Expected frequencies (\(E_i\)).
3. Check if any \(E_i < 5\). If so, merge cells.
4. Calculate the \(\chi^2\) statistic.
2. Degrees of Freedom (\(\nu\))
The "Degrees of Freedom" (represented by the Greek letter nu, \(\nu\)) tells us which Chi-squared curve to use. It's the most common place to lose marks, so pay close attention!
For Goodness of Fit Tests:
\(\nu = n - 1 - k\)
Where:
- \(n\) = The number of cells (after any merging).
- \(1\) = Always subtracted because the total frequency is fixed.
- \(k\) = The number of parameters estimated from the data.
When is \(k\) used?
- Discrete Uniform: Usually \(k=0\) (no parameters to estimate).
- Poisson: If you are given \(\lambda\), \(k=0\). If you have to estimate \(\lambda\) by calculating the mean (\(\bar{x}\)) from the data first, \(k=1\).
- Binomial: If you are given \(p\), \(k=0\). If you have to estimate \(p\) from the data (the sample mean divided by the number of trials), \(k=1\).
Did you know? The term "Degrees of Freedom" refers to how many values in a system are free to vary. If you know the total frequency and all cell values except one, that last value is "locked in," which is why we subtract 1!
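To see how \(k\) changes \(\nu\), here is a small sketch of a Poisson goodness of fit setup where \(\lambda\) is not given and must be estimated from the data. The data and the helper names (`degrees_of_freedom`, `poisson_expected`) are hypothetical, and the sketch assumes every observation in the final cell is exactly 4 when calculating the mean.

```python
import math


def degrees_of_freedom(n_cells, k_estimated):
    """nu = n - 1 - k for a goodness of fit test."""
    return n_cells - 1 - k_estimated


def poisson_expected(lam, total, max_value):
    """Expected frequencies for 0, 1, ..., max_value - 1 and 'max_value or more'."""
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(max_value)]
    probs.append(1 - sum(probs))  # the final cell takes all remaining probability
    return [total * p for p in probs]


# Hypothetical observed frequencies for x = 0, 1, 2, 3 and "4 or more".
observed = [22, 35, 25, 12, 6]
total = sum(observed)

# lambda is NOT given, so it is estimated by the sample mean (k = 1).
# Assumption: every value in the last cell is exactly 4.
lam_hat = sum(x * f for x, f in enumerate(observed)) / total

expected = poisson_expected(lam_hat, total, max_value=4)
nu = degrees_of_freedom(len(expected), k_estimated=1)

print(lam_hat)  # 1.45
print(nu)       # 3
```

With 5 cells and one estimated parameter, \(\nu = 5 - 1 - 1 = 3\). Had \(\lambda\) been given in the question, the same table would have \(\nu = 5 - 1 - 0 = 4\).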
3. Contingency Tables
Contingency tables are used to test if two different factors are independent. For example: "Is there a link between a student's favorite subject and their gender?"
Hypotheses for Contingency Tables:
\(H_0\): The two factors are independent (no association).
\(H_1\): The two factors are not independent (there is an association).
Calculating Expected Frequencies:
For each cell in the table, use this simple formula:
\(E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\)
Degrees of Freedom for Tables:
\(\nu = (r - 1)(c - 1)\)
Where \(r\) is the number of rows and \(c\) is the number of columns.
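Tip: You don't need to subtract a separate "parameters estimated" (\(k\)) term for contingency tables; the estimation is already built into \((r - 1)(c - 1)\), so just use the row/column formula! The Rule of 5 still applies, though: if you have to merge rows or columns, use the reduced \(r\) and \(c\).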
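The contingency table calculations can be sketched as below. The 2 by 3 table is hypothetical data and the function names are illustrative; the code simply applies the Row Total times Column Total over Grand Total formula to every cell.

```python
def expected_table(observed):
    """Expected frequency for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def table_degrees_of_freedom(observed):
    """nu = (r - 1)(c - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical 2 x 3 table (2 rows, 3 columns), grand total 150.
observed = [
    [20, 30, 10],
    [30, 20, 40],
]

expected = expected_table(observed)
nu = table_degrees_of_freedom(observed)

# Same test statistic as before, summed over every cell of the table.
stat = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

print(expected)        # [[20.0, 20.0, 20.0], [30.0, 30.0, 30.0]]
print(nu)              # 2
print(round(stat, 2))  # 16.67
```

Every Expected value here is at least 5, so no rows or columns need merging.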
4. Finding the Critical Value and Concluding
Once you have your calculated \(\chi^2\) and your degrees of freedom \(\nu\), you need to decide if the result is significant.
Step 1: Look up the Critical Value in the statistical tables provided in your exam using your significance level (usually 5% or 1%) and your \(\nu\).
Step 2: Compare!
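- If your Calculated \(\chi^2\) > Critical Value: The difference is too big to put down to chance at this significance level. Reject \(H_0\).
- If your Calculated \(\chi^2\) < Critical Value: The difference is small enough to be random chance. Do not reject \(H_0\). This means there is insufficient evidence against the model; it does not prove the model is correct, so "Do not reject \(H_0\)" is safer wording than "Accept \(H_0\)".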
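The look-up-and-compare step can be sketched with a small table of critical values. The figures below are the standard upper-tail values for \(\nu = 1\) to \(6\) as found in most statistical tables; the names `CRITICAL_VALUES` and `conclude` are hypothetical, and your exam tables remain the authority.

```python
# Upper-tail critical values of the Chi-squared distribution,
# indexed by significance level and then by degrees of freedom.
CRITICAL_VALUES = {
    0.05: {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592},
    0.01: {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086, 6: 16.812},
}


def conclude(stat, nu, significance=0.05):
    """Compare the calculated statistic with the tabulated critical value."""
    critical = CRITICAL_VALUES[significance][nu]
    if stat > critical:
        return "Reject H0"
    return "Do not reject H0"


# Hypothetical results from two different tests.
print(conclude(4.2, nu=5))                       # Do not reject H0 (4.2 < 11.070)
print(conclude(16.67, nu=2))                     # Reject H0 (16.67 > 5.991)
print(conclude(8.0, nu=2, significance=0.01))    # Do not reject H0 (8.0 < 9.210)
```

The last line shows why the significance level matters: a statistic of 8.0 with \(\nu = 2\) is significant at 5% but not at 1%.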
Using a Calculator: Your calculator can often give you a p-value.
- If p-value < significance level (e.g., 0.03 < 0.05), Reject \(H_0\). If p-value > significance level (e.g., 0.12 > 0.05), do not reject \(H_0\).
- Don't worry if this seems tricky at first; just remember: "If the p is low, the \(H_0\) must go!"
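For the curious, a p-value can also be computed by hand for whole-number degrees of freedom. The sketch below uses a standard recurrence for the upper tail of the Chi-squared distribution, built only on Python's `math` module; it is an illustration, not a description of what any particular calculator does, and the function name `chi_squared_p_value` is hypothetical. If SciPy is available, `scipy.stats.chi2.sf(stat, nu)` gives the same quantity.

```python
import math


def chi_squared_p_value(stat, nu):
    """Upper-tail probability P(X >= stat) where X ~ Chi-squared with nu d.o.f.

    Starts from the closed form for nu = 1 (odd case) or nu = 2 (even case)
    and adds one correction term for each extra 2 degrees of freedom.
    """
    if nu < 1 or stat < 0:
        raise ValueError("nu must be a positive integer and stat non-negative")
    x = stat / 2
    if nu % 2 == 0:
        a, q = 1.0, math.exp(-x)              # nu = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))   # nu = 1
    while a < nu / 2:
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1
    return q


# Hypothetical result: statistic 16.67 with 2 degrees of freedom.
p = chi_squared_p_value(16.67, 2)
print(p < 0.05)  # True, so H0 is rejected at the 5% level
```

As a sanity check, feeding in a tabulated 5% critical value (for example 5.991 with \(\nu = 2\)) should return a p-value very close to 0.05.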
Summary Checklist
Key Takeaways:
- Observed (\(O\)): Real data. Expected (\(E\)): The frequencies the model predicts if \(H_0\) is true.
- The Rule of 5: Always merge cells if \(E < 5\).
- Goodness of Fit \(\nu\): \(n - 1 - \text{parameters estimated}\).
- Contingency Table \(\nu\): \((r-1)(c-1)\).
- The Conclusion: A calculated \(\chi^2\) larger than the critical value leads to rejecting the null hypothesis.