Introduction: Welcome to Chi-Squared Tests!
Ever wondered if there’s a real connection between two things, or if a pattern you see is just down to pure luck? For example, does the type of music you listen to actually depend on your age, or is it just random? Chi-squared (\(\chi^2\)) tests are mathematical tools that help us decide whether a pattern in our data is strong enough to count as evidence, or whether chance alone could plausibly explain it!
In this chapter, we’ll learn how to use these tests to check for an association between two categorical variables and to see whether a specific probability model (like the Binomial or Poisson distributions you’ve met before) actually fits the data we’ve collected. Don’t worry if it sounds a bit heavy at first: we’ll break it down into simple, manageable steps.
1. The Basics: Contingency Tables
Before we can test anything, we need to organize our data. When we have categorical data (data that fits into groups like "Red/Blue" or "Pass/Fail"), we use a contingency table.
Example: Imagine we ask 100 students if they prefer tea or coffee. We also record whether they are in Year 12 or Year 13. A contingency table would show exactly how many Year 12s like tea, how many Year 13s like coffee, and so on.
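If the survey answers arrive as a raw list, building the contingency table is just a matter of counting each (row, column) combination. Here is a minimal Python sketch, assuming each response is stored as a (year group, drink) pair. The six records are made up for illustration and are not real survey data:

```python
from collections import Counter

# Hypothetical raw responses: (year group, preferred drink).
# These six records are made up to illustrate the counting step.
responses = [
    ("Year 12", "Tea"), ("Year 12", "Coffee"), ("Year 13", "Coffee"),
    ("Year 12", "Tea"), ("Year 13", "Tea"), ("Year 13", "Coffee"),
]

rows = ["Year 12", "Year 13"]
cols = ["Tea", "Coffee"]

# Count how many responses fall into each (row, column) cell.
counts = Counter(responses)
table = [[counts[(r, c)] for c in cols] for r in rows]

for label, row in zip(rows, table):
    print(label, row)
```

Every response lands in exactly one cell, so the cells always add up to the number of people asked.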
Key Terms
- Observed Values (\(O\)): These are the actual numbers you see in the data table.
- Expected Values (\(E\)): These are the numbers we would expect to see if there were no connection at all between the categories (or, in a goodness-of-fit test, if the proposed model were correct).
2. The \(\chi^2\) Test for Association (Independence)
This test checks if two factors are independent (not related) or if there is an association between them.
Step 1: State the Hypotheses
We always start with a "Null Hypothesis" (\(H_0\)), which assumes the boring option: that nothing interesting is happening.
\(H_0\): There is no association between the factors (they are independent).
\(H_1\): There is an association between the factors.
Step 2: Calculate Expected Frequencies (\(E\))
For each cell in your table, calculate the frequency you would expect if \(H_0\) were true, using this formula:
\[E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\]
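The formula can be applied to every cell at once. The sketch below assumes the observed counts are stored as a list of rows. The function name `expected_frequencies` is illustrative, and the 2×2 counts (100 hypothetical students, split by year group and tea/coffee) are made up:

```python
def expected_frequencies(observed):
    """Return E = row total * column total / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical table: rows are Year 12, Year 13; columns are tea, coffee.
observed = [[30, 20],
            [15, 35]]
print(expected_frequencies(observed))
```

Each row of expected values adds up to the same row total as the observed data, which is a handy check on your arithmetic.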
Step 3: Find the Test Statistic (\(\chi^2_{\text{calc}}\))
We want to see how much our Observed values differ from our Expected values. We use this formula:
\[\chi^2 = \sum \frac{(O - E)^2}{E}\]
Quick Review: Each \(\frac{(O - E)^2}{E}\) is called a contribution. You sum them all up to get your final test statistic. If the Observed and Expected are very similar, \(\chi^2\) will be small!
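Listing the contributions separately makes it easy to see which cells drive the result. A small Python sketch, assuming \(O\) and \(E\) are given as matching flat lists. The numbers are a hypothetical 2×2 table whose expected values come from the row total × column total ÷ grand total formula:

```python
def chi_squared(observed, expected):
    """Return (list of contributions, total) for matching lists of O and E."""
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return contributions, sum(contributions)


# Hypothetical 2x2 table read cell by cell (top-left, top-right, ...).
O = [30, 20, 15, 35]
E = [22.5, 27.5, 22.5, 27.5]
contribs, stat = chi_squared(O, E)
print([round(c, 3) for c in contribs], round(stat, 3))
```

Here the first contribution is \((30 - 22.5)^2 / 22.5 = 2.5\), and the four contributions sum to roughly 9.09.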
Step 4: Degrees of Freedom (\(df\))
To find the "Critical Value" from a table, you need the degrees of freedom. For a contingency table with \(r\) rows and \(c\) columns:
\[df = (r - 1)(c - 1)\]
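Why \((r-1)(c-1)\)? Once the row and column totals are fixed, only some cells can be filled in freely; the rest are forced by the totals. In this hypothetical 2×2 table, choosing the top-left cell (30) determines every other cell, so \(df = (2-1)(2-1) = 1\):
\[
\begin{array}{l|cc|c}
 & \text{Tea} & \text{Coffee} & \text{Total} \\ \hline
\text{Year 12} & 30 & 50 - 30 = 20 & 50 \\
\text{Year 13} & 45 - 30 = 15 & 55 - 20 = 35 & 50 \\ \hline
\text{Total} & 45 & 55 & 100
\end{array}
\]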
Step 5: Make a Decision
Compare your calculated \(\chi^2\) value to the Critical Value from the formula booklet (using your \(df\) and significance level):
- If \(\chi^2_{\text{calc}} > \text{Critical Value}\): Reject \(H_0\). There is evidence of an association!
- If \(\chi^2_{\text{calc}} \le \text{Critical Value}\): Do not reject \(H_0\). There isn't enough evidence to say they are related.
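Steps 2 to 5 can be chained into one routine. This is a sketch, not a standard library function: `association_test` is a hypothetical name, it assumes a table of counts given as a list of rows, and the critical value still has to be looked up for the right \(df\) and significance level (3.841 is the usual 5% value for \(df = 1\)).

```python
def association_test(observed, critical_value):
    """Chi-squared test for association on a table of observed counts.

    critical_value must be looked up separately (e.g. from tables) for
    df = (r - 1)(c - 1) and the chosen significance level.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Steps 2 and 3: expected value and contribution for every cell.
    stat = 0.0
    for i, r_tot in enumerate(row_totals):
        for j, c_tot in enumerate(col_totals):
            e = r_tot * c_tot / grand_total
            stat += (observed[i][j] - e) ** 2 / e

    # Step 4: degrees of freedom.
    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    # Step 5: reject H0 only if the statistic exceeds the critical value.
    reject = stat > critical_value
    return stat, df, reject


# Hypothetical 2x2 table; 3.841 is the 5% critical value for df = 1.
stat, df, reject = association_test([[30, 20], [15, 35]], 3.841)
print(round(stat, 3), df, "reject H0" if reject else "do not reject H0")
```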
Key Takeaway: A large \(\chi^2\) value means the difference between "what we saw" and "what we expected" is too big to be plausibly explained by chance alone!
3. Chi-Squared Test for Goodness of Fit
This test is a "reality check." It asks: "Does this set of data actually follow a specific distribution (like Uniform, Binomial, or Poisson)?"
The Hypotheses
\(H_0\): The [Model Name] fits the data.
\(H_1\): The [Model Name] does not fit the data.
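For a goodness-of-fit test, each expected frequency is the total number of observations multiplied by the model's probability for that category. The sketch below assumes a Poisson model whose mean is estimated from hypothetical data (so \(k = 1\) later on). The largest observed value is 4, so the last model category is widened to \(X \ge 4\) to make the probabilities add up to 1. The function name `poisson_expected` is illustrative:

```python
import math


def poisson_expected(lam, total, max_x):
    """Expected frequencies for X = 0, 1, ..., max_x - 1 and 'X >= max_x'."""
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(max_x)]
    probs.append(1 - sum(probs))  # tail category so the probabilities sum to 1
    return [total * p for p in probs]


# Hypothetical observed frequencies for x = 0, 1, 2, 3, 4.
observed = [20, 35, 25, 15, 5]
total = sum(observed)
# Estimate the Poisson mean from the data (this counts towards k).
lam = sum(x * f for x, f in enumerate(observed)) / total
expected = poisson_expected(lam, total, max_x=4)
print(lam, [round(e, 2) for e in expected])
```

With these made-up counts every expected value is at least 5, so no cells need combining before the test.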
The "Small Expected Frequency" Rule
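Important! The critical values from the \(\chi^2\) tables are only reliable when the Expected Values (\(E\)) are large enough, because the test statistic only approximately follows a \(\chi^2\) distribution.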
The Rule: If any \(E < 5\), you must combine that cell with an adjacent one. Remember to adjust your count of categories (\(n\)) afterwards!
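One simple way to automate this rule is to pool neighbouring categories, left to right, until each pooled expected value reaches 5. This greedy strategy is an assumption made for illustration; in an exam, combine whichever adjacent cells make sense for the question. The function name and data are hypothetical:

```python
def combine_small_cells(observed, expected, threshold=5):
    """Pool adjacent categories so that every expected frequency is >= threshold.

    Scans left to right, adding categories to a running group until its
    expected total reaches the threshold. Any leftover group at the end that
    is still too small is merged into the previous group.
    """
    new_o, new_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= threshold:
            new_o.append(acc_o)
            new_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:
        if new_e:  # merge the small leftover into the last group
            new_o[-1] += acc_o
            new_e[-1] += acc_e
        else:      # nothing reached the threshold: keep one pooled group
            new_o.append(acc_o)
            new_e.append(acc_e)
    return new_o, new_e


# Hypothetical frequencies; the last two expected values are below 5.
O = [8, 15, 12, 6, 3, 1]
E = [9.0, 14.5, 11.2, 6.9, 2.8, 0.6]
new_o, new_e = combine_small_cells(O, E)
print(new_o, [round(e, 1) for e in new_e])
```

Six categories become four here, so \(n = 4\) when you work out the degrees of freedom.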
Calculating Degrees of Freedom for Goodness of Fit
This is where students often slip up! The formula is:
\[df = n - 1 - k\]
- \(n\) = the number of categories (after combining cells).
- \(k\) = the number of parameters you had to calculate from the data to build the model.
Memory Aid for \(k\):
- If the model is Uniform: \(k = 0\) (usually).
- If you had to calculate the Mean for a Poisson model: \(k = 1\).
- If you had to calculate the Probability (\(p\)) for a Binomial model: \(k = 1\).
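The same data can give different degrees of freedom depending on where the parameter came from. For a hypothetical Poisson test with \(n = 5\) categories after combining:
\[
\begin{aligned}
\text{Mean estimated from the data: } & df = 5 - 1 - 1 = 3 \\
\text{Mean given in the question: } & df = 5 - 1 - 0 = 4
\end{aligned}
\]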
Key Takeaway: Goodness of fit tells us if our mathematical models represent the real world accurately.
4. Common Mistakes to Avoid
- Mixing up \(O\) and \(E\): Always use the Observed frequencies for \(O\) and the Expected frequencies (model probability × total, or row total × column total ÷ grand total) for \(E\).
- Forgetting to Combine: If an Expected value is less than 5, you must combine cells before calculating \(\chi^2\).
- Incorrect \(df\): Double-check whether you estimated any parameters (like the mean) from the data. If you did, subtract 1 from your \(df\) for each one.
- Using Percentages: Chi-squared tests must be done using counts/frequencies, never percentages or probabilities directly in the \(\chi^2\) formula!
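Why percentages fail: \(\chi^2\) grows with the number of observations, so the same proportions give a bigger statistic for a bigger sample. Converting to percentages quietly pretends the sample size was 100. A quick Python check with hypothetical counts (50 observations tested against a uniform model):

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


counts = [30, 20]            # hypothetical counts, N = 50
expected = [25, 25]          # uniform model: 50 / 2 per category
percent = [60, 40]           # the same data written as percentages
percent_expected = [50, 50]

print(chi_squared(counts, expected))           # 2.0 (correct, uses counts)
print(chi_squared(percent, percent_expected))  # 4.0 (wrong, doubled)
```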
5. Helpful Tips for the Exam
Did you know? You don't have to show every single repetitive calculation. The examiners want to see that you know the method. Show one example of calculating an Expected value and one example of a contribution, then you can list the rest or give the total.
Using the p-value: Sometimes software (or your calculator) gives a p-value instead of a critical value.
Decision rule: If the p-value < Significance Level (e.g., 0.05), then Reject \(H_0\).
Think: "If the p is low, the \(H_0\) must go!"
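Your calculator or software finds the p-value as the probability that a \(\chi^2\) variable with your \(df\) is at least as large as \(\chi^2_{\text{calc}}\). The sketch below shows one way to compute it with only Python's standard library. It assumes an integer \(df\) and uses a standard recurrence for the regularized upper incomplete gamma function; the function name `chi2_p_value` is hypothetical. In practice a library routine such as SciPy's `scipy.stats.chi2.sf` does the same job.

```python
import math


def chi2_p_value(stat, df):
    """Upper-tail probability P(X >= stat) for X ~ chi-squared(df), integer df >= 1.

    Computes Q(df/2, stat/2), the regularized upper incomplete gamma function,
    via the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    y = stat / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) = e^(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


stat, df = 100 / 11, 1   # hypothetical test statistic (about 9.09) and df
p = chi2_p_value(stat, df)
print(round(p, 4), "reject H0" if p < 0.05 else "do not reject H0")
```

As a sanity check, feeding in the usual 5% critical values from tables (3.841 for \(df = 1\), 5.991 for \(df = 2\)) should give p-values very close to 0.05.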
Don't worry if this seems tricky! The process is very logical once you've practiced a few tables. Just remember: state your hypotheses, find the expected values, calculate the sum of contributions, and compare it to the "gatekeeper" (the critical value).
Summary Checklist
- Contingency Table: Use \(df = (r-1)(c-1)\).
- Goodness of Fit: Use \(df = n - 1 - k\).
- The "5" Rule: Combine cells if \(E < 5\).
- The Statistic: \(\chi^2 = \sum \frac{(O-E)^2}{E}\).
- Conclusion: Always write your final answer in the context of the question (e.g., "There is evidence to suggest that age and music choice are related").