Welcome to Chi-Squared Tests!

In this chapter, we explore a powerful statistical tool: the Chi-Squared (\(\chi^2\)) Test. Have you ever wondered if a six-sided die is actually fair, or if the number of goals scored in football matches follows a specific pattern? That is exactly what these tests help us decide! We use them to see if the "observed" data we collect matches the "expected" data from a mathematical model.

Don't worry if this seems a bit abstract at first. By the end of these notes, you will be able to tell if a set of data "fits" a model or if two factors are independent of each other.

1. The Basics: Observed vs. Expected

Every Chi-Squared test revolves around comparing two things:

  1. Observed Frequencies (\(O_i\)): The actual numbers you counted or collected from an experiment.
  2. Expected Frequencies (\(E_i\)): The frequencies you would expect to see if your theory (the Null Hypothesis) were true. These come from a calculation, so they do not have to be whole numbers.

The Chi-Squared Statistic

We calculate a single value to measure the "gap" between what we saw and what we expected. The formula is:

\(\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\)

Think of it this way: If the observed and expected values are very close, each \((O_i - E_i)\) will be small, and our \(\chi^2\) value will be small. If they are very different, the \(\chi^2\) value will be large, suggesting our theory might be wrong!
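Here is a minimal Python sketch of the formula in action. The die data (60 rolls) is made up purely for illustration, and the function name chi_squared_statistic is our own choice, not something from a library.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times, scores 1 to 6.
observed = [8, 12, 9, 11, 6, 14]
# A fair die predicts 60 / 6 = 10 of each score.
expected = [10.0] * 6

statistic = chi_squared_statistic(observed, expected)
print(round(statistic, 3))  # 4.2
```

By hand, the squared gaps are 4, 4, 1, 1, 16 and 16, which add to 42, and dividing each by 10 gives the same 4.2.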

The Vital Rule: The "Rule of 5"

The test statistic only approximately follows a \(\chi^2\) distribution, and that approximation is reliable only when the Expected Frequency (\(E_i\)) in every single cell is at least 5.
Common Mistake: Students often check the Observed frequencies. Don't do that! Always check the Expected values.
What if it's less than 5? You must combine adjacent cells until the combined expected frequency is 5 or more. When you combine cells, remember that your number of categories (\(n\)) decreases!
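The combining step can be sketched in Python as follows. This is just one possible strategy (sweep through the cells in order and merge neighbours until the running expected total reaches 5); in an exam you would normally merge the tail cells by hand. The data and the function name combine_small_cells are hypothetical.

```python
def combine_small_cells(observed, expected, minimum=5.0):
    """Merge adjacent cells until every expected frequency is >= minimum."""
    merged_obs, merged_exp = [], []
    run_obs, run_exp = 0, 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            # The running group is now large enough to be its own cell.
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
            run_obs, run_exp = 0, 0.0
    if run_exp > 0 or run_obs > 0:
        # Anything left over at the end joins the previous cell.
        if merged_exp:
            merged_obs[-1] += run_obs
            merged_exp[-1] += run_exp
        else:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
    return merged_obs, merged_exp


# Hypothetical table with 7 cells; the last two expected values are below 5.
observed = [12, 25, 28, 20, 10, 3, 2]
expected = [13.5, 27.1, 27.1, 18.0, 9.0, 3.6, 1.7]

merged_obs, merged_exp = combine_small_cells(observed, expected)
print(merged_obs)       # [12, 25, 28, 20, 10, 5]
print(len(merged_exp))  # 6 cells remain, so n = 6
```

Notice that the observed frequencies are merged in exactly the same way as the expected ones, and that the number of cells drops from 7 to 6.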

Quick Takeaway: Large \(\chi^2\) = Large difference between theory and reality. Always ensure \(E_i \ge 5\).

2. Goodness of Fit Tests

A Goodness of Fit test checks if your data fits a specific probability distribution. In the Edexcel FS1 (Further Statistics 1) syllabus, you need to know how to test for:

  • Discrete Uniform: Every outcome is equally likely (like a fair die).
  • Binomial distribution: The number of successes in a fixed number of independent trials, each with the same probability of success.
  • Poisson distribution: Events happening independently at a constant average rate (e.g., radioactive decay).
  • Geometric distribution: The number of trials up to and including the first success.

The Hypothesis Steps

1. State the Hypotheses:
\(H_0\): The data can be modeled by a [Name of Distribution].
\(H_1\): The data cannot be modeled by a [Name of Distribution].

2. Calculate the Expected Frequencies:
\(E_i = \text{Probability of that category} \times \text{Total frequency}\)

3. Check the Rule of 5: Combine cells if necessary.

4. Calculate \(\chi^2\): Using the formula or your calculator's list function.
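As an illustration of step 2, here is a rough Python sketch for a Poisson model. The rate of 2 and the total of 100 are assumed values chosen for the example, and poisson_expected is a hypothetical helper name. The last cell is treated as "this value or more" so that the expected frequencies add up to the total.

```python
import math


def poisson_expected(rate, total, tail_start):
    """Expected frequencies for cells 0, 1, ..., tail_start - 1 and '>= tail_start'."""
    probabilities = [
        math.exp(-rate) * rate ** x / math.factorial(x) for x in range(tail_start)
    ]
    # The final cell takes whatever probability is left over.
    probabilities.append(1.0 - sum(probabilities))
    return [p * total for p in probabilities]


# Assumed model: Poisson with rate 2, 100 observations, cells 0 to 5 and '6 or more'.
expected = poisson_expected(rate=2.0, total=100, tail_start=6)
for cell, value in enumerate(expected):
    print(cell, round(value, 2))
```

The same idea works for the other distributions: swap in the binomial, geometric or uniform probability for each cell and multiply by the total frequency.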

Degrees of Freedom (\(v\))

This is the trickiest part of Goodness of Fit! The formula is:

\(v = n - 1 - k\)

Where:
- \(n\) is the number of cells (after combining).
- \(1\) is always subtracted because the expected frequencies are forced to have the same total as the observed frequencies.
- \(k\) is the number of parameters you had to estimate from the data to calculate the expected values.

When is \(k > 0\)?
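  • Poisson: If you had to estimate the mean (\(\lambda\)) from the table, \(k = 1\). If the question gives you \(\lambda\), \(k = 0\).
  • Binomial: If you had to estimate the probability \(p\) from the table, \(k = 1\). If the question gives you \(p\), \(k = 0\).
  • Geometric: The same idea applies. If \(p\) was estimated from the table, \(k = 1\); if it was given, \(k = 0\).
  • Uniform: Usually \(k = 0\) as there are no parameters to estimate.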
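To see how \(k\) changes the answer, here is a small Python sketch. The frequency table is invented, the helper names are our own, and the top cell is treated as exactly 6 when estimating the mean (a simplifying assumption).

```python
def estimate_mean(values, frequencies):
    """Estimate the mean from a frequency table: sum(x * f) / sum(f)."""
    return sum(x * f for x, f in zip(values, frequencies)) / sum(frequencies)


def goodness_of_fit_dof(cells_after_combining, estimated_parameters=0):
    """Degrees of freedom: v = n - 1 - k."""
    dof = cells_after_combining - 1 - estimated_parameters
    if dof < 1:
        raise ValueError("Not enough cells left for a chi-squared test")
    return dof


# Hypothetical table: number of events (0 to 6) against frequency.
values = [0, 1, 2, 3, 4, 5, 6]
frequencies = [12, 25, 28, 20, 10, 3, 2]

estimated_rate = estimate_mean(values, frequencies)
print(estimated_rate)  # 2.08

# Suppose 6 cells remain after combining.
dof_estimated = goodness_of_fit_dof(6, estimated_parameters=1)  # rate from the table
dof_given = goodness_of_fit_dof(6)                              # rate given in the question
print(dof_estimated, dof_given)  # 4 5
```

Same table, same number of cells, but estimating the rate from the data costs one extra degree of freedom.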

Did you know? "Degrees of freedom" is like being told you can pick any 5 numbers that add up to 20. You can pick the first 4 freely, but the 5th one must be a specific number to make the total work. That's why we usually subtract 1!

Quick Review: \(v = \text{cells} - 1 - \text{estimated parameters}\). Check the question carefully to see if parameters were "given" or "calculated."

3. Contingency Tables (Tests for Independence)

Contingency tables are used when we have two different factors (e.g., Gender and Choice of Subject) and we want to see if they are independent (unrelated).

The Hypothesis Steps

\(H_0\): [Factor 1] and [Factor 2] are independent.
\(H_1\): [Factor 1] and [Factor 2] are not independent (there is an association).

Calculating Expected Frequencies

For each cell in the table, the expected frequency is:
\(E_i = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\)
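A short Python sketch of this rule applied to a whole table at once. The 2 by 3 table is made-up data, and expected_frequencies is a hypothetical helper name.

```python
def expected_frequencies(table):
    """Expected value for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]


# Hypothetical observed table: 2 rows (groups) by 3 columns (subject choices).
observed_table = [
    [30, 20, 10],
    [20, 30, 40],
]

expected_table = expected_frequencies(observed_table)
for row in expected_table:
    print(row)
# [20.0, 20.0, 20.0]
# [30.0, 30.0, 30.0]
```

Here the row totals are 60 and 90, every column total is 50, and the grand total is 150, so the first row gets \(60 \times 50 / 150 = 20\) in each cell.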

Degrees of Freedom for Contingency Tables

This is much simpler than Goodness of Fit!
\(v = (r - 1) \times (c - 1)\)
Where \(r\) is the number of rows and \(c\) is the number of columns (do not count the totals row or column, and use the numbers left after any combining). You do not need to worry about \(k\) here!
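As a quick check of the formula, a tiny Python sketch (the table is hypothetical and should contain only the data cells, with no totals):

```python
def contingency_dof(table):
    """Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)."""
    rows = len(table)
    columns = len(table[0])
    return (rows - 1) * (columns - 1)


# Hypothetical 2 by 3 table of observed frequencies (totals not included).
subject_table = [
    [30, 20, 10],
    [20, 30, 40],
]

print(contingency_dof(subject_table))  # 2
```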

Memory Aid: For contingency tables, think of "RC" (Row/Column). The number of degrees of freedom is just the product of the "reduced" rows and columns.

Quick Takeaway: Contingency tables test if two variables are linked. Use the "Row \(\times\) Column / Grand Total" rule for expected values.

4. Concluding the Test

Once you have your calculated \(\chi^2\) and your degrees of freedom (\(v\)), you have two ways to finish:

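  1. Critical Value Method: Look up the critical value in the statistical tables using \(v\) and your significance level (\(\alpha\)). If your calculated \(\chi^2\) is greater than the critical value, you reject \(H_0\); otherwise you do not reject it. Only large values of \(\chi^2\) count against \(H_0\), so the test always uses the upper tail.
  2. P-value Method: Most modern calculators give you the p-value. If the p-value is less than the significance level, you reject \(H_0\).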
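Both methods can be sketched in Python using only the standard library. The 5% critical values below are the usual table values for 1 to 6 degrees of freedom. The p-value function is our own illustrative implementation for whole-number degrees of freedom (it builds the upper-tail probability from math.erfc or math.exp plus a recurrence), not what your calculator necessarily does internally.

```python
import math

# Upper-tail 5% critical values from standard chi-squared tables.
CRITICAL_VALUES_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def chi2_p_value(statistic, dof):
    """Upper-tail probability P(X > statistic) for a whole-number dof."""
    t = statistic / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-t)              # exact tail for 2 degrees of freedom
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))   # exact tail for 1 degree of freedom
    while a < dof / 2.0:
        # Each pass adds the term needed for two more degrees of freedom.
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def reject_by_critical_value(statistic, dof):
    return statistic > CRITICAL_VALUES_5_PERCENT[dof]


def reject_by_p_value(statistic, dof, significance=0.05):
    return chi2_p_value(statistic, dof) < significance


# Hypothetical result: chi-squared = 4.2 with 5 degrees of freedom.
statistic, dof = 4.2, 5
print(reject_by_critical_value(statistic, dof))  # False, since 4.2 < 11.070
print(reject_by_p_value(statistic, dof))         # False, the p-value is well above 0.05
```

The two methods always agree, because the critical value is simply the point where the p-value equals the significance level.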

Writing the Conclusion

Always write your conclusion in two parts:

  1. Statistical result: "Reject \(H_0\)" or "Fail to reject \(H_0\)."
  2. Contextual result: "There is sufficient evidence at the 5% level to suggest that the die is biased" or "There is insufficient evidence to suggest an association between gender and subject choice."

Hypothesis testing conclusions can feel wordy, but if you follow this "Result + Context" template every time, you will avoid dropping marks on the final step.

Summary Checklist

  • Is it a Goodness of Fit test or a Contingency Table?
  • Are my Hypotheses clear and do they include the context?
  • Are all Expected Frequencies \(\ge 5\)? (Combine if not!)
  • Did I calculate the Degrees of Freedom correctly? (Watch out for estimated parameters!)
  • Is my conclusion both Statistical and Contextual?

Good luck with your practice! The more \(\chi^2\) tests you run, the more intuitive the process becomes.