### 1. The Core Ingredients: \(O\) and \(E\)

Before we run any test, we need two sets of numbers for every category:
- Observed Frequencies (\(O\)): These are the actual results you collected from an experiment or survey.
- Expected Frequencies (\(E\)): These are the results you *should* have gotten if your theory (your Null Hypothesis) is true.
The Test Statistic Formula
To turn these differences into a single usable number, we use this formula:
\(\chi^2_{calc} = \sum \frac{(O - E)^2}{E}\)
How to read this:
1. For every category, subtract the Expected from the Observed.
2. Square that number (so it's always positive).
3. Divide by the Expected number.
4. Add all those results together (that's what the \(\sum\) symbol means).

Quick Review: If the Observed numbers are very close to the Expected numbers, \(\chi^2\) will be very small. If they are very different, \(\chi^2\) will be large!
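The four steps above can be sketched in a few lines of Python. The die-rolling data here is a made-up illustration, not from the text:

```python
# Hypothetical data: rolling a die 60 times (assumed example counts).
observed = [8, 12, 9, 11, 7, 13]      # O: actual counts per face
expected = [10, 10, 10, 10, 10, 10]   # E: 60 rolls / 6 faces = 10 each

# Steps 1-4 in one expression: sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # prints 2.8
```

Notice how categories where \(O\) and \(E\) are close (like 9 vs 10) contribute almost nothing, while bigger gaps (7 vs 10) dominate the total.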
### 2. Testing for Independence (Contingency Tables)

Sometimes we want to know if two things are related. Does your favorite color depend on your gender? Does a medicine work better for certain age groups? We use Contingency Tables (data arranged in rows and columns) to find out.
Setting the Hypotheses
Every test starts with two statements:
- \(H_0\) (Null Hypothesis): The two variables are independent (there is no connection).
- \(H_1\) (Alternative Hypothesis): The two variables are not independent (there is a connection).
Calculating the Expected (\(E\)) Values
For a contingency table, we calculate the expected value for each cell using the "Row-Column-Grand" rule:
\(E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}\)
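The "Row-Column-Grand" rule is easy to apply cell by cell. A minimal sketch, using a hypothetical \(2 \times 3\) table of counts:

```python
# Hypothetical 2x3 contingency table of observed counts (rows x columns).
table = [
    [20, 30, 50],
    [30, 20, 50],
]

row_totals = [sum(row) for row in table]        # [100, 100]
col_totals = [sum(col) for col in zip(*table)]  # [50, 50, 100]
grand_total = sum(row_totals)                   # 200

# E for each cell = (Row Total * Column Total) / Grand Total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
```

For the top-left cell this gives \(E = 100 \times 50 / 200 = 25\), and the expected values in each row always add back up to that row's total.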
Degrees of Freedom (\(df\))
The "degrees of freedom" tells us how much of our data is free to vary. For a contingency table with \(r\) rows and \(c\) columns:
\(df = (r - 1)(c - 1)\)
Example: In a \(3 \times 2\) table, \(df = (3-1)(2-1) = 2 \times 1 = 2\).

Key Takeaway: In a test for independence, \(H_0\) always claims there is no relationship between the variables.
### 3. The Golden Rules: Constraints and Corrections

The \(\chi^2\) test is an approximation, and it only works well if we have enough data. There are two "safety rules" you must remember for your exam:
The "Rule of 5"
Every single Expected Frequency (\(E\)) must be at least 5.

What if it's not? If an \(E\) value is less than 5, the test becomes unreliable. To fix this, you must combine adjacent rows or columns (or classes) until every \(E \ge 5\). Note: Always combine the Observed (\(O\)) values for those rows/columns as well!
Yates’ Continuity Correction
This is a special adjustment used only for \(2 \times 2\) tables (where \(df = 1\)). It makes the test a bit more conservative.

Modified formula: \(\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}\)
The vertical bars \(|O - E|\) mean "the positive difference" (ignore any minus signs). Subtract 0.5 from that difference before squaring.

Did you know? Yates' correction is named after Frank Yates, a British statistician. It's like a "safety buffer" to make sure we don't accidentally claim a relationship exists when it might just be luck!
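The corrected formula translates directly into code. A minimal sketch on a hypothetical \(2 \times 2\) table (assumed counts):

```python
# Hypothetical 2x2 table: e.g. treatment (rows) vs outcome (columns).
observed = [[10, 20], [20, 10]]
row_totals = [30, 30]
col_totals = [30, 30]
grand = 60

# Every expected value here is 30 * 30 / 60 = 15.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Yates' correction: subtract 0.5 from |O - E| before squaring (2x2 only).
chi_sq = sum(
    (abs(o - e) - 0.5) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
```

Each cell contributes \((|{\pm}5| - 0.5)^2 / 15 = 1.35\), giving \(\chi^2 = 5.4\); without the correction each cell would contribute \(25/15\), so the corrected statistic is smaller (more conservative), as the text says.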
### 4. Goodness of Fit Tests

A "Goodness of Fit" test checks if your data follows a specific theoretical distribution, like a given ratio, a proportion, or a discrete uniform distribution.
Types of Fitting
- Given Ratios: For example, testing if a plant's offspring follow a genetic ratio of \(3:1\). If you have 100 plants, you'd expect 75 of one type and 25 of the other.
- Discrete Uniform Distribution: This is when you expect every outcome to be equally likely. If you have \(n\) categories and \(N\) total observations, every \(E = \frac{N}{n}\).
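The text's \(3:1\) plant example can be worked end to end. The observed counts below are hypothetical; the expected split of 75 : 25 comes straight from the ratio:

```python
# The text's 3:1 genetic ratio with 100 plants (observed counts assumed).
observed = [70, 30]
ratio = [3, 1]
total = sum(observed)  # 100

# Expected: split the total in proportion to the ratio -> [75.0, 25.0]
expected = [total * r / sum(ratio) for r in ratio]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # number of cells minus 1, NOT total - 1
```

This gives \(\chi^2 = 25/75 + 25/25 = 4/3 \approx 1.33\) with \(df = 1\). For a uniform fit, you would instead set every expected value to \(N/n\).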
Degrees of Freedom for Goodness of Fit
For these tests:
\(df = \text{number of cells} - 1\)
Note: If you combined classes to satisfy the \(E \ge 5\) rule, the "number of cells" refers to the new number of categories after combining.

Common Mistake to Avoid: Students often use the total number of observations to calculate \(df\). Remember, \(df\) is based on the number of categories (cells), not the number of people or items you counted!
### 5. Step-by-Step: How to Run the Test

When you're in the exam, follow these steps to stay organized:
1. State your Hypotheses: Write down \(H_0\) and \(H_1\) clearly.
2. Calculate Expected Values (\(E\)): Use the totals and the distribution/ratio given.
3. Check the \(E \ge 5\) Rule: If any \(E < 5\), combine categories and recalculate \(df\).
4. Calculate the \(\chi^2\) Statistic: Use the formula \(\sum \frac{(O-E)^2}{E}\) (and Yates' if it's a \(2 \times 2\) table).
5. Find the Critical Value: Look up the value in the provided table using your \(df\) and the significance level (e.g., 5%).
6. Compare and Conclude:
   - If calculated \(\chi^2\) > critical value: Reject \(H_0\). There is evidence of a pattern/connection.
   - If calculated \(\chi^2\) < critical value: Do not reject \(H_0\). There isn't enough evidence to suggest a pattern.
Encouragement: Step 4 can involve a lot of small calculations. Take your time and maybe use a table format to keep track of your \((O-E)^2 / E\) values. Accuracy here is key!
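Here is the whole procedure run on one hypothetical \(3 \times 2\) table (made-up counts; the 5.991 critical value is the standard 5% table entry for \(df = 2\)):

```python
# Full run-through on a hypothetical 3x2 table of observed counts.
observed = [[30, 20], [25, 25], [15, 35]]

rows = [sum(r) for r in observed]        # [50, 50, 50]
cols = [sum(c) for c in zip(*observed)]  # [70, 80]
grand = sum(rows)                        # 150

# Step 2: expected values via (Row Total * Column Total) / Grand Total.
expected = [[r * c / grand for c in cols] for r in rows]

# Step 3: every E here is at least 23.3, so no combining is needed.
# Step 4: chi-squared statistic (3x2 table, so no Yates' correction).
chi_sq = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Step 5: critical value at the 5% level for df = (3-1)(2-1) = 2,
# read from a chi-squared table: 5.991.
critical = 5.991

# Step 6: compare and conclude.
reject_h0 = chi_sq > critical
```

Here \(\chi^2 = 9.375 > 5.991\), so we reject \(H_0\) and conclude there is evidence of a connection between the row and column variables.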
### Final Summary: The Big Picture
What you've learned:
- \(\chi^2\) measures the "gap" between what we see (\(O\)) and what we expect (\(E\)).
- Independence Tests use \((r-1)(c-1)\) degrees of freedom.
- Goodness of Fit tests check against specific patterns or ratios.
- Expected values must be \(\ge 5\); otherwise, combine classes.
- Yates' correction is your best friend, but only for \(2 \times 2\) tables!