Hypothesis test using Pearson’s correlation coefficient - Mathematics A - H240 - Cambridge OCR A Level

Welcome to Hypothesis Testing with Pearson’s Correlation!

Hi there! In this chapter, we are going to learn how to decide if a relationship between two things (like the time you spend revising and the marks you get) is actually "real" or just a result of random chance. We use a tool called Pearson’s Product-Moment Correlation Coefficient (PMCC) to help us make this decision. Don't worry if it sounds like a mouthful—we'll break it down step-by-step!

Section 1: What is the PMCC?

Before we test anything, we need to know what we are looking at. The Pearson’s Product-Moment Correlation Coefficient, usually written as \( r \) for a sample, is a number that tells us two things about the relationship between two variables:

- How close the data points are to a straight line.
- Whether the relationship is positive (both go up together) or negative (one goes up, the other goes down).

Quick Review of the Scale:
The value of \( r \) is always between -1 and 1 inclusive, so \( -1 \le r \le 1 \).
- \( r = 1 \): Perfect positive linear correlation.
- \( r = -1 \): Perfect negative linear correlation.
- \( r = 0 \): No linear correlation at all.

Did you know?
The PMCC only measures linear (straight-line) relationships. If your data points form a perfect "U" shape, the PMCC might be 0, even though there is clearly a pattern!
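
To see this in action, here is a minimal Python sketch. It assumes the usual summary-statistic form \( r = S_{xy} / \sqrt{S_{xx} S_{yy}} \); the function name `pmcc` and the data are made up for illustration. You will not need to do this calculation by hand in the exam.

```python
from math import sqrt

def pmcc(xs, ys):
    """Sample PMCC, r = Sxy / sqrt(Sxx * Syy) (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
line_ys = [2 * x + 1 for x in xs]  # points exactly on a rising straight line
u_ys = [x ** 2 for x in xs]        # points on a symmetric "U" curve

r_line = pmcc(xs, line_ys)  # perfect positive linear correlation
r_u = pmcc(xs, u_ys)        # clear pattern, but no linear correlation
print(round(r_line, 4), round(r_u, 4))
```

The straight-line data gives \( r = 1 \), while the "U" data gives \( r = 0 \) even though \( y \) is completely determined by \( x \).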

Key Takeaway: The PMCC (\( r \)) tells us how strong and in what direction a straight-line relationship is.

Section 2: Setting up the Hypothesis Test

When we do a hypothesis test, we are trying to see if the correlation we found in our small sample is strong enough to suggest there is a correlation in the whole population.

1. The Population Parameter (\( \rho \))

In statistics, we use the Greek letter \( \rho \) (pronounced "rho") to represent the correlation coefficient of the entire population. This is what we are actually testing.

2. Stating the Hypotheses

Every test starts with two statements:
- The Null Hypothesis (\( H_0 \)): This is the "boring" version. We assume there is no correlation in the population. It is always \( H_0: \rho = 0 \).
- The Alternative Hypothesis (\( H_1 \)): This is what we suspect might be true. It depends on whether we are looking for any correlation, just positive, or just negative.

One-Tailed vs. Two-Tailed Tests:
- Two-Tailed: You just want to know if there is a correlation of either sign. (\( H_1: \rho \neq 0 \))
- One-Tailed (Positive): You suspect that the two variables tend to increase together. (\( H_1: \rho > 0 \))
- One-Tailed (Negative): You suspect that one variable tends to decrease as the other increases. (\( H_1: \rho < 0 \))

Memory Aid:
Think of \( \rho \) as a "Road."
\( H_0 \) says the road is flat (zero slope/no connection).
\( H_1 \) says the road is going somewhere (up, down, or just not flat)!

Key Takeaway: Always start by defining \( \rho \) and writing your \( H_0 \) and \( H_1 \).

Section 3: The Rules of the Game (Assumptions)

For this test to be valid, the OCR A Level syllabus asks you to make one major assumption about the data: it must come from a bivariate normal distribution.

What does that mean?
In simple terms, if you looked at a scatter diagram of the population, the points would form a sort of "elliptical" or egg-shaped cloud. You don't need to prove this in the exam, but you must state it as an assumption if asked.

Encouraging Note: Don't worry if "bivariate normal distribution" sounds scary. In your exam, you can usually just assume it’s true to proceed with the test!

Section 4: How to Conduct the Test (Step-by-Step)

The syllabus says you do not need to calculate \( r \) from scratch (your calculator or the exam paper will give it to you). Your job is to interpret it!

Step 1: State your Hypotheses

Write down \( H_0: \rho = 0 \) and your chosen \( H_1 \).

Step 2: Pick your Significance Level (\( \alpha \))

Usually 5% (0.05) or 1% (0.01). This is the "hurdle" the data has to jump over to be considered "significant."

Step 3: Find the Critical Value

You will be given a Table of Critical Values. To use it, you need:
1. The sample size (\( n \)).
2. Whether the test is one-tailed or two-tailed.
3. The significance level.
The table will give you a "borderline" number.
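
Here is a small Python sketch of that lookup. The dictionary is only a tiny excerpt (rows for \( n = 10 \) and \( n = 20 \)) typed in for illustration, and the names `CRITICAL_VALUES` and `critical_value` are hypothetical. In the exam, always read from the table you are given.

```python
# Illustrative excerpt of a PMCC critical-value table.
# Outer key: sample size n. Inner key: ONE-tail significance level in percent.
CRITICAL_VALUES = {
    10: {5: 0.5494, 2.5: 0.6319, 1: 0.7155, 0.5: 0.7646},
    20: {5: 0.3783, 2.5: 0.4438, 1: 0.5155, 0.5: 0.5614},
}

def critical_value(n, alpha_percent, two_tailed):
    """Look up the critical value for sample size n at the given level."""
    # A two-tailed test shares the significance level between both tails,
    # so it uses the one-tail column for half the level.
    one_tail_percent = alpha_percent / 2 if two_tailed else alpha_percent
    return CRITICAL_VALUES[n][one_tail_percent]

one_tailed_cv = critical_value(10, 5, two_tailed=False)
two_tailed_cv = critical_value(10, 5, two_tailed=True)
print(one_tailed_cv, two_tailed_cv)
```

Notice that for the same \( n \) and the same 5% level, the two-tailed critical value is larger than the one-tailed one, so a two-tailed test needs a stronger sample correlation to be significant.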

Step 4: Compare your \( r \) to the Critical Value

If your sample correlation \( r \) is further away from zero than the critical value (and, for a one-tailed test, has the sign stated in \( H_1 \)), the result is significant and we reject \( H_0 \).
Example: If the critical value is 0.5 and your \( r \) is 0.7, you have enough evidence!

Step 5: Write your Conclusion

Always write this in two parts:
1. A statistical comment: "Reject \( H_0 \)" or "Fail to reject \( H_0 \)".
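2. A real-world comment that matches part 1. After rejecting \( H_0 \): "There is evidence at the 5% significance level to suggest that there is a positive correlation between revision time and test scores." After failing to reject \( H_0 \): "There is insufficient evidence at the 5% significance level to suggest a correlation."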

Quick Review Box:
- \( |r| > \text{Critical Value} \implies \) Reject \( H_0 \) (Significant result).
- \( |r| < \text{Critical Value} \implies \) Fail to reject \( H_0 \) (Not enough evidence). We never "accept" \( H_0 \); we only say the evidence against it is insufficient.
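
The comparison in Step 4 can be sketched as a short Python function. The function name and the labels "two-sided", "positive" and "negative" are made up for this illustration. For a one-tailed test the sketch also checks that \( r \) has the sign stated in \( H_1 \).

```python
def pmcc_test_decision(r, critical_value, alternative):
    """Return the statistical comment for a PMCC hypothesis test.

    alternative: "two-sided" (H1: rho != 0), "positive" (H1: rho > 0)
    or "negative" (H1: rho < 0). These labels are illustrative only.
    """
    if alternative == "two-sided":
        reject = abs(r) > critical_value
    elif alternative == "positive":
        reject = r > critical_value
    elif alternative == "negative":
        reject = r < -critical_value
    else:
        raise ValueError("alternative must be two-sided, positive or negative")
    return "Reject H0" if reject else "Fail to reject H0"

# The example from Step 4: critical value 0.5 and r = 0.7.
print(pmcc_test_decision(0.7, 0.5, "positive"))
# A strong correlation in the wrong direction is not evidence for H1: rho > 0.
print(pmcc_test_decision(-0.7, 0.5, "positive"))
```

The first call rejects \( H_0 \); the second does not, because a negative \( r \) cannot support \( H_1: \rho > 0 \).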

Section 5: Using p-values

Sometimes, instead of a table, you might be given a p-value. This is even easier!
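The p-value is the probability of getting a sample correlation at least as extreme as the one we observed, assuming \( H_0 \) is true (that is, assuming \( \rho = 0 \)). It is not the probability that \( H_0 \) is true.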

If p-value < Significance Level: The result is significant. Reject \( H_0 \).
If p-value > Significance Level: The result is not significant. Fail to reject \( H_0 \).
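
To get a feel for what a p-value measures, the sketch below estimates one by simulation rather than from a formula. It is an illustration only, under stated assumptions: the sample values (\( r = 0.7 \), \( n = 10 \)), the function names, the number of trials and the random seed are all made up, and the test is taken to be two-tailed. It repeatedly draws samples in which the two variables really are unrelated (independent normal values, so \( \rho = 0 \)) and counts how often the sample correlation is at least as far from zero as the observed one.

```python
import random
from math import sqrt

def pmcc(xs, ys):
    """Sample PMCC, r = Sxy / sqrt(Sxx * Syy) (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulated_p_value(r_observed, n, trials=20000, seed=1):
    """Estimate a two-tailed p-value by simulating samples with rho = 0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    at_least_as_extreme = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if abs(pmcc(xs, ys)) >= abs(r_observed):
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

p = simulated_p_value(0.7, 10)
significance_level = 0.05
print(p, "Reject H0" if p < significance_level else "Fail to reject H0")
```

The estimate is the proportion of "no real correlation" samples that still look as correlated as ours. When that proportion is below the significance level, the observed correlation would be unusual under \( H_0 \), so we reject \( H_0 \).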

Common Mistake to Avoid:
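When you use a table for a two-tailed test, make sure you read the critical value from the "two-tailed" column at the correct significance level. Reading the one-tailed value at the same level gives a critical value that is too small, which makes it too easy to reject \( H_0 \).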

Section 6: Correlation vs. Causation

This is a favorite exam question! Just because a hypothesis test shows a significant correlation, it does not mean that one thing causes the other.

Example: Ice cream sales and shark attacks are highly correlated (because they both happen more in summer). But eating ice cream doesn't cause shark attacks!

Key Takeaway: Correlation shows a mathematical link, not necessarily a cause-and-effect relationship.

Summary Checklist

Before you sit your exam, make sure you can:
- State hypotheses using \( \rho \).
- Explain that \( r \) measures linear correlation.
- Use a table to find critical values based on \( n \) and significance levels.
- Compare p-values to significance levels.
- State the assumption of a bivariate normal distribution.
- Write a conclusion that relates back to the context of the question.
