Welcome to Informal Hypothesis Testing for Correlation!

In this chapter, we are going to explore how we can tell if two things are related. For example, does the amount of time you spend revising actually relate to your exam score? Or does the height of a person relate to their shoe size? We use Informal Hypothesis Testing to see if a pattern we see in a small group (a sample) is strong enough to suggest that the same pattern exists for everyone (the population).

Don't worry if this seems a bit "wordy" at first—at its heart, we are just looking for evidence to see if a relationship is real or just a coincidence!

The Basics: Bivariate Data and Relationships

Before we test anything, we need to understand what we are looking at. In this section, we deal with bivariate data. This simply means we have two measurements for every individual (like "Height" and "Weight").

1. Correlation vs. Association

These two terms are often used interchangeably, but there is a slight difference:

  • Correlation: This specifically refers to a linear relationship. In other words, if you plotted the data on a scatter diagram, would the points look like they are trying to form a straight line?
  • Association: This is a broader term. It means there is some kind of relationship between the variables, even if it isn't a straight line (it could be a curve, for example).

2. The Correlation Coefficient \(r\)

We use a number called the correlation coefficient, denoted by the letter \(r\), to measure the strength of a linear relationship.

  • If \(r = 1\), it is a perfect positive linear correlation (a perfect upward straight line).
  • If \(r = -1\), it is a perfect negative linear correlation (a perfect downward straight line).
  • If \(r = 0\), there is no linear correlation at all (though there could still be a non-linear association).

Quick Review: The closer \(r\) is to \(1\) or \(-1\), the stronger the relationship. The closer it is to \(0\), the weaker it is.
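To make this concrete, here is a short Python sketch using made-up revision data; `np.corrcoef` returns a correlation matrix whose off-diagonal entry is \(r\):

```python
import numpy as np

# Hypothetical sample: revision time (hours) and exam marks
revision_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
exam_marks = np.array([35, 42, 48, 55, 60, 66, 71, 80])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry [0, 1] is the correlation coefficient r
r = np.corrcoef(revision_hours, exam_marks)[0, 1]
print(f"r = {r:.3f}")
```

Because the marks rise almost linearly with revision time, \(r\) comes out very close to \(1\): a strong positive linear correlation.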

Rank Correlation: When Lines Aren't Straight

Sometimes, data doesn't form a straight line, but it still moves in a consistent direction (e.g., as \(x\) goes up, \(y\) always goes up, but at different speeds). In these cases, we use rank correlation.

Instead of using the actual values (like \(152\) cm and \(180\) cm), we rank them (1st tallest, 2nd tallest, etc.). This measures the association between the ranks rather than the actual values. It's a great tool when you have outliers or a non-linear relationship!
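A minimal sketch of the idea, assuming made-up height and shoe-size data with no ties: replace each value by its rank, then correlate the ranks instead of the raw values.

```python
import numpy as np

# Hypothetical data: the relationship is monotonic (both always
# increase together) but not linear -- note the extreme height
heights = np.array([152, 158, 163, 170, 180, 210])
shoe_sizes = np.array([4.5, 5.0, 6.0, 7.5, 9.0, 13.0])

def ranks(values):
    # Rank 1 = smallest value (this illustrative data has no ties)
    order = values.argsort()
    r = np.empty(len(values))
    r[order] = np.arange(1, len(values) + 1)
    return r

# Rank correlation = the ordinary correlation of the two sets of ranks
rank_r = np.corrcoef(ranks(heights), ranks(shoe_sizes))[0, 1]
print(rank_r)
```

Because both variables increase together in every case, the two sets of ranks match exactly, so the rank correlation is \(1\) even though the raw values do not lie on a straight line.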

Did you know? You don't need to know the complex names of different coefficients (like Pearson's or Spearman's) for this specific module—you just need to know how to use the value of \(r\) that the question gives you!

The Hypothesis Testing Process

This is where we decide if our sample's correlation is "statistically significant." We follow a specific set of steps.

Step 1: Set up your Hypotheses

We always start with two statements:

  • Null Hypothesis \(H_0\): The "boring" assumption. It always states that there is no correlation/association in the population.
  • Alternative Hypothesis \(H_1\): What we suspect is actually happening.
    • 1-tailed test: We predict the direction (e.g., "There is a positive correlation").
    • 2-tailed test: We just think there is a relationship but aren't sure of the direction (e.g., "There is a correlation").

Step 2: Look at the \(p\)-value or Critical Value

In your exam, you will usually be given a \(p\)-value or a critical value for the correlation coefficient. This comes from statistical software or tables.

  • \(p\)-value: The probability of getting a correlation at least as strong as the one in our sample purely by chance, assuming \(H_0\) is true (i.e., assuming there is really no correlation in the population).
  • Significance Level: This is the "threshold" set by the researcher (usually \(5\%\) or \(0.05\)).

Step 3: Make a Decision

Compare your \(p\)-value to the significance level. Here is a simple rhyme to remember the rule:

"If the p is low, the null must go!"

  • If \(p \leq \text{significance level}\): We reject \(H_0\). There is enough evidence to suggest a correlation exists.
  • If \(p > \text{significance level}\): We fail to reject \(H_0\). There isn't enough evidence to say the correlation is real.

Key Takeaway: A small \(p\)-value means the result is very unlikely to be a fluke!
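The decision rule in Step 3 can be written as a tiny helper function; this is just an illustrative Python sketch (the name `decide` is made up):

```python
def decide(p_value, significance_level=0.05):
    # "If the p is low, the null must go!"
    if p_value <= significance_level:
        return "Reject H0: sufficient evidence of a correlation"
    return "Fail to reject H0: insufficient evidence of a correlation"

print(decide(0.031))  # 0.031 <= 0.05, so H0 goes
print(decide(0.200))  # 0.200 >  0.05, so H0 stays
```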

Drawing Conclusions (The "Non-Assertive" Way)

In Mathematics B (MEI), examiners look for non-assertive language. We never say we have "proven" something. Instead, we say "there is enough evidence to suggest..."

Example: "There is sufficient evidence at the \(5\%\) significance level to suggest that there is a positive correlation between revision time and exam marks."

Real-World Example: Ice Cream and Sunburn

Imagine you find a high correlation coefficient (\(r = 0.9\)) between ice cream sales and cases of sunburn. A hypothesis test would likely show this is a "significant" correlation.

Does this mean ice cream causes sunburn? No! This is a classic example of why Correlation does not imply Causation. Both are caused by a third factor: hot weather. Always keep this in mind when interpreting your results!

Common Mistakes to Avoid

  • Mistaking \(r\) for the \(p\)-value: \(r\) tells you how strong the linear relationship is; the \(p\)-value tells you whether that strength is statistically significant.
  • Assertive Language: Avoid saying "This proves that \(x\) causes \(y\)." Stick to "There is evidence to suggest..."
  • Outliers: Be careful! A single outlier can make a weak correlation look strong or a strong correlation look weak. Always look at the scatter diagram if provided.
  • Time Series: Correlation coefficients should only be used for random variables. They aren't suitable for things like time series where one variable (time) is set at fixed intervals.

Quick Review Box

\(H_0\): No correlation.
\(H_1\): There is a correlation (positive/negative/any).
Decision: If \(p \leq \text{significance level}\), reject \(H_0\).
Context: Always write your final answer in terms of the original variables (e.g., "height" and "weight").
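The whole review box can be put together in one exam-style sketch (the sample's \(r\) and the given \(p\)-value below are hypothetical, as in a typical question where these are supplied for you):

```python
# Hypothetical exam-style question: a sample gives r = 0.72 and the
# question supplies a two-tailed p-value of 0.013
r_given = 0.72
p_given = 0.013
alpha = 0.05  # 5% significance level

# H0: no correlation between revision time and exam marks
# H1: there is a correlation (two-tailed)
if p_given <= alpha:
    conclusion = ("There is sufficient evidence at the 5% significance "
                  "level to suggest a correlation between revision time "
                  "and exam marks.")
else:
    conclusion = ("There is insufficient evidence at the 5% significance "
                  "level to suggest a correlation between revision time "
                  "and exam marks.")
print(conclusion)
```

Notice that the final sentence is non-assertive ("evidence to suggest") and refers to the original variables in context, exactly as the mark scheme expects.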

Summary Takeaway

Informal hypothesis testing for correlation allows us to use a sample's correlation coefficient (\(r\)) and a \(p\)-value to decide if a relationship exists in the wider population. By following the "If the p is low, the null must go" rule and using careful, non-assertive language, you can master this section of the Statistics curriculum!