Introduction: Making Connections

Welcome to the study of Correlation! In Statistics, we often want to know if two things are related. Does spending more time revising lead to higher marks? Does the height of a person relate to their shoe size? Correlation gives us a way to measure the strength and direction of these relationships using numbers.

In this chapter, you’ll learn how to calculate these numbers, test whether a relationship is "real" or just down to luck, and decide which method is best for different types of data. Don't worry if some of the formulas look big at first—we'll break them down step-by-step!


1. Pearson’s Product-Moment Correlation Coefficient (PMCC)

The PMCC (represented by the letter \(r\)) is a measure of the linear relationship between two variables. Think of it as a "line-o-meter": it tells us how closely the dots on a scatter diagram cluster around a single straight line.

Key Characteristics of \(r\):

  • The value of \(r\) always lies between -1 and +1.
  • \(r = +1\): Perfect positive linear correlation (a perfect straight line going up).
  • \(r = -1\): Perfect negative linear correlation (a perfect straight line going down).
  • \(r = 0\): No linear correlation at all.

Did you know? The PMCC only measures straight-line patterns. If your data points form a perfect "U" shape, the PMCC might be 0, even though there is clearly a relationship!
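You can see this for yourself with a tiny sketch in Python. The helper below is just the standard definition of \(r\) written out in code, and the data is a deliberately perfect "U" (\(y = x^2\)):

```python
def pearson_r(xs, ys):
    # Standard definition of the sample PMCC
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # a perfect "U" shape: y = x^2
print(pearson_r(xs, ys))    # 0.0 — no *linear* correlation, despite a clear pattern
```

The points sit exactly on a curve, yet \(r = 0\): the upward half and the downward half of the "U" cancel each other out.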

The Magic of Linear Coding

One very helpful property of the PMCC is that it is unaffected by linear coding. This means if you add or subtract a constant, or multiply or divide by a positive constant, across all your \(x\) or \(y\) values, the value of \(r\) stays exactly the same. (Multiplying by a negative constant flips the sign of \(r\) but not its size.)
Example: If you measure heights in cm and then convert them all to metres, your correlation coefficient \(r\) won't change at all!
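Here is a quick sketch of that example with made-up heights and shoe sizes; the helper is just the standard definition of \(r\):

```python
def pearson_r(xs, ys):
    # Standard definition of the sample PMCC
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Made-up heights (cm) paired with shoe sizes
heights_cm = [150, 160, 165, 170, 180]
shoe_size = [36, 39, 40, 42, 44]

r_cm = pearson_r(heights_cm, shoe_size)
r_m = pearson_r([h / 100 for h in heights_cm], shoe_size)      # cm -> metres
r_shifted = pearson_r([h + 5 for h in heights_cm], shoe_size)  # add a constant

print(r_cm, r_m, r_shifted)  # all three values agree
```

Scaling by 1/100 or shifting by +5 leaves every value of \(r\) identical, exactly as the linear-coding property promises.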

Calculating \(r\)

In your exam, you are expected to use your calculator’s statistical functions to find \(r\) from raw data.
Quick Tip: Always double-check your data entry! One mistyped number can change your final \(r\) value significantly.
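Although the exam expects calculator work, it can help to see what the calculator is doing. This sketch uses the summary-statistic form \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\), where \(S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}\), on made-up revision data:

```python
# Made-up data: hours of revision (x) against test marks (y)
x = [2, 4, 5, 7, 8, 10]
y = [35, 48, 52, 60, 70, 80]
n = len(x)

# Summary statistics: Sxy = Σxy − (Σx)(Σy)/n, and similarly Sxx, Syy
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n

r = sxy / (sxx * syy) ** 0.5
print(round(r, 4))  # close to +1: a strong positive linear correlation
```

Running the same data through your calculator's regression mode should give the same value of \(r\), which makes this a handy way to check your data entry.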

Key Takeaway: The PMCC measures how close data points are to a straight line. It is a number between -1 and 1 and is not changed by shifting or scaling the data.


2. Hypothesis Testing with PMCC

Just because we find a correlation in a small sample doesn't mean it exists in the whole population. We use a hypothesis test to see if our result is statistically significant.

The Assumption: Bivariate Normal Distribution

For a PMCC hypothesis test to be valid, we assume the data comes from a bivariate normal distribution. Roughly speaking, this means each variable follows a normal distribution and their joint distribution looks like a "bell-shaped mound" when plotted in 3D.

The Hypotheses:

  • Null Hypothesis (\(H_0\)): \(\rho = 0\) (There is no correlation in the population).
  • Alternative Hypothesis (\(H_1\)):
    • \(\rho \neq 0\) (Two-tailed test: there is some correlation).
    • \(\rho > 0\) (One-tailed test: there is positive correlation).
    • \(\rho < 0\) (One-tailed test: there is negative correlation).

Note: We use the Greek letter \(\rho\) (rho) to represent the correlation in the population, while \(r\) is for our sample.

How to Test:

  1. State \(H_0\) and \(H_1\) clearly.
  2. Identify the significance level (e.g., 5%) and the sample size (\(n\)).
  3. Find the critical value from your provided statistical tables.
  4. Compare your calculated \(r\) to the critical value:
    • Two-tailed test: if \(|r| > \text{critical value}\), reject \(H_0\). There is evidence of a correlation!
    • One-tailed test: reject \(H_0\) only if \(r\) is beyond the critical value in the direction of \(H_1\) (e.g. \(r > \text{critical value}\) when testing \(\rho > 0\)).
    • Otherwise, do not reject \(H_0\).
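The comparison step can be sketched as a tiny function. The numbers below are purely illustrative: in the exam the critical value must come from your own statistical tables for the given \(n\) and significance level.

```python
def pmcc_test(r, critical_value):
    """Step 4: compare |r| with the critical value from tables (two-tailed)."""
    if abs(r) > critical_value:
        return "Reject H0: evidence of correlation"
    return "Do not reject H0: insufficient evidence"

# Illustrative numbers: a sample of n = 10 pairs gave r = 0.72, and the
# 5% two-tailed critical value is taken as 0.6319 (read yours from tables).
print(pmcc_test(0.72, 0.6319))
print(pmcc_test(0.41, 0.6319))
```

Since \(|0.72| > 0.6319\), the first call rejects \(H_0\); the second does not.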

Key Takeaway: We test the sample \(r\) against a critical value to see if the population \(\rho\) is likely to be non-zero. Always mention "bivariate normal distribution" as your underlying assumption!


3. Spearman’s Rank Correlation Coefficient

Sometimes the relationship isn't linear, or the data is given as ranks (like 1st, 2nd, 3rd place). This is where Spearman’s Rank Correlation Coefficient (\(r_s\)) shines.

When to use Spearman's:

  • When the relationship is monotone (it goes up or down but not necessarily in a straight line).
  • When the data is already in ranks or is qualitative (e.g., judging a talent show).
  • When you have outliers that might "pull" the PMCC away from the truth.

The Calculation (For up to 10 pairs):

The formula is: \(r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}\)

Step-by-Step Process:

  1. Rank the data for both variables (1 for the smallest, 2 for next, etc.).
  2. Calculate the difference (\(d\)) between the ranks for each pair.
  3. Square each difference (\(d^2\)).
  4. Sum the squares (\(\sum d^2\)).
  5. Plug the sum and the number of pairs (\(n\)) into the formula.

Common Mistake to Avoid: Ensure you rank both sets of data in the same direction (e.g., both from smallest to largest). Also, for this syllabus, you will not have to deal with "tied ranks" (where two items have the same value).
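The five steps above can be sketched directly in Python. The judging scores are made up, and the ranking helper assumes no tied values, matching this syllabus:

```python
def spearman_rs(xs, ys):
    """Spearman's rank coefficient via r_s = 1 - 6Σd²/(n(n²-1)). No ties."""
    def ranks(vals):
        # Step 1: rank 1 for the smallest, 2 for the next, and so on
        order = sorted(vals)
        return [order.index(v) + 1 for v in vals]

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    # Steps 2-4: differences between ranks, squared, then summed
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Step 5: plug Σd² and n into the formula
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Made-up scores from two judges for five talent-show acts
judge_a = [7.5, 8.0, 6.0, 9.0, 5.5]
judge_b = [6.8, 7.9, 7.0, 9.5, 5.0]
print(spearman_rs(judge_a, judge_b))  # 0.9 — strong agreement between judges
```

Notice both sets are ranked in the same direction (smallest = rank 1), as the common-mistake note warns.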

Key Takeaway: Spearman’s uses ranks instead of raw values. It's great for non-linear but consistent relationships and is a non-parametric test (it makes no assumptions about the population distribution).


4. Hypothesis Testing with Spearman’s

Similar to PMCC, we can test if an association exists in the population using Spearman’s coefficient. Because it makes no assumptions about the population (like normality), it is called a non-parametric test.

The Hypotheses:

  • \(H_0\): There is no association between the two variables in the population.
  • \(H_1\): There is an association (or a specific positive/negative association).

You use specific Spearman’s Critical Value Tables for this. The process is the same: if your \(r_s\) is more extreme than the critical value (use \(|r_s|\) for a two-tailed test), you reject the null hypothesis.

Quick Review Box:
- PMCC: Tests for linear correlation. Needs normal distribution.
- Spearman's: Tests for association. Works for non-linear or ranked data. No distribution assumptions needed.


5. Choosing the Right Coefficient

A common exam question asks you to justify which coefficient to use. Use this guide:

  • Choose PMCC if: The scatter diagram looks linear AND you can assume a bivariate normal distribution.
  • Choose Spearman’s if: The data is ranked, the relationship is non-linear (curved but still going one way), or if there are outliers that would distort the PMCC.

Analogy: Imagine measuring how much a spring stretches. A straight-line ruler (PMCC) is perfect. But if you're measuring how much a person likes a spicy sauce (ranked 1 to 10), a "rank" system (Spearman's) makes much more sense!

Key Takeaway: Always look at a scatter diagram first. If it's a straight line, PMCC is your best friend. If it's a curve or involves "order," go with Spearman's.


Summary Checklist

Before you finish this chapter, make sure you can:

  • Calculate PMCC using your calculator.
  • Explain why linear coding doesn't change the PMCC.
  • Perform a hypothesis test for PMCC (and remember the "bivariate normal" assumption!).
  • Rank data and calculate Spearman’s Rank Coefficient.
  • Perform a hypothesis test for Spearman’s.
  • Choose between PMCC and Spearman’s based on a scatter diagram or context.

Don't worry if this seems tricky at first—with a bit of practice using your calculator and the statistical tables, these marks will become some of your favorites to pick up!