Welcome to Correlation!

Ever wondered if there is a genuine link between the amount of time you spend practicing "Further Maths" and the marks you get in your exams? Or if taller people really do have bigger feet? In this chapter, we are going to learn how to measure these relationships using numbers. We call this Correlation.

Think of correlation as a "relationship detective." It helps us decide if two things are moving together, and more importantly, how strong that connection actually is. Don't worry if the formulas look a bit scary at first; we will break them down step-by-step!

1. The Product Moment Correlation Coefficient (PMCC)

The Product Moment Correlation Coefficient, usually called \(r\) for a sample and \(\rho\) (the Greek letter 'rho') for a population, measures the strength and direction of a linear relationship between two variables.

What do the numbers mean?

The value of \(r\) will always be between -1 and 1 inclusive.

  • \(r = 1\): Perfect positive linear correlation (a straight line going up).
  • \(r = -1\): Perfect negative linear correlation (a straight line going down).
  • \(r = 0\): No linear correlation at all.
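To see these values come out of actual numbers, here is a minimal Python sketch of \(r\) using the standard formula \(r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\). The function name pmcc and the tiny data sets are made up for illustration.

```python
from math import sqrt

def pmcc(xs, ys):
    """Sample PMCC: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(pmcc([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0  (points on y = 2x + 1)
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (points on y = 10 - 2x)
print(pmcc([1, 2, 3], [1, 0, 1]))        # 0.0  (a V shape: related, but not linearly)
```

The last line is a useful warning: \(r = 0\) only rules out a linear relationship, not every kind of relationship.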

When should you use PMCC?

You use PMCC when you think your data follows a straight-line pattern. If the data looks like a curve, PMCC might not be the best tool!

The Effect of Coding

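Here is a great "cheat code" for your exams: linear coding does not change the PMCC.
If you add a constant to (or subtract one from) all your \(x\) values or all your \(y\) values, or multiply or divide them by a positive constant, the value of \(r\) stays exactly the same. It’s "invariant." (Multiplying or dividing by a negative constant keeps the size of \(r\) but flips its sign.)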

Example: If the correlation between height in cm and weight in kg is 0.8, the correlation between height in meters and weight in kg will still be 0.8.
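A quick Python check of the coding rule. The heights and weights below are invented for illustration (they are not real measurements), and pmcc is a hypothetical helper implementing \(r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\).

```python
from math import sqrt

def pmcc(xs, ys):
    """Sample PMCC: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only
height_cm = [150, 158, 163, 170, 176, 182]
weight_kg = [52, 55, 61, 60, 70, 74]

height_m = [h / 100 for h in height_cm]     # divide by a positive constant
coded = [(h - 150) / 2 for h in height_cm]  # subtract a constant, then divide
flipped = [-h for h in height_cm]           # multiply by a negative constant

r = pmcc(height_cm, weight_kg)
print(round(r, 4), round(pmcc(height_m, weight_kg), 4), round(pmcc(coded, weight_kg), 4))
print(round(pmcc(flipped, weight_kg), 4))   # same size as r, opposite sign
```

The first line prints the same value three times; only the negative multiplier changes anything, and even then just the sign.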

Quick Review:

PMCC (\(r\)) measures linear strength. It is not affected by linear coding (as long as any multiplier is positive). It is always between -1 and 1.


2. Spearman's Rank Correlation Coefficient

Sometimes, data isn't a perfect straight line, or the data is just "ranks" (like a list of your favorite movies from 1 to 10). This is where Spearman’s Rank (\(r_s\)) comes in.

Why use Spearman's instead of PMCC?

  • When the relationship is monotonic (it goes up or down, but not necessarily in a straight line).
  • When the data is already in ranks.
  • When the data has outliers that would mess up the PMCC.
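As an illustration of the first bullet, the sketch below compares the two coefficients on points from \(y = x^3\), which always increase but curve. The helper names and data are assumptions for illustration, and the simple ranking function assumes there are no tied values.

```python
from math import sqrt

def pmcc(xs, ys):
    """Sample PMCC: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]      # always increasing, but not a straight line

print(round(pmcc(xs, ys), 3))  # less than 1, because the points curve
print(spearman(xs, ys))        # 1.0: the order is perfectly monotonic
```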

How to calculate \(r_s\)

You will often be given a table of data. Follow these steps:

  1. Rank the first variable (\(x\)) from smallest to largest.
  2. Rank the second variable (\(y\)) from smallest to largest.
  3. Find the difference (\(d\)) between the ranks for each pair.
  4. Square those differences (\(d^2\)).
  5. Sum them up to get \(\sum d^2\).
  6. Use the formula \(r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}\), where \(n\) is the number of pairs.
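The six steps map directly onto a few lines of Python. This is a sketch under stated assumptions: the five data pairs are made up, the helper names are hypothetical, and the ranking helper assumes there are no tied values.

```python
def rank_no_ties(values):
    """Steps 1 and 2: rank from smallest (1) to largest (n). Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_steps(xs, ys):
    n = len(xs)
    rx = rank_no_ties(xs)                      # step 1: rank x
    ry = rank_no_ties(ys)                      # step 2: rank y
    d = [a - b for a, b in zip(rx, ry)]        # step 3: differences
    d2 = [di ** 2 for di in d]                 # step 4: squared differences
    sum_d2 = sum(d2)                           # step 5: sum of d^2
    r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # step 6: the formula
    return rx, ry, d, sum_d2, r_s

xs = [12, 15, 11, 18, 20]
ys = [30, 28, 25, 35, 40]
rx, ry, d, sum_d2, r_s = spearman_steps(xs, ys)
print(rx)      # [2, 3, 1, 4, 5]
print(ry)      # [3, 2, 1, 4, 5]
print(d)       # [-1, 1, 0, 0, 0]
print(sum_d2)  # 2
print(r_s)     # 0.9, since 1 - (6 * 2) / (5 * 24) = 1 - 12/120
```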

Dealing with Ties

Don't worry if two values are the same! If two items are tied for 2nd and 3rd place, you give them both the average rank of \(2.5\) (calculated as \(\frac{2+3}{2}\)). The next item then gets rank 4.
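The averaging rule can be sketched in Python: each tied value gets the mean of the positions it would occupy in the sorted list. The function name average_ranks is hypothetical, and it ranks from smallest (rank 1) to largest.

```python
def average_ranks(values):
    """Rank 1 = smallest; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    result = []
    for v in values:
        first = ordered.index(v) + 1   # first position this value occupies
        count = ordered.count(v)       # how many values share it
        last = first + count - 1       # last position this value occupies
        result.append((first + last) / 2)
    return result

print(average_ranks([7, 3, 5, 5, 9]))  # [4.0, 1.0, 2.5, 2.5, 5.0]
```

The two 5s would fill 2nd and 3rd place, so each gets 2.5, and the 7 carries on at rank 4.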

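Did you know? Your calculator can often work out both \(r\) and \(r_s\) for you in its statistics mode (for example, the "6: Statistics" menu on some models). Spearman's \(r_s\) is simply the PMCC of the ranks, so enter your ranks as the data and read off the value of \(r\).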
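Here is a minimal Python check of that idea, using five made-up pairs of ranks with no ties. The helper pmcc is an assumption for illustration. With no ties, the PMCC of the ranks and the \(\sum d^2\) formula give exactly the same answer; when there are tied ranks, the two can differ slightly.

```python
from math import sqrt

def pmcc(xs, ys):
    """Sample PMCC: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rank_x = [2, 3, 1, 4, 5]  # made-up ranks, no ties
rank_y = [3, 2, 1, 4, 5]

n = len(rank_x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(pmcc(rank_x, rank_y), formula)  # 0.9 0.9
```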

Key Takeaway:

Use Spearman's when you are dealing with ranks or monotonic curves. The formula is on your side; just be careful when calculating those differences!


3. Hypothesis Testing for Correlation

Just because a sample shows correlation doesn't mean the whole population has it. We use a hypothesis test to see if our result is statistically significant.

Setting up the Hypotheses

We usually test whether the population correlation is zero (for PMCC, this means no linear relationship).

  • Null Hypothesis (\(H_0\)): \(\rho = 0\) (or \(\rho_s = 0\)) — There is no correlation in the population.
  • Alternative Hypothesis (\(H_1\)): \(\rho \neq 0\) (Two-tailed) or \(\rho > 0\) / \(\rho < 0\) (One-tailed).

The Critical Value

You don't need to do huge calculations here. You will use the Statistical Tables provided in your exam. Look up your sample size (\(n\)) and your significance level (e.g., 5%) to find the critical value.

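The Rule: For a two-tailed test, if the size of your calculated value (ignoring its sign) is greater than the critical value, it's a big deal! You reject \(H_0\) and say there is evidence of correlation. For a one-tailed test, the value must also be on the side stated in \(H_1\): greater than the critical value for \(\rho > 0\), or less than minus the critical value for \(\rho < 0\).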
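The decision rule can be written as a small Python function. It is only a sketch: the name reject_h0 and the tail labels are made up, and the critical value 0.75 in the usage lines is invented for illustration, so always read the real value from your tables for your \(n\), significance level and number of tails.

```python
def reject_h0(statistic, critical_value, tail):
    """
    Decide whether to reject H0: rho = 0 (or rho_s = 0).
    statistic:      your calculated r or r_s
    critical_value: the positive value read from the tables
    tail:           "two"   for H1: rho != 0
                    "upper" for H1: rho > 0
                    "lower" for H1: rho < 0
    """
    if tail == "two":
        return abs(statistic) > critical_value
    if tail == "upper":
        return statistic > critical_value
    if tail == "lower":
        return statistic < -critical_value
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

print(reject_h0(-0.82, 0.75, "two"))    # True: |r| = 0.82 > 0.75
print(reject_h0(-0.82, 0.75, "upper"))  # False: wrong direction for H1: rho > 0
print(reject_h0(0.60, 0.75, "two"))     # False: not far enough from 0
```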

Important Condition for PMCC Testing

To perform a hypothesis test for the PMCC (\(r\)), the data must come from a bivariate normal distribution.
What does that mean? Imagine the scatter graph looks like a "cloud" that is densest in the middle and thins out at the edges. You don't need to prove this in the exam, but you must mention it if asked about assumptions!
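To get a feel for that "cloud", the sketch below simulates a sample from a bivariate normal distribution with population correlation \(\rho = 0.8\), using the standard construction \(y = \rho x + \sqrt{1-\rho^2}\,z\), where \(x\) and \(z\) are independent standard normals. The sample size, seed and helper names are arbitrary choices for illustration. The sample \(r\) should land close to 0.8.

```python
import random
from math import sqrt

def bivariate_normal_sample(n, rho, seed=1):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)                    # independent of x
        y = rho * x + sqrt(1 - rho ** 2) * z   # y ~ N(0, 1), corr(x, y) = rho
        xs.append(x)
        ys.append(y)
    return xs, ys

def pmcc(xs, ys):
    """Sample PMCC: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs, ys = bivariate_normal_sample(5000, 0.8)
r = pmcc(xs, ys)
print(round(r, 2))  # close to the population value 0.8
```

Plotting these pairs on a scatter graph would show an oval cloud, densest near the middle and thinning out towards the edges.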

Common Mistakes to Avoid:
  • Forgetting to use \(\rho\) or \(\rho_s\) in your hypotheses (don't use the sample letter \(r\)).
  • Mixing up one-tailed and two-tailed tests. Read the question carefully: does it say "is there correlation" (two-tailed) or "is there positive correlation" (one-tailed)?

Summary Checklist

Before you move on, make sure you can:

  • Identify whether to use PMCC (linear) or Spearman's (rank/monotonic).
  • State that linear coding (with a positive multiplier) has no effect on PMCC.
  • Calculate Spearman's Rank Correlation Coefficient using the formula or your calculator.
  • Handle tied ranks correctly by averaging them.
  • Carry out a Hypothesis Test using critical value tables.
  • Recall that PMCC tests require bivariate normality.

You've got this! Correlation is just about seeing how the world moves together. Practice a few ranking questions and you'll be a pro in no time.