Introduction: Measuring Relationships

Welcome! In this chapter, we are going to explore Correlation. Put simply, correlation is a way to measure how strongly two things are related. For example, is there a link between the amount of time you spend revising and the marks you get? Or between the height of a person and their shoe size?

In Further Statistics 2, we go beyond just looking at a scatter graph. We use mathematical tools to put a number on these relationships, helping us decide whether a pattern is real or just a coincidence. Don't worry if it seems like a lot of formulas; we'll break them down step by step!


1. The Product Moment Correlation Coefficient (PMCC)

The Product Moment Correlation Coefficient (often just called \(r\)) measures the strength and direction of a linear relationship between two variables.

What does the value of \(r\) tell us?

  • \(r = 1\): Perfect positive linear correlation (a perfect straight line pointing up).
  • \(r = -1\): Perfect negative linear correlation (a perfect straight line pointing down).
  • \(r = 0\): No linear correlation at all.

Calculating \(r\) from Summary Statistics

In your exam, you will often be given "summary statistics" like \( \sum x, \sum y, \sum x^2, \sum y^2, \) and \( \sum xy \). You use these to find the building blocks:

\( S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \)

\( S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} \)

\( S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \)

The final formula for the PMCC is:

\( r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \)
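
To see the formulas in action, here is a minimal Python sketch. The function name pmcc_from_summary and the five data pairs are made up for illustration (they are not from any exam paper); the calculation itself follows the three building blocks above.

```python
from math import sqrt

def pmcc_from_summary(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return r from summary statistics (hypothetical helper for illustration)."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pmcc_from_summary(
    n=len(x),
    sum_x=sum(x),                             # 15
    sum_y=sum(y),                             # 20
    sum_x2=sum(v * v for v in x),             # 55
    sum_y2=sum(v * v for v in y),             # 86
    sum_xy=sum(a * b for a, b in zip(x, y)),  # 66
)
print(round(r, 4))  # S_xx = 10, S_yy = 6, S_xy = 6, so r = 6 / sqrt(60), about 0.7746
```

In an exam you would be handed the sums directly, so only the three lines inside the function and the final division are needed.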

Conditions for Use

You should only use the PMCC when you believe the relationship is linear (a straight line). If the scatter graph looks like a curve, \(r\) might give you a misleadingly low number!
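
As a quick illustration of how misleading \(r\) can be for a curve, the sketch below uses made-up points that lie exactly on \(y = x^2\). The helper name pmcc is an assumption for this example; it just applies the formulas above to raw data.

```python
from math import sqrt

def pmcc(xs, ys):
    """PMCC from raw data, via S_xx, S_yy and S_xy (hypothetical helper)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)

# Made-up points lying exactly on the curve y = x^2.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_curve = pmcc(xs, ys)
print(r_curve)  # prints 0.0: a perfect relationship, but not a linear one
```

The two variables are perfectly related here, yet \(r = 0\), because the PMCC only detects straight-line patterns.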

The Magic of Coding

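Did you know? The PMCC is unaffected by linear coding. If you add 10 to every \(x\) value, or multiply every \(y\) value by 5, the value of \(r\) stays exactly the same. It only cares about the pattern, not the scale of the numbers. The one thing to watch: multiplying one variable by a negative constant reverses the sign of \(r\), although its size stays the same.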
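
You can check the effect of coding numerically. This is a small sketch with made-up data and a hypothetical helper called pmcc; it also shows that a negative multiplier flips the sign of \(r\) while leaving its size unchanged.

```python
from math import isclose, sqrt

def pmcc(xs, ys):
    """PMCC from raw data, via S_xx, S_yy and S_xy (hypothetical helper)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_original = pmcc(x, y)
r_coded = pmcc([a + 10 for a in x], [5 * b for b in y])  # add 10 to x, multiply y by 5
r_flipped = pmcc(x, [-2 * b for b in y])                 # multiply y by a negative constant

print(isclose(r_original, r_coded))     # True: same r after coding
print(isclose(r_flipped, -r_original))  # True: same size, opposite sign
```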

Key Takeaway

The PMCC (\(r\)) is for straight-line relationships. It ranges from -1 to 1 and is not affected by adding a constant to the data or multiplying it by a positive constant.


2. Spearman’s Rank Correlation Coefficient

Sometimes the data doesn't follow a straight line, or it might be "ordinal", where only the order matters (like a ranking of the top 10 movies). This is where Spearman’s Rank Correlation Coefficient (\(r_s\)) comes in.

When to use Spearman's?

  • When the relationship is monotonic (moving in one direction, but not necessarily a straight line).
  • When the data is already in ranks.
  • When there are outliers that would mess up the PMCC.

How to Calculate \(r_s\) Step-by-Step

1. Rank the data for both variables (usually 1 for the smallest, 2 for the next, etc.). Rank both variables in the same direction, otherwise the sign of \(r_s\) will be reversed.

2. Calculate the difference (\(d\)) between the two ranks for each pair.

3. Square those differences (\(d^2\)).

4. Sum the squared differences (\(\sum d^2\)).

5. Use the formula:

\( r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \)

Note: \(n\) is the number of pairs of data.
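
The five steps can be sketched in Python as follows. The helper names ranks and spearman, and the revision data, are made up for illustration; this version assumes there are no tied values.

```python
def ranks(values):
    """Rank 1 = smallest. Assumes no tied values (hypothetical helper)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation using the sum-of-d-squared formula."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)                        # step 1
    d_squared = [(a - b) ** 2 for a, b in zip(rank_x, rank_y)]   # steps 2 and 3
    sum_d2 = sum(d_squared)                                      # step 4
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))                   # step 5

# Made-up data: hours of revision and marks for six students.
hours = [2, 5, 1, 8, 4, 6]
marks = [35, 60, 40, 90, 52, 55]

print(ranks(hours))  # [2, 4, 1, 6, 3, 5]
print(ranks(marks))  # [1, 5, 2, 6, 3, 4]

r_s = spearman(hours, marks)
print(round(r_s, 4))  # sum of d^2 is 4, so r_s = 1 - 24/210, about 0.8857
```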

Dealing with "Ties"

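If two items are equal, they are "tied." To handle this, give them the average of the ranks they would have taken. For example, if two people are tied for 2nd and 3rd place, give them both rank 2.5. Be aware that when ranks are tied, the \(\sum d^2\) formula is only an approximation; the exact value comes from applying the PMCC formula to the ranks.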
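
Here is one possible way to assign averaged ranks in Python. The function name average_ranks and the scores are made up for illustration.

```python
def average_ranks(values):
    """Rank 1 = smallest; tied values share the mean of the ranks they occupy."""
    order = sorted(values)
    result = []
    for v in values:
        first = order.index(v) + 1   # first rank position taken by this value
        count = order.count(v)       # how many items share this value
        result.append(first + (count - 1) / 2)
    return result

# Made-up scores: the two 20s would have taken ranks 2 and 3.
scores = [10, 20, 20, 30]
tied_ranks = average_ranks(scores)
print(tied_ranks)  # [1.0, 2.5, 2.5, 4.0]
```

Notice that the next rank after the tie is 4, not 3: the tied items still "use up" ranks 2 and 3 between them.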

Key Takeaway

Spearman’s (\(r_s\)) is for ranked or non-linear patterns. If you see a "snake-like" curve that always goes up, \(r_s\) will be exactly 1, even though the PMCC will be less than 1.
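
To compare the two coefficients on a curve that always goes up, the sketch below uses made-up points on \(y = x^3\). The helper names pmcc, ranks and spearman are assumptions for this example, and the ranking helper assumes no ties.

```python
from math import sqrt

def pmcc(xs, ys):
    """PMCC from raw data, via S_xx, S_yy and S_xy (hypothetical helper)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)

def ranks(values):
    """Rank 1 = smallest. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation using the sum-of-d-squared formula."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Made-up points on the always-increasing curve y = x^3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

r = pmcc(xs, ys)
r_s = spearman(xs, ys)
print(round(r, 3))  # about 0.943: strong, but not perfect
print(r_s)          # 1.0: the ranks agree exactly
```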


3. Hypothesis Testing for Correlation

Once you have a correlation coefficient (\(r\) or \(r_s\)), you need to check if it's "statistically significant." We are testing to see if there is evidence of correlation in the whole population, based on our sample.

The Hypotheses

  • For PMCC: Use the Greek letter \(\rho\) (rho).
    \(H_0: \rho = 0\) (No correlation in the population)
    \(H_1: \rho \neq 0\) (Correlation exists - two-tailed) OR \(\rho > 0\) / \(\rho < 0\) (one-tailed).

  • For Spearman's: Use \(\rho_s\).
    \(H_0: \rho_s = 0\)
    \(H_1: \rho_s \neq 0\) (or \(>\) or \(<\)).

The Critical Value

To finish the test:

1. Look up the Critical Value in the statistical tables provided in your exam. You need to know your sample size (\(n\)) and the significance level (e.g., 5%).

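2. Compare: For a two-tailed test, if the size of your calculated value (ignoring any minus sign) is greater than the critical value, the result is significant and you reject \(H_0\). For a one-tailed test, the sign of your calculated value must also match the direction stated in \(H_1\).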

Important Condition for PMCC Tests

To perform a hypothesis test for the PMCC, the data must come from a bivariate normal distribution. This basically means if you plotted the data in 3D, it would look like a bell-shaped mound. You don't need to prove this in the exam, but you must state it as a requirement if asked!

Quick Review: Decision Rule

Two-tailed test: if \(|r| > \text{Critical Value}\) \(\rightarrow\) Reject \(H_0\); there IS evidence of correlation. Otherwise, do not reject \(H_0\); there is insufficient evidence of correlation.
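
The decision rule can be written as a small Python function. This is only a sketch: the name is_significant and its alternative argument are made up for illustration, and the critical value must still come from your own statistical tables. The example values 0.6319 and 0.5494 are the 5% two-tailed and 5% one-tailed PMCC critical values for \(n = 10\) as commonly tabulated; check them against the tables you are given.

```python
def is_significant(stat, critical_value, alternative="two-sided"):
    """Return True if H0 should be rejected (hypothetical helper).

    stat: the calculated r or r_s
    critical_value: positive value read from tables for your n and level
    alternative: "two-sided" (rho != 0), "greater" (rho > 0) or "less" (rho < 0)
    """
    if alternative == "two-sided":
        return abs(stat) > critical_value
    if alternative == "greater":
        return stat > critical_value
    if alternative == "less":
        return stat < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# Two-tailed test: only the size of r matters.
print(is_significant(-0.70, 0.6319))             # True
# One-tailed test for positive correlation: a negative r cannot be significant.
print(is_significant(-0.70, 0.5494, "greater"))  # False
# One-tailed test for negative correlation.
print(is_significant(-0.70, 0.5494, "less"))     # True
```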


4. Comparing PMCC and Spearman's

Students often ask: "Which one should I use?" Here is a simple comparison to help you decide.

  • Analogy: Imagine a line of students.
    - PMCC cares about exactly how many centimeters apart they are (the distance).
    - Spearman's only cares who is in front of whom (the order).
  • The "Straightness" Test: Use PMCC for straight lines. Use Spearman's for curves.
  • The "Sensitivity" Test: PMCC is very sensitive to outliers. Spearman's is much more "robust" because ranking "squashes" extreme values down to just being the "top rank."

Key Takeaway

Always look at a scatter graph first! If the points follow a straight line (and, for a hypothesis test, the data can be assumed to be bivariate normal), PMCC is your best friend. If it's a curve or messy, Spearman's is safer.


Common Mistakes to Avoid

  • Forgetting to rank: In Spearman's, don't use the raw numbers in the formula! You must rank them 1, 2, 3... first.
  • Confusing \(\rho\) and \(r\): Use \(r\) for your sample result and \(\rho\) when writing your hypotheses (\(H_0\) and \(H_1\)).
  • Ignoring the sign: A correlation of -0.8 is just as strong as +0.8. The minus sign only tells you the direction (downwards).
  • Correlation \(\neq\) Causation: Just because two things are correlated doesn't mean one causes the other. (Example: Ice cream sales and shark attacks are correlated because of warm weather, not because ice cream attracts sharks!)