Welcome to the World of Relationships!

In Statistics, we often want to know if two things are connected. Does the amount of time you spend gaming affect your exam scores? Does the temperature outside change how many ice creams are sold? This chapter is all about bivariate data—which is just a fancy way of saying "data with two variables."

By the end of these notes, you’ll be able to spot patterns, draw lines that predict the future, and understand why just because two things happen together, it doesn't mean one caused the other!

1. The Basics: Bivariate Data

Bivariate data involves pairs of measurements. Each "subject" gives us two pieces of information (like a person's height and their weight).

To see these relationships, we use a Scatter Diagram. But before we plot points, we need to know which axis is which:

  • Explanatory Variable (Independent): This is the one we think might be "explaining" the change. It always goes on the \(x\)-axis (the horizontal one).
  • Response Variable (Dependent): This is the "outcome" we are measuring. It always goes on the \(y\)-axis (the vertical one).

Example: If you are investigating if 'hours of revision' affects 'test scores', the hours of revision is the explanatory variable (\(x\)) and the test score is the response variable (\(y\)).

Quick Review: Remember \(x\) comes before \(y\) in the alphabet, just like the Explanatory variable comes before the Response!

2. Describing the Relationship: Correlation

When we look at a scatter diagram, we are looking for correlation: a description of how the two variables change together, which shows up in the pattern the points make.

Types of Correlation

  • Positive Correlation: As \(x\) goes up, \(y\) goes up. The points trend upwards from left to right (like a hill you are climbing).
  • Negative Correlation: As \(x\) goes up, \(y\) goes down. The points trend downwards from left to right (like a slide).
  • Zero Correlation: The points are scattered everywhere like a cloud. There is no clear connection.

Strength of Correlation

We also describe how "neat" the pattern is:

  • Strong: The points are very close to forming a straight line.
  • Weak: You can see the general direction, but the points are quite spread out.

Did you know? Correlation is a bit like a friendship. A "strong" correlation means the two variables are "best friends" and always follow each other closely!

3. Correlation vs. Causation

This is a favorite exam topic! Just because two things are correlated, it does not mean one caused the other. A link between two variables, with no claim about what causes it, is called an association.

Example: There is a positive correlation between ice cream sales and shark attacks. Does eating ice cream cause sharks to bite? No! Both are caused by a third factor: warm weather.

When a correlation is accidental or caused by something else, we call it spurious correlation.

Key Takeaway: Correlation shows a link, but it doesn't prove that one variable causes the other.

4. The Line of Best Fit

A line of best fit is a straight line drawn through the middle of the points to show the general trend. You can use it to make predictions.

How to draw it accurately:

  1. Calculate the Double Mean Point. This is the point \((\bar{x}, \bar{y})\), where \(\bar{x}\) is the mean of all \(x\) values and \(\bar{y}\) is the mean of all \(y\) values.
  2. Your line must pass through this double mean point \((\bar{x}, \bar{y})\).
  3. Try to have an equal number of points above and below the line.
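
The double mean point from step 1 can be worked out by hand or checked with a short script. The sketch below is only an illustration: the data values are made up, and the function name `double_mean_point` is a hypothetical helper, not something from the exam specification.

```python
from statistics import mean

# Made-up illustrative data: hours of revision (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [42, 50, 55, 63, 70]

def double_mean_point(xs, ys):
    """Return (x-bar, y-bar): the mean of the x values and the mean of the y values."""
    return mean(xs), mean(ys)

x_bar, y_bar = double_mean_point(hours, scores)
print("Double mean point:", (x_bar, y_bar))
```

Plot this point on the scatter diagram first, then pivot your ruler around it until the line follows the trend of the other points.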

Making Predictions

  • Interpolation: Predicting a value inside the range of data you already have. This is usually reliable, especially when the correlation is strong.
  • Extrapolation: Predicting a value outside the range of your data (e.g., predicting the price of a 100-year-old car when your data only goes up to 10-year-old cars). Warning: This is risky and often inaccurate because trends can change!
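
One way to check which kind of prediction you are making is to compare the new \(x\) value with the smallest and largest \(x\) values in your data. Here is a minimal sketch of that check, assuming made-up car ages; the function name `prediction_type` is hypothetical.

```python
def prediction_type(x_new, xs):
    """Say whether predicting at x_new is interpolation or extrapolation."""
    # Inside (or on the edge of) the data range counts as interpolation.
    if min(xs) <= x_new <= max(xs):
        return "interpolation"
    # Anything outside the data range is extrapolation.
    return "extrapolation"

# Made-up illustrative data: ages of cars in years.
car_ages = [1, 3, 5, 8, 10]

print(prediction_type(6, car_ages))    # 6 lies between 1 and 10
print(prediction_type(100, car_ages))  # 100 is far beyond 10
```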

Common Mistake: Students often try to force the line of best fit through the origin \((0,0)\). Only do this if it actually fits the data and makes sense for the context!

5. Measuring Correlation (Higher Tier)

While Foundation students describe correlation in words, Higher students use numbers between \(-1\) and \(+1\).

Spearman’s Rank Correlation Coefficient

This measures how well the ranks (the order) of two variables agree.
\(+1\) = Perfect agreement in ranks.
\(-1\) = Perfect opposite ranks.
\(0\) = No relationship between the ranks.

The formula (which will be given to you in the exam) is:
\( r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \)
Where \(d\) is the difference between the ranks and \(n\) is the number of pairs.
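
To see the formula in action, here is a small sketch that ranks two sets of scores and applies it step by step. The scores are made up, the helper names `ranks` and `spearman` are hypothetical, and the sketch assumes there are no tied values (ties need extra care that is not covered here).

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    # d is the difference between the two ranks for each pair.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Made-up illustrative data: two judges scoring the same five contestants.
judge_a = [7, 5, 9, 3, 6]
judge_b = [6, 4, 9, 5, 7]

print("Ranks for judge A:", ranks(judge_a))  # [4, 2, 5, 1, 3]
print("Ranks for judge B:", ranks(judge_b))  # [3, 1, 5, 2, 4]
print("r_s =", round(spearman(judge_a, judge_b), 3))  # sum of d^2 is 4, so r_s = 0.8
```

Ranking from smallest to largest or from largest to smallest gives the same answer, as long as you rank both variables the same way.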

Pearson’s Product Moment Correlation Coefficient (PMCC)

This measures the strength of a linear (straight line) relationship.
\(+1\) = Perfect positive linear correlation.
\(-1\) = Perfect negative linear correlation.
\(0\) = No linear correlation.

Spearman’s vs. PMCC: What's the difference?

  • PMCC is for straight lines only.
  • Spearman’s is for any relationship that moves consistently in one direction: as one variable goes up, the other always goes up (or always goes down), even if the pattern is a curve!
  • Example: If data follows a curve, Spearman’s might be very high (near \(+1\)), but PMCC might be lower because it's not a perfect straight line.
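
The difference shows up clearly with a curve. The sketch below uses made-up data where \(y = x^3\), so \(y\) always rises with \(x\) but not in a straight line. The `pmcc` function uses the standard textbook definition of Pearson's coefficient, which these notes do not state, so treat it as an assumption; the `spearman` helper assumes no tied values. All function names are hypothetical.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's coefficient via the standard definition: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Made-up curved data: y always increases with x, but along a curve.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

r_s = spearman(xs, ys)
r = pmcc(xs, ys)
print("Spearman's:", r_s)     # ranks agree perfectly, so this is 1.0
print("PMCC:", round(r, 3))   # high, but below 1 because the points curve
```

Because the ranks match exactly, Spearman’s gives a perfect score, while PMCC drops below \(+1\) since no single straight line passes through every point.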

Key Takeaway: The closer a value is to \(+1\) or \(-1\), the stronger the correlation. Values near \(0\) show weak or no correlation.

Summary: Scatter Diagrams at a Glance

  • Plotting: \(x\) is Explanatory, \(y\) is Response.
  • Correlation: Can be positive, negative, or zero; strong or weak.
  • Causation: Correlation does not mean one thing causes another!
  • Line of Best Fit: Must pass through the mean point \((\bar{x}, \bar{y})\).
  • Predictions: Interpolation is usually safe; Extrapolation is the "danger zone."
  • Coefficients: Range from \(-1\) (perfect negative) to \(+1\) (perfect positive). \(0\) means no correlation.

Don't worry if the formulas look scary at first—focus on the "story" the graph is telling you, and the math will follow!