Introduction to the Normal Distribution

Welcome to one of the most important chapters in Statistics! Have you ever noticed that in a large group of people, most are around an average height, while very few are extremely tall or extremely short? Or that most students score near the average in an exam? This "middle-heavy" pattern happens so often in nature and social sciences that mathematicians created a model for it called the Normal Distribution.

Don't worry if this seems a bit abstract at first. By the end of these notes, you’ll be able to describe this famous "bell curve" and use your calculator to solve problems that once took hours of manual calculation!

1. What is the Normal Distribution?

The Normal Distribution is a continuous probability distribution. This means it deals with data that can take any value within a range, like time, weight, or height (unlike the Binomial distribution, which counts "successes" in whole numbers like 0, 1, 2, or 3).

The Notation

We write that a random variable \(X\) follows a normal distribution as:
\(X \sim N(\mu, \sigma^2)\)

Breaking this down:
1. \(\mu\) (mu): This is the mean (average). It tells you where the center of the bell curve sits.
2. \(\sigma^2\) (sigma squared): This is the variance.
3. \(\sigma\) (sigma): This is the standard deviation. It tells you how "stretched" or "squashed" the curve is.

Quick Review: In your exam, always check if the question gives you the variance (\(\sigma^2\)) or the standard deviation (\(\sigma\)). If you are given \(\sigma^2 = 16\), then \(\sigma = 4\).

Key Features of the Bell Curve

  • Symmetry: The curve is perfectly symmetrical around the mean (\(\mu\)). The left side is a mirror image of the right.
  • Mean = Median = Mode: All three measures of central tendency sit right in the middle.
  • Total Area = 1: Because the area under the curve represents the total probability of all possible outcomes, it must equal 1.
  • Points of Inflection: These are the points where the curve switches from bending downwards (like the top of a hill) to bending upwards (flattening out towards the axis). They occur exactly at \(x = \mu + \sigma\) and \(x = \mu - \sigma\).

Did you know? Because the curve is symmetrical, exactly 50% of the data is above the mean, and 50% is below the mean. This is a "lifesaver" trick for quick mental checks!
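If you like to check things for yourself, here is a minimal sketch of the symmetry idea in Python, using the standard library's `statistics.NormalDist`. The distribution \(N(170, 8^2)\) for heights in cm is a made-up example, not one from the syllabus.

```python
from statistics import NormalDist

# Hypothetical example (an assumption for illustration): heights X ~ N(170, 8^2) in cm.
# Note that NormalDist takes the standard deviation, not the variance.
heights = NormalDist(mu=170, sigma=8)

# Half of the area lies below the mean.
p_below_mean = heights.cdf(170)

# The two tails one standard deviation out are mirror images of each other.
lower_tail = heights.cdf(170 - 8)        # P(X < mu - sigma)
upper_tail = 1 - heights.cdf(170 + 8)    # P(X > mu + sigma)

print(p_below_mean)
print(round(lower_tail, 4), round(upper_tail, 4))
```

The two tail areas should come out equal, which is exactly what "the left side is a mirror image of the right" means in terms of probability.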

Key Takeaway: The Normal Distribution describes data that clusters around a central mean, spreading out symmetrically into a bell shape.

2. The "68-95-99.7" Rule

This is a handy rule of thumb to help you visualize how data is spread out in a normal distribution. In every normal distribution:

  • About 68% (roughly two-thirds) of the data lies within 1 standard deviation of the mean (\(\mu \pm \sigma\)).
  • About 95% of the data lies within 2 standard deviations of the mean (\(\mu \pm 2\sigma\)).
  • Almost all (99.7%) of the data lies within 3 standard deviations of the mean (\(\mu \pm 3\sigma\)).

Memory Aid: Think of it as the 1-2-3 Rule. 1 step away = 68%, 2 steps = 95%, 3 steps = nearly everyone!
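The three percentages above can be checked with a short sketch in Python. It uses `statistics.NormalDist` from the standard library; the helper name `within_k_sd` is our own choice for illustration.

```python
from statistics import NormalDist

def within_k_sd(k, mu=0.0, sigma=1.0):
    """Probability that a normal value lies within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Try changing `mu` and `sigma`: the answers should not move, because the rule holds for every normal distribution.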

Key Takeaway: If a value is more than 3 standard deviations away from the mean, it is very rare (only about 0.3% of values lie that far out), so it is often treated as an outlier!

3. The Standard Normal Distribution (\(Z\))

Imagine trying to compare heights in centimeters with weights in kilograms. The units don’t match, so a direct comparison is meaningless! To solve this, we convert every value to a common scale: the Standard Normal Distribution, which has a mean of 0 and a standard deviation of 1.

We use the letter \(Z\) to represent this: \(Z \sim N(0, 1)\).

The Z-Transformation Formula

You can turn any normal value (\(X\)) into a standard value (\(Z\)) using this "universal translator" formula:
\(Z = \frac{X - \mu}{\sigma}\)

Example: If IQ scores are \(N(100, 15^2)\), what is the Z-score for someone with an IQ of 130?
\(Z = \frac{130 - 100}{15} = 2\).
This means that person is exactly 2 standard deviations above the average.

Common Mistake: Forgetting to square root the variance! If the distribution is \(N(50, 25)\), \(\sigma\) is 5, not 25. Always use \(\sigma\) in the bottom of the Z-formula.
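Here is a small sketch of the Z-formula in Python that guards against this mistake by taking the variance as input and square-rooting it for you. The function name `z_score` and the value 60 in the second call are our own choices for illustration.

```python
from math import sqrt

def z_score(x, mu, variance):
    """Standardise x for X ~ N(mu, variance). The variance is square-rooted first."""
    sigma = sqrt(variance)
    return (x - mu) / sigma

# The IQ example: N(100, 15^2), so the variance is 15^2 = 225.
print(z_score(130, 100, 15**2))  # 2.0

# N(50, 25): sigma is 5, so a (hypothetical) value of 60 is 2 standard deviations up.
print(z_score(60, 50, 25))       # 2.0
```

If you divided by 25 instead of 5 in the second case, you would get 0.4, which badly understates how unusual the value is.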

Key Takeaway: The Z-score tells you how many standard deviations a value is away from the mean.

4. Using Your Calculator

In the OCR A Level course, you are expected to use your calculator’s statistical functions rather than old-fashioned tables.

A. Finding Probabilities (Normal CD)

Use this when you have a range of values (e.g., \(P(X < 55)\) or \(P(40 < X < 60)\)) and want to find the probability (the area under the curve).

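  • Lower Bound: The smallest value in your range. If there is no lower bound (e.g., \(X < 55\)), use a very large negative number like \(-9999\). It just needs to be many standard deviations below the mean.
  • Upper Bound: The largest value in your range. If there is no upper bound (e.g., \(X > 70\)), use a very large positive number like \(9999\). It just needs to be many standard deviations above the mean.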
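To see what Normal CD is doing behind the scenes, here is a minimal sketch in Python. The function `normal_cd` is our own imitation of the calculator mode (not the calculator's actual code), and the distribution \(N(50, 10^2)\) is an assumption made up for this illustration.

```python
from statistics import NormalDist

def normal_cd(lower, upper, sigma, mu):
    """Area under the N(mu, sigma^2) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Hypothetical distribution for illustration: X ~ N(50, 10^2)
MU, SIGMA = 50, 10

p_below_55 = normal_cd(-9999, 55, SIGMA, MU)   # P(X < 55), no lower bound
p_between = normal_cd(40, 60, SIGMA, MU)       # P(40 < X < 60)
p_above_70 = normal_cd(70, 9999, SIGMA, MU)    # P(X > 70), no upper bound

print(round(p_below_55, 4))   # 0.6915
print(round(p_between, 4))    # 0.6827
print(round(p_above_70, 4))   # 0.0228
```

Notice that \(P(40 < X < 60)\) is the "within 1 standard deviation" case, so it matches the 68% rule of thumb.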

B. Finding Values (Inverse Normal)

Use this when you know the probability (the "area") and want to find the value (\(x\)).
Example: "Find the height exceeded by the tallest 10% of people."

Tip: Most calculators require the "Area" to be the area to the left of the value. If you want the top 10%, you must input an area of 0.90 (the bottom 90%).
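The "top 10% means area 0.90" idea can be sketched in Python with `NormalDist.inv_cdf`, which also works with the area to the left. The height distribution \(N(170, 8^2)\) in cm is a made-up assumption, since the example question above does not give one.

```python
from statistics import NormalDist

# Hypothetical distribution (an assumption for illustration): heights X ~ N(170, 8^2) in cm
heights = NormalDist(mu=170, sigma=8)

top_fraction = 0.10
area_to_left = 1 - top_fraction          # convert "top 10%" into "bottom 90%"

cutoff = heights.inv_cdf(area_to_left)   # height exceeded by the tallest 10%
print(round(cutoff, 2))                  # 180.25

# Sanity check: the area above the cutoff should be back to 0.10
print(round(1 - heights.cdf(cutoff), 4))
```

A quick sense check helps too: the answer must be above the mean, because the tallest 10% are taller than average.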

Key Takeaway: Probability = "Normal CD". Value = "Inverse Normal".

5. Choosing the Right Model

Sometimes you have to decide if a Normal Distribution is appropriate for a real-world context.

When the Normal Model is Good:

  • The data is continuous.
  • The data is symmetrical and bell-shaped.
  • Most data points are near the middle.

Approximating the Binomial Distribution:

If you have a Binomial distribution \(X \sim B(n, p)\) where \(n\) is large (usually \(n > 50\)) and \(p\) is close to 0.5, the bar chart of the Binomial distribution starts to look very much like a smooth Normal curve! The match is close but never exact, which is why we call it an approximation.

In these cases, we can use:
Mean: \(\mu = np\)
Variance: \(\sigma^2 = np(1-p)\)
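To see how close the match is, the sketch below compares the height of each Binomial bar with the height of the Normal curve at the same point. The values \(n = 100\) and \(p = 0.5\) are assumptions chosen for illustration, and the helper `binomial_pmf` is our own.

```python
from math import comb
from statistics import NormalDist

# Hypothetical example (an assumption for illustration): X ~ B(100, 0.5)
n, p = 100, 0.5

mu = n * p                     # mean = np
variance = n * p * (1 - p)     # variance = np(1 - p)
approx = NormalDist(mu, variance ** 0.5)   # NormalDist needs sigma, not sigma^2

def binomial_pmf(k):
    """Exact Binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Compare the bar height with the curve height at a few values of k
for k in (40, 45, 50, 55, 60):
    print(k, round(binomial_pmf(k), 4), round(approx.pdf(k), 4))
```

The two columns should agree closely. Try a smaller \(n\) or a \(p\) far from 0.5 and watch the agreement get worse.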

Note: You won't be asked to do long calculations for this approximation in this specific paper, but you must understand the logic of when one distribution can be used to model another.

Quick Summary Review:
1. Notation: \(X \sim N(\mu, \sigma^2)\).
2. Shape: Symmetrical, total area = 1, mean at the center.
3. Z-score: \(Z = (X - \mu) / \sigma\).
4. 68-95-99.7: The percentage of data within 1, 2, and 3 standard deviations of the mean.
5. Calculator: Use 'Normal CD' for area/probability and 'Inverse Normal' for values.