Welcome to Continuous Distributions!

In your previous statistics studies (S1), you looked at Discrete Random Variables—things you can count, like the number of heads in a coin toss or the number of goals in a match. In this chapter of Statistics 2 (S2), we move into the "smooth" world of Continuous Random Variables. These are things we measure rather than count, such as time, height, or the exact weight of a bag of sugar. Don't worry if this seems a bit abstract at first; we will use some simple calculus and clear analogies to make sense of it all!

1. The Concept of a Continuous Random Variable

A Continuous Random Variable can take any value within a given range. Unlike a discrete variable where you jump from 1 to 2, a continuous variable can be 1.5, 1.55, or 1.5555...
Analogy: Think of a digital clock that only shows minutes (Discrete) versus a stopwatch that measures time down to infinite decimals (Continuous).

Key Properties:

1. The probability of the variable taking a specific exact value is always zero: \(P(X = x) = 0\). This is because there are infinitely many possible values. Instead, we find the probability of the variable lying within a range (e.g., the probability a bulb lasts between 100 and 200 hours).
2. We use a Probability Density Function (PDF), denoted as \(f(x)\), to describe the distribution. The area under the graph of \(f(x)\) represents the probability.

Quick Review: For any PDF, the total area under the curve must equal 1 because the total probability of all outcomes is 100%.

2. The Probability Density Function (PDF)

The PDF, \(f(x)\), tells us how "dense" the probability is at any point \(x\). To find the probability that \(X\) falls between two values \(a\) and \(b\), we calculate the area under the curve using integration:

\(P(a < X \le b) = \int_{a}^{b} f(x) dx\)

Rules for a valid PDF:

- \(f(x) \ge 0\) for all \(x\) (You can't have negative probability density!).
- \(\int_{-\infty}^{\infty} f(x) dx = 1\).

Common Mistake to Avoid: Students often forget that \(f(x)\) is the height of the graph, not the probability itself. The area is the probability!
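
The rules above are easy to check numerically. Here is a minimal sketch using a hypothetical density \(f(x) = 3x^2\) on \([0, 1]\) (not from the text): Simpson's rule confirms the total area is 1 and finds the probability of a range.

```python
# A minimal sketch: checking the PDF rules numerically for a hypothetical
# density f(x) = 3x^2 on [0, 1] (zero elsewhere), using Simpson's rule.

def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with Simpson's rule (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def pdf(x):
    return 3 * x**2 if 0 <= x <= 1 else 0.0

area = simpson(pdf, 0, 1)       # total area: should be 1 for a valid PDF
prob = simpson(pdf, 0.2, 0.5)   # P(0.2 < X < 0.5) = 0.5^3 - 0.2^3 = 0.117

print(round(area, 6))   # → 1.0
print(round(prob, 6))   # → 0.117
```

Notice that `pdf(0.9) = 2.43` is greater than 1: the height of the graph can exceed 1, but the area never can.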

3. The Cumulative Distribution Function (CDF)

The Cumulative Distribution Function, denoted as \(F(x)\), represents the probability that the random variable is less than or equal to a certain value \(x\).
\(F(x_0) = P(X \le x_0) = \int_{-\infty}^{x_0} f(x) dx\)

The Relationship Between PDF and CDF:

Think of the PDF as the "rate of change" of the CDF.
- To go from PDF to CDF: Integrate.
- To go from CDF to PDF: Differentiate.
\(f(x) = \frac{dF(x)}{dx}\)

Key Takeaway: \(F(x)\) always starts at 0 and ends at 1. If you are calculating a CDF and get a value larger than 1, check your integration constants!
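
The integrate/differentiate relationship can be checked directly. As a sketch, take the hypothetical density \(f(x) = 3x^2\) on \([0, 1]\) (not from the text): integrating gives the CDF \(F(x) = x^3\), and a numerical derivative of \(F\) should recover \(f\).

```python
# Sketch: for the hypothetical density f(x) = 3x^2 on [0, 1], the CDF is
# F(x) = x^3. Differentiating F numerically recovers the PDF.

def F(x):
    if x < 0:
        return 0.0   # F starts at 0 below the range
    if x > 1:
        return 1.0   # ...and ends at 1 above it
    return x**3

def numerical_derivative(g, x, h=1e-6):
    """Central-difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

print(F(0), F(1))                               # → 0.0 1.0
print(round(numerical_derivative(F, 0.5), 6))   # ≈ f(0.5) = 3 * 0.25 = 0.75
```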

4. Mean, Variance, and Quartiles

Just like discrete variables, continuous ones have averages and spreads. We use the following formulas:

Mean (Expected Value):

\(E(X) = \int_{-\infty}^{\infty} x f(x) dx\)

Variance:

\(Var(X) = E(X^2) - [E(X)]^2\), where \(E(X^2) = \int_{-\infty}^{\infty} x^2 f(x) dx\)
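
These integrals can be evaluated numerically too. A sketch, again using the hypothetical density \(f(x) = 3x^2\) on \([0, 1]\) (an assumed example, not from the text), where the exact answers are \(E(X) = \frac{3}{4}\), \(E(X^2) = \frac{3}{5}\), and \(Var(X) = \frac{3}{80}\):

```python
# Sketch: mean and variance of a hypothetical density f(x) = 3x^2 on [0, 1],
# computed by numerical integration (Simpson's rule).

def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with Simpson's rule (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

pdf = lambda x: 3 * x**2                          # hypothetical density on [0, 1]

mean = simpson(lambda x: x * pdf(x), 0, 1)        # E(X)   = 3/4
ex2  = simpson(lambda x: x**2 * pdf(x), 0, 1)     # E(X^2) = 3/5
var  = ex2 - mean**2                              # 3/5 - 9/16 = 3/80

print(round(mean, 6))   # → 0.75
print(round(var, 6))    # → 0.0375
```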

Median and Quartiles:

The Median, \(m\), is the value where the area to the left is exactly 0.5. To find it, solve:
\(F(m) = 0.5\)
Similarly, for the Lower Quartile (\(Q_1\)) solve \(F(Q_1) = 0.25\), and for the Upper Quartile (\(Q_3\)) solve \(F(Q_3) = 0.75\).
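
When \(F(m) = 0.5\) has no neat algebraic solution, you can solve it numerically. A sketch using bisection on the hypothetical CDF \(F(x) = x^3\) (an assumed example), whose median is \(0.5^{1/3} \approx 0.794\):

```python
# Sketch: solving F(m) = 0.5 by bisection for a hypothetical CDF F(x) = x^3.

def bisect(F, target, lo, hi, tol=1e-10):
    """Find x in [lo, hi] with F(x) = target, assuming F is increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < target:
            lo = mid       # root is in the upper half
        else:
            hi = mid       # root is in the lower half
    return (lo + hi) / 2

F = lambda x: x**3

median = bisect(F, 0.50, 0, 1)   # = 0.5 ** (1/3) ≈ 0.7937
q1     = bisect(F, 0.25, 0, 1)   # lower quartile
q3     = bisect(F, 0.75, 0, 1)   # upper quartile

print(round(median, 4))   # → 0.7937
```

Bisection works here because a CDF is always non-decreasing, so there is exactly one crossing point to find.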

Mode:

The Mode is the value of \(x\) for which \(f(x)\) is at its maximum. You can often find this by looking at the graph or using differentiation to find the stationary point.

Did you know? In a perfectly symmetrical distribution, the mean, median, and mode will all be the same value!

5. The Continuous Uniform (Rectangular) Distribution

This is the simplest continuous distribution. It happens when every value in a range \([a, b]\) is equally likely. The PDF graph looks like a rectangle.

Key Formulae for \(X \sim U(a, b)\):

- PDF: \(f(x) = \frac{1}{b-a}\) for \(a \le x \le b\).
- Mean: \(E(X) = \frac{a+b}{2}\) (Right in the middle).
- Variance: \(Var(X) = \frac{(b-a)^2}{12}\).
- CDF: \(F(x) = \frac{x-a}{b-a}\) for \(a \le x \le b\).

Memory Tip: Think of a chocolate bar of length \(b-a\). If you want the average position of a bite, it's halfway along the bar!
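
A quick sanity check of these formulae: the sketch below compares them against a seeded Monte Carlo sample for the illustrative values \(a = 2\), \(b = 5\) (assumed for the example, not from the text).

```python
import random

# Sketch: checking the U(a, b) formulae against a seeded Monte Carlo sample.
# a = 2, b = 5 are illustrative values chosen for this example.

a, b = 2, 5
mean_formula = (a + b) / 2           # E(X)   = 3.5
var_formula  = (b - a)**2 / 12       # Var(X) = 0.75
cdf = lambda x: (x - a) / (b - a)    # F(x) for a <= x <= b

random.seed(0)
sample = [random.uniform(a, b) for _ in range(100_000)]
mean_mc = sum(sample) / len(sample)
var_mc  = sum((x - mean_mc)**2 for x in sample) / len(sample)

print(mean_formula, round(mean_mc, 2))   # sample mean close to 3.5
print(var_formula, round(var_mc, 2))     # sample variance close to 0.75
```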

6. Normal Approximation with Continuity Correction

Sometimes, we use the Normal Distribution to approximate discrete distributions (Binomial or Poisson) when \(n\) or \(\lambda\) is large. However, because we are moving from "blocks" (discrete) to a "smooth curve" (continuous), we must use a Continuity Correction.

When to approximate:

- Binomial \(B(n, p)\): Use Normal if \(n\) is large and \(p\) is close to 0.5 (specifically \(np > 5\) and \(n(1-p) > 5\)).
- Poisson \(Po(\lambda)\): Use Normal if \(\lambda\) is large (usually \(\lambda > 10\)).

How to use Continuity Correction:

Since the Normal distribution is continuous, a discrete value like "10" is represented by the interval between 9.5 and 10.5. Writing \(Y\) for the approximating Normal variable:
- \(P(X = 10)\) becomes \(P(9.5 < Y < 10.5)\)
- \(P(X \ge 10)\) becomes \(P(Y > 9.5)\) (we want to include the block for 10).
- \(P(X > 10)\) becomes \(P(Y > 10.5)\) (we want to exclude the block for 10).

Quick Review Box:
1. Identify if an approximation is valid.
2. Calculate \(\mu\) and \(\sigma^2\).
3. Apply continuity correction (\(\pm 0.5\)).
4. Standardize using \(Z = \frac{X - \mu}{\sigma}\) and use tables.
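
The four steps above can be sketched end to end. This example uses the illustrative values \(n = 100\), \(p = 0.5\) (assumed for the example) and compares the exact Binomial tail \(P(X \ge 55)\) with its continuity-corrected Normal approximation \(P(Y > 54.5)\):

```python
from math import comb, erf, sqrt

# Sketch: exact Binomial tail vs its Normal approximation with a
# continuity correction. n = 100, p = 0.5 are illustrative values
# (np = n(1-p) = 50 > 5, so the approximation is valid).

n, p = 100, 0.5
mu = n * p                       # Step 2: mean of the approximating Normal
sigma = sqrt(n * p * (1 - p))    # ...and its standard deviation

def phi(z):
    """Standard Normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Exact: P(X >= 55) for X ~ B(100, 0.5)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(55, n + 1))

# Steps 3-4: continuity correction turns P(X >= 55) into P(Y > 54.5),
# then we standardize and use the Normal CDF instead of tables.
approx = 1 - phi((54.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))   # the two values agree closely
```

Note how the correction goes *down* to 54.5 for \(P(X \ge 55)\), because the block for 55 must be included.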

Summary Takeaway

Continuous distributions allow us to model the real world where measurements aren't just whole numbers. By using integration for the PDF and differentiation for the CDF, you can find probabilities, means, and medians. Keep an eye on your limits of integration, and always remember that the total area must be 1. You've got this!