Welcome to Continuous Probability Distributions!
In your previous studies, you’ve likely worked with Discrete Random Variables—things you can count, like the number of heads in a series of coin flips or the score on a die. In this chapter, we move into the world of Continuous Random Variables (CRVs). These are used for things we measure, like the time it takes for a bus to arrive, the height of a student, or the weight of an apple. Because measurements can take any value (like 1.5 minutes, 1.52 minutes, or 1.5234... minutes), we need a slightly different set of tools. Don't worry if it seems a bit "maths-heavy" at first; we'll break it down step-by-step!
1. What is a Continuous Random Variable?
A Continuous Random Variable can take any value within a given range. Unlike discrete variables where you have distinct "steps" (1, 2, 3...), a continuous variable is like a smooth slide.
The Probability Density Function (PDF)
For a CRV, we use a function called \( f(x) \), known as the Probability Density Function. This function describes the shape of the distribution.
Important Note: In a continuous distribution, the probability of the variable being exactly one specific value is always zero (\( P(X = c) = 0 \)). Instead, we always look for the probability that \( X \) falls within a range.
Key Formula:
To find the probability between two values \( a \) and \( b \), we find the area under the curve between those points using integration:
\( P(a < X \le b) = \int_{a}^{b} f(x) dx \)
The Rules for a Valid PDF:
1. The function must never be negative: \( f(x) \ge 0 \) for all \( x \).
2. The total area under the entire curve must equal 1: \( \int_{-\infty}^{\infty} f(x) dx = 1 \).
Analogy: Imagine a long loaf of bread. The total amount of bread is 1 (or 100%). If you want to know the probability of a slice, you are measuring the "area" or volume of that specific slice relative to the whole loaf.
Common Mistake to Avoid: Students often forget that for a CRV, \( P(X < 5) \) is exactly the same as \( P(X \le 5) \). Because the probability of \( X \) being exactly 5 is zero, the "equal to" sign doesn't change the total area!
Summary Takeaway: Probability in continuous distributions is simply the area under the PDF curve.
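To see the "area under the curve" idea in action, here is a minimal numerical sketch in Python. It uses a hypothetical PDF \( f(x) = 2x \) on \( [0, 1] \) (chosen because it is easy to check by hand); the helper `area_under` is just a simple midpoint-rule approximation written for this example, not a library function:

```python
def f(x):
    """Hypothetical PDF: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area_under(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Rule 2 for a valid PDF: the total area must be 1.
print(round(area_under(f, 0, 1), 4))       # 1.0

# P(0.25 < X < 0.5) is the area between those two points.
print(round(area_under(f, 0.25, 0.5), 4))  # 0.1875
```

By hand, \( \int_{0.25}^{0.5} 2x \, dx = 0.5^2 - 0.25^2 = 0.1875 \), which matches the numerical answer.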
2. The Cumulative Distribution Function (CDF)
The Cumulative Distribution Function, written as \( F(x) \), tells us the probability that the random variable is less than or equal to a certain value \( x \). Think of it as the "running total" of the probability.
How to find \( F(x) \):
You integrate the PDF from the lowest possible value (often \( -\infty \) or 0) up to \( x \):
\( F(x) = P(X \le x) = \int_{-\infty}^{x} f(t) \, dt \)
The Relationship between PDF and CDF:
This is a very important "trick" for your exams!
1. To go from PDF to CDF: Integrate \( f(x) \).
2. To go from CDF to PDF: Differentiate \( F(x) \).
\( f(x) = \frac{dF(x)}{dx} \)
Quick Review Box:
- \( F(\text{lowest value}) = 0 \)
- \( F(\text{highest value}) = 1 \)
- Probability between \( a \) and \( b \): \( P(a < X < b) = F(b) - F(a) \).
Summary Takeaway: The CDF \( F(x) \) is the "accumulated area" from the left side of the graph up to point \( x \).
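Continuing the hypothetical example \( f(x) = 2x \) on \( [0, 1] \): integrating the PDF gives the CDF \( F(x) = x^2 \) on that interval. This short sketch checks the Quick Review rules directly:

```python
def F(x):
    """CDF for the hypothetical PDF f(x) = 2x on [0, 1]: F(x) = x^2."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x ** 2

# F(lowest value) = 0 and F(highest value) = 1.
print(F(0), F(1))        # 0 1

# P(a < X < b) = F(b) - F(a).
print(F(0.5) - F(0.25))  # 0.1875
```

Note that differentiating \( F(x) = x^2 \) gives back \( f(x) = 2x \), exactly as the PDF-to-CDF "trick" promises.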
3. Mean, Variance, and Expected Values
Just like with discrete variables, we want to know the "average" (Mean) and the "spread" (Variance) of our data. Since we are dealing with continuous functions, we use integration instead of summation (\( \Sigma \)).
The Mean (Expected Value):
\( E(X) = \mu = \int_{-\infty}^{\infty} x f(x) dx \)
The Variance:
\( Var(X) = \sigma^2 = E(X^2) - [E(X)]^2 \)
Where \( E(X^2) = \int_{-\infty}^{\infty} x^2 f(x) dx \).
Expectation of a function \( g(X) \):
If you need to find the expected value of something like \( X^3 \) or \( 2X + 5 \), use:
\( E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x) dx \)
Did you know? The mean is essentially the "center of mass" of the probability distribution. If you cut the PDF shape out of cardboard, the mean is where it would perfectly balance on your finger!
Summary Takeaway: To find the mean, integrate \( x \times f(x) \). For variance, find \( E(X^2) \) first, then subtract the mean squared.
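The takeaway above can be sketched numerically. Using the hypothetical PDF \( f(x) = 2x \) on \( [0, 1] \), the exact answers are \( E(X) = 2/3 \), \( E(X^2) = 1/2 \), and \( Var(X) = 1/18 \); the `integrate` helper below is a simple midpoint-rule approximation written just for this illustration:

```python
def f(x):
    """Hypothetical PDF: f(x) = 2x for 0 <= x <= 1."""
    return 2 * x

def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

mean = integrate(lambda x: x * f(x), 0, 1)     # E(X) = integral of x * f(x)
e_x2 = integrate(lambda x: x**2 * f(x), 0, 1)  # E(X^2) = integral of x^2 * f(x)
var = e_x2 - mean ** 2                         # Var(X) = E(X^2) - [E(X)]^2

print(round(mean, 4))  # 0.6667  (exact value: 2/3)
print(round(var, 4))   # 0.0556  (exact value: 1/18)
```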
4. Mode, Median, and Quartiles
These measures help us understand the "location" of our data in different ways.
1. The Mode:
The mode is the value of \( x \) where \( f(x) \) is at its maximum.
Step-by-step:
- Differentiate the PDF: \( f'(x) \).
- Set it to zero: \( f'(x) = 0 \).
- Solve for \( x \) (and check that it is a maximum within the range; if \( f'(x) \) is never zero inside the range, the mode sits at one of the endpoints instead).
2. The Median (\( m \)):
The median is the "halfway point." Exactly 50% of the area is to the left and 50% is to the right.
To find it: Solve \( F(m) = 0.5 \).
3. Percentiles and Quartiles:
- Lower Quartile (\( Q_1 \)): Solve \( F(Q_1) = 0.25 \).
- Upper Quartile (\( Q_3 \)): Solve \( F(Q_3) = 0.75 \).
- \( n \)-th Percentile: Solve \( F(x) = \frac{n}{100} \).
Summary Takeaway: Mode = Highest peak. Median = Area of 0.5. Quartiles = Areas of 0.25 and 0.75.
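These steps can be checked with the hypothetical PDF \( f(x) = 2x \) on \( [0, 1] \) (CDF \( F(x) = x^2 \)). Here \( f'(x) = 2 \) is never zero, so the mode sits at the endpoint \( x = 1 \); the median and quartiles come from solving \( F(x) = p \), done below by bisection as a general-purpose technique (the `solve_F` helper is just for this sketch):

```python
def F(x):
    """CDF of the hypothetical PDF f(x) = 2x on [0, 1]: F(x) = x^2."""
    return x ** 2

def solve_F(p, lo=0.0, hi=1.0):
    """Find x with F(x) = p by bisection (F is increasing on [lo, hi])."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

median = solve_F(0.5)              # F(m) = 0.5
q1, q3 = solve_F(0.25), solve_F(0.75)

print(round(median, 4))            # 0.7071  (exactly sqrt(0.5))
print(round(q1, 4), round(q3, 4))  # 0.5 0.866
```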
5. Skewness
Skewness tells us if the "tail" of the distribution is pulled to one side.
- Positive Skew: The tail is on the right. Usually, \( \text{Mean} > \text{Median} > \text{Mode} \).
- Negative Skew: The tail is on the left. Usually, \( \text{Mean} < \text{Median} < \text{Mode} \).
- Zero Skew: The distribution is perfectly symmetrical. \( \text{Mean} = \text{Median} = \text{Mode} \).
Memory Aid: The skew is where the tail is. If the tail points toward the positive (right) numbers, it's positive skew. If it points toward the negative (left) numbers, it's negative skew.
Summary Takeaway: Always compare the Mean and Median to justify the skewness of a distribution in your exam.
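As a worked check, the hypothetical PDF \( f(x) = 2x \) on \( [0, 1] \) has mean \( 2/3 \approx 0.6667 \), median \( \sqrt{0.5} \approx 0.7071 \), and mode \( 1 \). The comparison can be coded in a couple of lines:

```python
from math import sqrt

# Hypothetical PDF f(x) = 2x on [0, 1]: most of the area sits on the right,
# so the "tail" points left (toward 0).
mean = 2 / 3
median = sqrt(0.5)

skew = "negative" if mean < median else "positive" if mean > median else "zero"
print(skew)  # negative
```

Since Mean < Median < Mode here, the rule of thumb gives negative skew, matching the picture of a tail on the left.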
6. The Continuous Uniform (Rectangular) Distribution
This is the simplest continuous distribution. Every value in the range \( [a, b] \) is equally likely. The PDF looks like a flat rectangle.
Key Properties for \( X \sim U(a, b) \):
- PDF: \( f(x) = \frac{1}{b-a} \) for \( a \le x \le b \).
- CDF: \( F(x) = \frac{x-a}{b-a} \) for \( a \le x \le b \) (this is just the proportion of the way through the range).
- Mean: \( E(X) = \frac{a+b}{2} \) (Exactly the middle of the range).
- Variance: \( Var(X) = \frac{(b-a)^2}{12} \).
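Where the \( \frac{(b-a)^2}{12} \) comes from: applying \( Var(X) = E(X^2) - [E(X)]^2 \) to the uniform PDF gives
\( E(X^2) = \int_{a}^{b} \frac{x^2}{b-a} \, dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3} \)
and therefore
\( Var(X) = \frac{a^2 + ab + b^2}{3} - \left( \frac{a+b}{2} \right)^2 = \frac{(b-a)^2}{12} \).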
Real-World Example: A "fair" digital stopwatch that rounds to the nearest second. The rounding error will be uniformly distributed between -0.5 and +0.5 seconds.
Quick Review Box:
For Uniform Distributions:
- The height of the PDF is always \( 1 / \text{width} \).
- The 12 in the variance formula is not arbitrary: it falls out of the integration and is the same for every uniform distribution, so don't forget it!
Summary Takeaway: The Uniform distribution is used when every outcome in a specific range has the same "density" or chance of occurring.
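The stopwatch example above can be sketched in a few lines. The rounding error is modelled as \( X \sim U(-0.5, 0.5) \); the simulation at the end is just a sanity check, so its result will be close to (not exactly) the theoretical mean:

```python
import random

# Rounding error of the stopwatch: X ~ U(a, b) with a = -0.5, b = 0.5.
a, b = -0.5, 0.5

mean = (a + b) / 2            # E(X) = (a + b) / 2
variance = (b - a) ** 2 / 12  # Var(X) = (b - a)^2 / 12
height = 1 / (b - a)          # height of the flat PDF = 1 / width

print(mean, round(variance, 4), height)  # 0.0 0.0833 1.0

# Simulation check: the sample mean should land near 0 (random, so not exact).
sample = [random.uniform(a, b) for _ in range(100_000)]
approx_mean = sum(sample) / len(sample)
print(round(approx_mean, 2))  # close to 0
```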
Don't worry if these formulas feel overwhelming! Practice drawing the sketches of the PDF and CDF; once you can visualize the area, the integration becomes much more logical. You've got this!