Introduction to Continuous Random Variables
Welcome! In your previous studies, you've worked with discrete random variables: things you can count, like the number of heads in ten coin flips or the number of students in a class. In this chapter, we move into the world of Continuous Random Variables (CRVs). These are variables that can take any value within a range, such as time, height, or weight. Instead of a list of probabilities, we use a smooth curve to describe them. Don't worry if this seems a bit abstract at first; we will use calculus to unlock the patterns behind the data!
1. The Probability Density Function (pdf)
A Probability Density Function, written as \(f(x)\), is a function that describes the shape of a continuous distribution. You can think of it as a "probability hill." The higher the hill at a certain point, the more "dense" the probability is there.
The Two Golden Rules of pdfs
To be a valid pdf, a function must follow these two rules:
- Non-negative: The graph can never go below the x-axis. Mathematically: \(f(x) \ge 0\) for all \(x\).
- Total Area is 1: The total area under the entire curve must equal exactly 1. This is the continuous version of "all probabilities must sum to 1."
\(\int_{-\infty}^{\infty} f(x) dx = 1\)
Quick Tip: Most exam questions will give you a function that is only non-zero between two values (like 0 and 5). You only need to integrate between those specific limits!
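Here is a minimal sketch of how you could check the two rules numerically. The pdf \(f(x) = \frac{1}{8}x\) on \(0 \le x \le 4\) is just an illustrative choice, and the helper name `integrate` is made up for this example (it uses a simple midpoint rule, not an exam technique).

```python
# Illustrative pdf (an assumption for this sketch): f(x) = x/8 on [0, 4], 0 elsewhere.
def f(x):
    return x / 8 if 0 <= x <= 4 else 0.0

def integrate(g, a, b, n=10_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Rule 1: f(x) >= 0 at a grid of sample points across the range.
non_negative = all(f(4 * i / 1000) >= 0 for i in range(1001))

# Rule 2: the total area is 1 (we only integrate where f is non-zero).
total_area = integrate(f, 0, 4)

print(non_negative, round(total_area, 6))
```

Notice that the integral only runs from 0 to 4, because the function is zero everywhere else.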
Did you know? For a continuous variable, the probability of the variable being exactly one specific number (like \(P(X = 2.5)\)) is actually zero! We only measure the probability of the variable falling within a range.
Key Takeaway: Probability in a CRV is represented by area under the curve. No area = no probability!
2. Calculating Probabilities
Since probability is area, we use integration to find the chance of \(X\) falling between two values, \(a\) and \(b\).
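The Formula: \(P(a < X < b) = \int_{a}^{b} f(x) dx\)
Because a single point has zero probability, \(P(a \le X \le b)\) gives exactly the same answer as \(P(a < X < b)\).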
Step-by-Step: Finding a Probability
- Identify the function \(f(x)\) and the range you are interested in.
- Set up your integral with the lower limit at the bottom and the upper limit at the top.
- Integrate the function.
- Plug in your numbers and calculate the final area.
Example: If a variable has a pdf \(f(x) = \frac{1}{8}x\) for \(0 \le x \le 4\), to find \(P(1 < X < 3)\), you would calculate \(\int_{1}^{3} \frac{1}{8}x dx\).
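Following the four steps, the working for this example could be set out like this:

```latex
\[
P(1 < X < 3) = \int_{1}^{3} \frac{1}{8}x \, dx
             = \left[ \frac{x^2}{16} \right]_{1}^{3}
             = \frac{9}{16} - \frac{1}{16}
             = \frac{1}{2}
\]
```

So there is a 50% chance that \(X\) lands between 1 and 3.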
3. Expectation (Mean) and Variance
Just like with discrete data, we want to know the "average" value and how "spread out" the data is.
Expectation \(E(X)\)
The Expectation (or mean, \(\mu\)) is the balance point of the distribution.
Formula: \(E(X) = \int_{-\infty}^{\infty} x f(x) dx\)
Variance \(Var(X)\)
The Variance measures the spread. It's often easier to calculate \(E(X^2)\) first.
Step 1: Find \(E(X^2) = \int_{-\infty}^{\infty} x^2 f(x) dx\)
Step 2: Use the variance formula: \(Var(X) = E(X^2) - [E(X)]^2\)
Memory Aid: For Variance, remember: "The Mean of the Squares minus the Square of the Mean."
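As a rough sketch, here are the two steps carried out numerically for the illustrative pdf \(f(x) = \frac{1}{8}x\) on \(0 \le x \le 4\). The helper `integrate` is a hypothetical midpoint-rule function written for this example; in an exam you would integrate by hand and get \(E(X) = \frac{8}{3}\) and \(Var(X) = \frac{8}{9}\).

```python
# Illustrative pdf (an assumption for this sketch): f(x) = x/8 on [0, 4], 0 elsewhere.
def f(x):
    return x / 8 if 0 <= x <= 4 else 0.0

def integrate(g, a, b, n=10_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

mean = integrate(lambda x: x * f(x), 0, 4)         # E(X)
mean_sq = integrate(lambda x: x**2 * f(x), 0, 4)   # Step 1: E(X^2)
variance = mean_sq - mean**2                       # Step 2: E(X^2) - [E(X)]^2

print(round(mean, 4), round(variance, 4))
```

The printed values should agree with the hand-calculated \(\frac{8}{3} \approx 2.6667\) and \(\frac{8}{9} \approx 0.8889\) to the precision shown.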
4. Mode, Median, and Percentiles
Sometimes we want to find specific "cut-off" points in our data hill.
The Mode
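The Mode is the value of \(x\) where the pdf \(f(x)\) is at its maximum. You can find this by looking at the graph or, if the pdf is a curve, by differentiating and solving \(f'(x) = 0\) to find the stationary point. Check that the stationary point is a maximum and lies inside the range. If the pdf is always increasing or always decreasing, the mode is at one end of the range.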
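Here is a short worked example of the differentiation approach. The pdf \(f(x) = \frac{3}{4}x(2 - x)\) for \(0 \le x \le 2\) is made up purely for illustration.

```latex
\[
f(x) = \frac{3}{4}\left(2x - x^2\right)
\quad\Rightarrow\quad
f'(x) = \frac{3}{4}\left(2 - 2x\right) = 0
\quad\Rightarrow\quad
x = 1
\]
\[
f''(x) = -\frac{3}{2} < 0 \quad\text{so } x = 1 \text{ is a maximum, and the mode is } 1.
\]
```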
The Median and Percentiles
The Median (\(m\)) is the value where the area to the left is exactly 0.5.
Solve for \(m\): \(\int_{-\infty}^{m} f(x) dx = 0.5\)
For the 90th percentile, you would set the integral equal to 0.9 instead of 0.5.
Key Takeaway: The median splits the area into two equal halves of 0.5 each.
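To see this in action, here is a small sketch for the illustrative pdf \(f(x) = \frac{1}{8}x\) on \(0 \le x \le 4\), where the area to the left of \(x\) works out as \(\frac{x^2}{16}\). The function names `area_left` and `percentile` are invented for this example, and the bisection search is just one convenient way to solve the equation; by hand you would solve \(\frac{m^2}{16} = 0.5\) directly.

```python
def area_left(x):
    """Area under f(t) = t/8 to the left of x, i.e. x^2 / 16 on [0, 4]."""
    if x <= 0:
        return 0.0
    if x >= 4:
        return 1.0
    return x**2 / 16

def percentile(p, lo=0.0, hi=4.0, tol=1e-10):
    """Find x with area_left(x) = p by bisection (the area only ever increases)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_left(mid) < p:
            lo = mid   # not enough area yet: move right
        else:
            hi = mid   # too much area: move left
    return (lo + hi) / 2

median = percentile(0.5)   # by hand: m = sqrt(8)
p90 = percentile(0.9)      # by hand: x = sqrt(14.4)

print(round(median, 3), round(p90, 3))
```

Solving by hand gives \(m = \sqrt{8} \approx 2.828\) for the median and \(\sqrt{14.4} \approx 3.795\) for the 90th percentile.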
5. The Cumulative Distribution Function (cdf)
The Cumulative Distribution Function, written as \(F(x)\), tells you the probability that the variable is less than or equal to a certain value \(x\).
Formula: \(F(x) = P(X \le x) = \int_{-\infty}^{x} f(t) dt\)
The Relationship Bridge
This is a vital concept for your exams:
- To get from pdf to cdf: Integrate.
- To get from cdf to pdf: Differentiate (\(f(x) = F'(x)\)).
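Common Mistake to Avoid: When integrating to find the cdf, don't forget the constant of integration (+C)! You find \(C\) by using the fact that the cumulative probability must be 0 at the very start of the range. Then check your answer: at the very end of the range, the cdf must equal 1. Remember to state the full cdf too: \(F(x) = 0\) below the range and \(F(x) = 1\) above it.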
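As a worked illustration, take the pdf \(f(x) = \frac{1}{8}x\) for \(0 \le x \le 4\) (chosen here just as an example). Finding the cdf, including the constant, might look like this:

```latex
\[
F(x) = \int \frac{1}{8}x \, dx = \frac{x^2}{16} + C,
\qquad
F(0) = 0 \;\Rightarrow\; C = 0,
\qquad
\text{check: } F(4) = \frac{16}{16} = 1
\]
\[
F(x) =
\begin{cases}
0 & x < 0 \\[4pt]
\dfrac{x^2}{16} & 0 \le x \le 4 \\[4pt]
1 & x > 4
\end{cases}
\]
```

Differentiating the middle piece takes you straight back to the pdf: \(F'(x) = \frac{1}{8}x\).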
6. Special Continuous Models
The syllabus highlights two specific models you need to be comfortable with.
The Continuous Uniform (Rectangular) Distribution
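This is where the probability density is the same at every value in a range \([a, b]\), so intervals of equal width are equally likely. The graph is a flat rectangle of height \(\frac{1}{b-a}\), which means \(f(x) = \frac{1}{b-a}\) for \(a \le x \le b\) (and 0 elsewhere).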
Key Formulae (usually given in the formula booklet):
\(E(X) = \frac{a+b}{2}\)
\(Var(X) = \frac{1}{12}(b-a)^2\)
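Here is a quick sketch that applies these two formulae and cross-checks them against the integral definitions of \(E(X)\) and \(E(X^2)\), using the constant pdf \(\frac{1}{b-a}\). The values \(a = 2\) and \(b = 8\) and the function names are arbitrary choices for illustration.

```python
from fractions import Fraction

def uniform_mean_var(a, b):
    """Mean and variance of a continuous uniform on [a, b] from the booklet formulae."""
    a, b = Fraction(a), Fraction(b)
    return (a + b) / 2, (b - a) ** 2 / 12

def uniform_mean_var_by_integration(a, b):
    """Same quantities from E(X) and E(X^2), integrating x/(b-a) and x^2/(b-a) exactly."""
    a, b = Fraction(a), Fraction(b)
    mean = (b**2 - a**2) / (2 * (b - a))      # integral of x / (b - a)
    mean_sq = (b**3 - a**3) / (3 * (b - a))   # integral of x^2 / (b - a)
    return mean, mean_sq - mean**2

mean, variance = uniform_mean_var(2, 8)
print(mean, variance)  # 5 3
print(uniform_mean_var_by_integration(2, 8) == (mean, variance))  # True
```

Using `Fraction` keeps the arithmetic exact, so the two approaches can be compared with `==`.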
The Normal Distribution
In this unit, we extend our A Level knowledge to linear combinations of Normal variables. If \(X\) and \(Y\) are independent Normal variables, then any linear combination of them (like \(X + Y\) or \(2X - 3Y\)) is also Normal.
- Mean: \(E(aX + bY) = aE(X) + bE(Y)\)
- Variance: \(Var(aX + bY) = a^2Var(X) + b^2Var(Y)\), provided \(X\) and \(Y\) are independent. (Note: We always add variances, even if we are subtracting the variables!)
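The sketch below applies the two rules to \(2X - 3Y\) using Python's standard-library `statistics.NormalDist`. The distributions \(X \sim N(10, 2^2)\) and \(Y \sim N(5, 3^2)\) and the helper name `linear_combination` are assumptions made up for this example, and \(X\) and \(Y\) are taken to be independent.

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(a, X, b, Y):
    """Distribution of aX + bY for independent Normal variables X and Y."""
    mean = a * X.mean + b * Y.mean
    variance = a**2 * X.variance + b**2 * Y.variance  # variances add, even when b < 0
    return NormalDist(mean, sqrt(variance))

X = NormalDist(mu=10, sigma=2)   # Var(X) = 4
Y = NormalDist(mu=5, sigma=3)    # Var(Y) = 9

W = linear_combination(2, X, -3, Y)   # W = 2X - 3Y
prob_positive = 1 - W.cdf(0)          # P(2X - 3Y > 0)

print(W.mean, round(W.variance, 6), round(prob_positive, 3))
```

By hand, the mean is \(2(10) - 3(5) = 5\) and the variance is \(4(4) + 9(9) = 97\), so \(2X - 3Y \sim N(5, 97)\).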
Quick Review Box:
- pdf \(f(x)\): The "height" (Integrate for area/probability).
- cdf \(F(x)\): The "running total" of area.
- \(E(X)\): The center.
- \(Var(X)\): The spread.
Summary: Putting it All Together
When tackling a Continuous Random Variable problem, always ask yourself: "Am I looking at the pdf (the shape) or the cdf (the total so far)?" Use integration to find probabilities, means, and medians, and use differentiation to find the mode or switch from a cdf back to a pdf. Keep your integration limits clear, and remember that the total area must always be 1. You've got this!