Welcome to the World of Probability Distributions!
In this chapter, we are going to learn how to move beyond simple "heads or tails" coin flips and start looking at the "big picture" of data. Probability distributions are essentially blueprints or maps that tell us how likely different outcomes are in a random process. Whether you are predicting how many people will click an ad or measuring the exact weight of a cereal box, these distributions are your most powerful tools.
Don't worry if some of the formulas look a bit "maths-heavy" at first—we’ll break them down into simple steps that anyone can follow!
1. The Language of Randomness
Before we dive into the calculations, we need to speak the same language. In Statistics, we use Random Variables (usually written as a capital letter like \(X\)) to represent an outcome that depends on chance.
Discrete vs. Continuous
This is the most important distinction you’ll make:
• Discrete Random Variables: These can only take specific, "countable" values. Think of them like a staircase—you can be on step 1 or step 2, but not step 1.5. Example: The number of goals scored in a match or the number of children in a family.
• Continuous Random Variables: These can take any value within a range. Think of them like a slide—you can be at any height at any time. Example: The time it takes to run a marathon or the exact mass of an apple.
Independence and Dependence
• Independent: If one event happening doesn't change the probability of the next one.
• Dependent: If the first outcome changes the "odds" for the second.
Quick Review:
• Random: A variable whose value is subject to variation due to chance.
• Discrete: Countable (1, 2, 3...).
• Continuous: Measurable (1.234...).
2. Working with Discrete Distributions
A discrete probability distribution is often shown in a table where we list every possible value of \(x\) and its probability \(P(X=x)\).
The Golden Rule: All probabilities in a distribution must add up to exactly 1.
\(\sum P(X=x) = 1\)
Finding the "Average" and the "Spread"
In your exam, you will often be asked to find the Expected Value (the mean) and the Variance (the spread).
The Expected Value \(E(X)\)
Think of this as the long-term average if you ran the experiment thousands of times.
The Formula: \(E(X) = \sum x P(X=x)\)
Step-by-step:
1. Multiply each value \(x\) by its probability.
2. Add all those results together. That’s your mean!
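The two steps above can be sketched in a few lines of Python (the table below is a made-up example distribution, not one from this chapter):

```python
# Hypothetical probability table for a discrete random variable X.
# The probabilities must sum to exactly 1 (the Golden Rule).
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]

assert abs(sum(ps) - 1) < 1e-9  # check the Golden Rule first

# Step 1: multiply each value x by its probability.
# Step 2: add all those results together.
mean = sum(x * p for x, p in zip(xs, ps))
print(mean)  # E(X) = 0.1 + 0.4 + 0.9 + 1.6 = 3.0
```

Notice that the "long-term average" of 3.0 is not itself a value \(X\) can take; the expected value is a weighted average, not a prediction of any single outcome.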
The Variance \(Var(X)\)
This tells us how much the outcomes vary from the mean.
The Formula: \(Var(X) = E(X^2) - [E(X)]^2\)
Memory Aid: Think "The Mean of the Squares minus the Square of the Mean."
Common Mistake to Avoid: Students often forget to square the \(E(X)\) at the end. Always double-check that your variance isn't negative—variance can never be less than zero!
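The "Mean of the Squares minus the Square of the Mean" recipe can be checked with a short sketch (again using a made-up distribution table):

```python
# Hypothetical distribution table; probabilities sum to 1.
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]

mean = sum(x * p for x, p in zip(xs, ps))        # E(X)   = 3.0
mean_of_squares = sum(x**2 * p for x, p in zip(xs, ps))  # E(X^2) = 10.0

# "Mean of the Squares minus the Square of the Mean"
variance = mean_of_squares - mean**2             # 10.0 - 9.0 = 1.0

# The common-mistake check: variance can never be negative.
assert variance >= 0
```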
3. Continuous Distributions and the Uniform Model
For continuous data, we can't list every possible value (because there are infinite decimals!). Instead, we use a Probability Density Function (PDF). On a graph, the total area under the curve must equal 1.
The Discrete Uniform Distribution
This is the simplest model. It’s used when every outcome is equally likely.
Analogy: Rolling a fair six-sided die. Every number from 1 to 6 has the exact same probability (\(1/6\)).
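Applying the \(E(X)\) and \(Var(X)\) formulas to the fair-die analogy gives a tidy worked example. The sketch below uses Python's `fractions` module so the answers come out exact:

```python
from fractions import Fraction

# A fair six-sided die: every face from 1 to 6 has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in faces)                     # E(X) = 7/2 = 3.5
variance = sum(x**2 * p for x in faces) - mean**2    # 91/6 - 49/4 = 35/12
```

Note that the mean of 3.5 is not a face you can actually roll—another reminder that the expected value is a long-term average.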
The Continuous Uniform Distribution
This is often called the "Rectangular Distribution." It’s used when a variable is equally likely to be anywhere between two points, \(a\) and \(b\).
Did you know? Because the area of a rectangle is height × width, and the total area must be 1, the height of a continuous uniform distribution on \([a, b]\) is always \(1 / (b - a)\).
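Because the PDF is just a rectangle, probabilities under the continuous uniform model are rectangle areas. A minimal sketch (the helper names and the interval \([0, 10]\) are illustrative, not standard library functions):

```python
def uniform_pdf_height(a, b):
    # Total area under the PDF must be 1, so height = 1 / (b - a).
    return 1 / (b - a)

def uniform_prob(a, b, lo, hi):
    # P(lo <= X <= hi) is a rectangle area: height * width.
    # Clip the interval to [a, b] first, since the PDF is 0 outside it.
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) * uniform_pdf_height(a, b)

# X uniform on [0, 10]: height is 1/10, so P(2 <= X <= 5) = 3 * 0.1 = 0.3
prob = uniform_prob(0, 10, 2, 5)
```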
4. Linear Combinations of Random Variables
Sometimes, we need to combine different random variables. For example, if you buy a coffee (\(X\)) and a sandwich (\(Y\)), what is the total expected cost and the total "risk" (variance)?
The Rules for Expectation
Expectation is very "friendly." It follows exactly what you’d expect:
\(E(aX \pm bY) = aE(X) \pm bE(Y)\)
Example: If you double the size of your coffee, your expected cost simply doubles. Usefully, this rule holds whether or not \(X\) and \(Y\) are independent.
The Rules for Variance (Be Careful!)
Variance is a bit more sensitive. Important: This formula only works if the variables are independent.
\(Var(aX \pm bY) = a^2Var(X) + b^2Var(Y)\)
Crucial Trick: Notice that even if you are subtracting variables, you add the variances. Why? Because combining two independent uncertain quantities always increases the total spread—it never cancels out!
Quick Review:
• When you multiply a variable by \(a\), the variance is multiplied by \(a^2\).
• For independent variables, always add the variances, never subtract them.
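The coffee-and-sandwich example can be worked through with these rules directly. The prices and variances below are hypothetical figures chosen for illustration:

```python
def combined_mean(a, b, mean_x, mean_y):
    # E(aX + bY) = a*E(X) + b*E(Y) -- holds even without independence.
    return a * mean_x + b * mean_y

def combined_variance(a, b, var_x, var_y):
    # Var(aX +/- bY) = a^2*Var(X) + b^2*Var(Y) -- ONLY for independent X, Y.
    # The variances are ADDED even when the variables are subtracted.
    return a**2 * var_x + b**2 * var_y

# Hypothetical figures: coffee X with E(X) = 3.00, Var(X) = 0.04;
# sandwich Y with E(Y) = 4.50, Var(Y) = 0.25.
total_mean = combined_mean(1, 1, 3.00, 4.50)       # = 7.50
total_var = combined_variance(1, 1, 0.04, 0.25)    # = 0.29

# Doubling the coffee: the mean doubles, but its variance picks up a^2 = 4.
double_var = combined_variance(2, 0, 0.04, 0.25)   # = 0.16
```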
5. Choosing the Right Model
In Paper 1, you need to decide which distribution fits a real-world scenario. Here is a quick guide:
• Binomial: Use this for "Success or Failure" situations with a fixed number of trials. (e.g., Number of heads in 10 coin tosses).
• Poisson: Use this for events happening at a constant rate over time or space. (e.g., Number of emails received in an hour).
• Normal: Use this for data that clusters around a mean in a bell shape. (e.g., Heights of adult men).
• Exponential: Use this to model the time between events in a Poisson process. (e.g., The time you have to wait for the next bus).
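One way to build intuition for these four models is to simulate draws from each. The sketch below uses NumPy's random generator; all the parameter values (10 tosses, 4 emails per hour, heights around 175 cm, and so on) are made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility

# Binomial: number of heads in 10 fair coin tosses (success/failure, fixed trials).
n_heads = rng.binomial(n=10, p=0.5)

# Poisson: emails received in one hour, at a constant average rate of 4 per hour.
n_emails = rng.poisson(lam=4.0)

# Normal: an adult height in cm, clustering around a mean in a bell shape.
height = rng.normal(loc=175.0, scale=7.0)

# Exponential: waiting time (in hours) until the next email in the Poisson process.
# For a rate of 4 per hour, the mean wait is 1/4 hour.
wait = rng.exponential(scale=1 / 4.0)
```

Note how the binomial and Poisson draws are whole numbers (discrete), while the normal and exponential draws are decimals (continuous)—the staircase-versus-slide distinction from Section 1.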
Key Takeaways for the Exam
1. Check the Total: Always ensure your probabilities sum to 1.
2. Square the Constant: When calculating \(Var(aX)\), remember it becomes \(a^2 Var(X)\).
3. Independent Variables: You can only add variances if the variables don't affect each other.
4. Context is King: When asked to "Interpret in context," always use the units mentioned in the question (e.g., "minutes," "kg," or "calls").
Don't worry if this seems tricky at first—practice calculating \(E(X)\) and \(Var(X)\) for small tables, and the patterns will start to make sense!