Welcome to Poisson & Binomial Distributions!

In this chapter of Further Statistics 1, we are going to explore two of the most useful tools in a statistician's toolkit. While you might have met the Binomial distribution before, we are going to look at it more deeply and introduce its "cousin," the Poisson distribution.

Why does this matter? Because life is full of counting! Whether you are counting how many shooting stars appear in an hour, how many chocolate chips are in a cookie, or how many customers call a helpdesk, these distributions help us work out how likely each possible count is. Don't worry if this seems tricky at first; we will break it down step-by-step.

1. What is the Poisson Distribution?

The Poisson distribution is used to model the number of times an event occurs within a fixed interval of time or space. Think of it as the "counting distribution."

Prerequisite Concept: A discrete random variable is something that can only take specific values (like 0, 1, 2...), which is exactly what we use here because you can't have half an email arriving!

When can we use a Poisson Model?

For a situation to be modeled by \(X \sim Po(\lambda)\), where \(\lambda\) (lambda) is the average rate, the events must be:

  • Independent: One event happening doesn't change the chance of another happening.
  • Singly: Events cannot happen at the exact same instant.
  • Random: Events occur at unpredictable moments, not on a fixed schedule.
  • Uniform: Events occur at a constant average rate, so the expected number of events is proportional to the size of the interval.

Memory Aid: Remember the word "ISRU" (Independent, Singly, Random, Uniform) to check if Poisson is appropriate!

Real-World Analogy

Imagine you are standing by a quiet road. On average, 3 cars pass every 10 minutes. This is your average rate (\(\lambda = 3\)). You can use Poisson to calculate the probability that exactly 5 cars pass in the next 10 minutes.
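To see where that probability comes from, here is a minimal sketch using the standard Poisson formula \(P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}\). The function name poisson_pmf is just an illustrative choice; in an exam you would use your calculator's Poisson function instead.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Po(lam), using e^(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Road example from the text: 3 cars per 10 minutes on average, so lam = 3.
p_five_cars = poisson_pmf(5, 3)
print(round(p_five_cars, 4))  # 0.1008
```

So there is roughly a 10% chance of seeing exactly 5 cars in the next 10 minutes.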

The Additive Property

This is a very handy feature! If you change the interval, you change the rate proportionally.
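If \(X\) is the number of events in one minute and \(X \sim Po(\lambda)\), then the number of events in a longer interval is also Poisson, with the rate scaled to match:
- In 5 minutes, the number of events follows \(Po(5\lambda)\).
- In 10 minutes, the number of events follows \(Po(10\lambda)\).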

Also, if you have two independent Poisson variables \(X \sim Po(\lambda)\) and \(Y \sim Po(\mu)\), then their sum is also Poisson: \(X + Y \sim Po(\lambda + \mu)\).
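If you would like to convince yourself that the sum rule works, the sketch below checks it numerically for one case. The rates 2 and 3 and the total of 4 are made-up values chosen purely for illustration. It finds \(P(X + Y = 4)\) the long way, by adding up every way of splitting 4 events between \(X\) and \(Y\), and compares that with \(Po(2 + 3)\).

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Hypothetical rates, chosen only for illustration.
lam, mu = 2.0, 3.0
total = 4

# Long way: X = k and Y = total - k, summed over every possible k.
# Multiplying the two probabilities relies on X and Y being independent.
long_way = sum(
    poisson_pmf(k, lam) * poisson_pmf(total - k, mu)
    for k in range(total + 1)
)

# Short way: treat X + Y as a single Po(lam + mu) variable.
short_way = poisson_pmf(total, lam + mu)

print(math.isclose(long_way, short_way))  # True
```

The two answers agree, which is exactly what the additive property promises. Note that the check only works because the long way multiplies probabilities, and that step needs independence.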

Quick Review Box:
- Poisson is for "counts" in an interval.
- \(\lambda\) is the mean number of occurrences.
- You can scale \(\lambda\) up or down based on the time or space interval.

2. Mean and Variance

One of the coolest things about these distributions is that we can predict their "center" (Mean) and their "spread" (Variance) using simple formulas.

For the Binomial Distribution \(B(n, p)\):

  • Mean: \(E(X) = np\)
  • Variance: \(Var(X) = np(1-p)\)

For the Poisson Distribution \(Po(\lambda)\):

  • Mean: \(E(X) = \lambda\)
  • Variance: \(Var(X) = \lambda\)

Did you know? In a Poisson distribution, the Mean and Variance are exactly the same! This is a great way to check if a Poisson model is a "good fit" for real data. If the mean and variance of your data are very different, Poisson might not be the best choice.

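Key Takeaway: If the mean and variance of your data are close, a Poisson model may be appropriate, but you should still check that the conditions (independent, singly, constant average rate) make sense in context.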
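Here is a small sketch of that mean-versus-variance check. The data set is invented for illustration (calls received in each of 20 one-hour periods), and the variance is calculated by dividing by \(n\), which is an assumption about the convention your course uses.

```python
import statistics

# Hypothetical data: number of calls in each of 20 one-hour periods.
calls = [2, 4, 3, 1, 5, 3, 2, 4, 3, 3, 0, 6, 2, 3, 4, 1, 3, 5, 2, 4]

sample_mean = statistics.mean(calls)
# pvariance divides by n; statistics.variance would divide by n - 1 instead.
sample_variance = statistics.pvariance(calls)

print(f"mean = {sample_mean:.2f}, variance = {sample_variance:.2f}")
```

For this made-up data the mean is 3 and the variance is 2.1. Whether that counts as "close enough" is a judgment call, so in an exam answer you would state both values and comment on how similar they are.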

3. Using Poisson to Approximate Binomial

Sometimes, calculating Binomial probabilities is exhausting, especially if \(n\) (the number of trials) is huge. In certain cases, the Poisson distribution can "step in" and give us a very accurate shortcut.

The "Shortcut" Conditions

You can use \(Po(np)\) to approximate \(B(n, p)\) when:

  1. \(n\) is large (usually \(n > 50\)).
  2. \(p\) is small (usually \(p < 0.1\)).

In this case, we simply set our Poisson rate as \(\lambda = np\).

Example: Suppose a factory produces 1000 lightbulbs and the probability of one being faulty is 0.002. Instead of doing a complex Binomial calculation with \(n=1000\), we use \(Po(1000 \times 0.002) = Po(2)\). Much easier!
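To see how close the shortcut is for the lightbulb example, the sketch below prints the exact Binomial probability next to the Poisson approximation for 0 to 5 faulty bulbs. The helper names binomial_pmf and poisson_pmf are illustrative, not part of any course formula book.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

n, p = 1000, 0.002   # lightbulb example from the text
lam = n * p          # Poisson rate: lam = np = 2

for x in range(6):
    exact = binomial_pmf(x, n, p)
    approx = poisson_pmf(x, lam)
    print(x, round(exact, 4), round(approx, 4))
```

Each pair of values differs by less than 0.001, so very little accuracy is lost by swapping \(B(1000, 0.002)\) for \(Po(2)\).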

Common Mistake to Avoid: Don't use this approximation if \(p\) is large (close to 0.5). The approximation only works when the "successes" are rare!

4. Hypothesis Testing with Poisson

We can use hypothesis tests to see if the average rate (\(\lambda\)) of an event has changed. This is just like the hypothesis testing you did in A Level Maths, but with a different distribution.

The Step-by-Step Process

  1. State your Hypotheses:
    - \(H_0: \lambda = \text{original rate}\)
    - \(H_1: \lambda > \text{original rate}\) (or \(<\), or \(\neq\), depending on the claim being tested)
  2. Identify the Test Statistic: This is the actual number of events you observed.
  3. Calculate the Probability: Use your calculator to find the probability of getting a value "at least as extreme" as your observed value, assuming \(H_0\) is true.
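  4. Compare to Significance Level: If the probability is less than the significance level (e.g., 5%), reject \(H_0\). For a two-tailed test, compare the probability in one tail with half the significance level (e.g., 2.5%).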
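The four steps above can be sketched in code for a one-tailed (upper) test. The numbers are hypothetical: suppose the original rate is 3 events per interval, we observe 7 events, and we test \(H_0: \lambda = 3\) against \(H_1: \lambda > 3\) at the 5% level. The function names are illustrative only.

```python
import math

def poisson_cdf(x: int, lam: float) -> float:
    """Return P(X <= x) for X ~ Po(lam)."""
    return sum(
        math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(x + 1)
    )

def upper_tail_p_value(observed: int, lam0: float) -> float:
    """Return P(X >= observed), assuming H0: lambda = lam0 is true."""
    return 1 - poisson_cdf(observed - 1, lam0)

# Hypothetical test: H0: lambda = 3, H1: lambda > 3, 5% significance level.
lam0 = 3
observed = 7
alpha = 0.05

p_value = upper_tail_p_value(observed, lam0)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)  # 0.0335 True
```

Since 0.0335 is less than 0.05, we would reject \(H_0\) in this made-up example. Notice that "at least 7" is found as \(1 - P(X \le 6)\), not \(1 - P(X \le 7)\), which is an easy slip to make on a calculator.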

Encouraging Phrase: Hypothesis testing is just a way of asking: "Is this result so unlikely that the original average is probably wrong?" If the answer is "Yes," we reject \(H_0\)!

Key Takeaway: Always state your hypotheses in terms of the population parameter (\(\lambda\) or \(\mu\)).

Chapter Summary

  • Poisson Distribution: Used for independent, random events occurring at a constant rate in a fixed interval.
  • Calculators: You are expected to use your calculator's Poisson functions for both single values (\(P(X=x)\)) and cumulative values (\(P(X \le x)\)).
  • Mean & Variance: For Poisson, they are both equal to \(\lambda\). For Binomial, Mean is \(np\) and Variance is \(np(1-p)\).
  • Approximation: Use Poisson to approximate Binomial when \(n\) is large and \(p\) is small.
  • Hypothesis Testing: Focus on testing whether the rate \(\lambda\) has changed based on new evidence.