Welcome to Simulation!
Ever wondered how companies predict how long a queue will be, or how mathematicians figure out the chance of winning a complex game without playing it millions of times? The answer is Simulation. In this chapter of the Statistics Major section, we move away from just using formulas and instead use technology to "copy" real-life events. It’s like a flight simulator for math!
Simulation is a powerful tool because it allows us to tackle problems that are too "messy" or technically difficult to solve with standard theoretical equations. Let’s dive in!
1. What is Simulation and Why Use It?
At its heart, simulation is using a model to mimic the behavior of a real-world system over time. We use it when the theoretical math is either impossible to do or just really, really hard.
Analogy: Imagine you want to know if a specific paper airplane design will fly 10 meters. You could use complex physics equations (theory), or you could just throw it 500 times and record the results (simulation). In this course, we use spreadsheets to do those "500 throws" for us in a split second!
Key Terms to Know:
- Trial: A single "run" of the simulation (like one toss of a coin).
- Relative Frequency: The proportion of times an event happens during your simulation. As the number of trials increases, this tends to get closer to the theoretical probability.
- Variation: The idea that every time you run a simulation, the results will be slightly different because of randomness.
Quick Review: Simulation doesn't give us an "exact" answer like a formula does; it gives us an estimate that gets better the more trials we run.
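To see relative frequency settle down as the number of trials grows, here is a minimal sketch in Python. Python is used only for illustration, since the course itself works in spreadsheets. It tosses a simulated fair coin, where `rng.random() < 0.5` plays the role of a spreadsheet cell like =RAND()<0.5. The function name `relative_frequency_heads` and the seed value are illustrative choices, not part of any syllabus.

```python
import random

def relative_frequency_heads(n_trials, rng):
    """Toss a simulated fair coin n_trials times; return the proportion of heads."""
    # rng.random() < 0.5 is like a spreadsheet cell =RAND()<0.5 (TRUE means heads)
    heads = sum(1 for _ in range(n_trials) if rng.random() < 0.5)
    return heads / n_trials

rng = random.Random(1)  # fixed seed only so this run can be repeated exactly
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} trials: relative frequency of heads = {relative_frequency_heads(n, rng):.4f}")
```

If you change or remove the seed, the printed values change (that is variation). The larger runs should still stay close to 0.5.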
2. Simulating Distributions with Spreadsheets (SZ1)
The syllabus requires you to know how to simulate three main types of distributions using software like Excel or Google Sheets.
A. Discrete Uniform Distribution
This is when you have a set number of outcomes, and each is equally likely (like a fair die). In a spreadsheet, we use:
=RANDBETWEEN(lower, upper)
Example: To simulate a 6-sided die, you would type =RANDBETWEEN(1, 6).
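As an illustration outside the spreadsheet, Python's `random.randint(a, b)` behaves like =RANDBETWEEN(a, b): both end points can come up. The sketch below rolls a simulated die 6,000 times and tallies each face. With a fair die, each relative frequency should be near 1/6 ≈ 0.167. The helper `roll_die` is a hypothetical name used only here.

```python
import random
from collections import Counter

def roll_die(rng):
    # randint includes both end points, just like =RANDBETWEEN(1, 6)
    return rng.randint(1, 6)

rng = random.Random()
rolls = [roll_die(rng) for _ in range(6000)]
counts = Counter(rolls)
for face in range(1, 7):
    # face, how many times it came up, and its relative frequency
    print(face, counts[face], round(counts[face] / len(rolls), 3))
```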
B. Continuous Uniform Distribution
This is for random numbers between two values where any decimal is possible. We usually start with a number from 0 up to (but not including) 1, using:
=RAND()
Did you know? Most computer "randomness" is actually called pseudo-randomness because it's generated by a clever algorithm, but for our statistics work, it’s plenty random enough!
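For comparison, Python's `random.random()` (used here only to illustrate the same idea) also returns a pseudo-random float from 0 up to, but not including, 1, just like =RAND(). The sketch below shows the usual way to stretch that to any interval from a to b: compute a + (b - a) × U. This is what a spreadsheet formula such as =a+(b-a)*RAND() does. The helper name `uniform_between` is hypothetical, and Python also has a built-in `random.uniform(a, b)`. If you give `random.Random` a seed, you get the same "random" numbers every time, which is the pseudo-randomness mentioned above.

```python
import random

def uniform_between(a, b, rng):
    """Continuous uniform value between a and b, built from a [0, 1) random number,
    the same way =a+(b-a)*RAND() works in a spreadsheet."""
    return a + (b - a) * rng.random()

rng = random.Random()
samples = [uniform_between(2, 5, rng) for _ in range(10_000)]
# The smallest and largest values should sit just inside 2 and 5,
# and the mean should be near (2 + 5) / 2 = 3.5
print(min(samples), max(samples), sum(samples) / len(samples))
```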
C. Normal Distribution
Simulating a Normal distribution is a bit trickier but very common in exams. We use a formula that takes a random probability (between 0 and 1) and turns it into a value from a Normal curve:
=NORM.INV(RAND(), mean, standard_dev)
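Note: Normal distributions are usually written \( N(\mu, \sigma^2) \), which states the variance. However, NORM.INV (in both Excel and Google Sheets) needs the standard deviation \( \sigma \). For example, if \( X \sim N(50, 16) \), you should enter \( \sigma = 4 \), not 16. Always read the question carefully!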
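The same "random probability in, Normal value out" idea can be sketched in Python with the standard library's `statistics.NormalDist`. Its `inv_cdf` method plays the role of NORM.INV here. The distribution N(50, 16) is a made-up example. Because 16 is the variance, the function is given σ = 4. The helper name `normal_by_inverse_cdf` is hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_by_inverse_cdf(mu, sigma, rng):
    """Mimic =NORM.INV(RAND(), mu, sigma); sigma is the standard deviation."""
    u = rng.random()
    while u == 0.0:  # inv_cdf needs 0 < u < 1, and random() can (very rarely) return 0.0
        u = rng.random()
    return NormalDist(mu, sigma).inv_cdf(u)

# Hypothetical example: X ~ N(50, 16), so the variance is 16 and sigma = 4
rng = random.Random()
xs = [normal_by_inverse_cdf(50, 4, rng) for _ in range(10_000)]
print(mean(xs), stdev(xs))  # should come out close to 50 and 4
```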
Key Takeaway: Spreadsheets use RAND() as the engine for almost all simulations. Don't worry if your numbers look different from your friend's—that's just variation in action!
3. Using Simulations to Solve Difficult Problems (Z2)
Sometimes the "theory" is too hard. For example, suppose you want the probability that the sum of three random variables, each with a different distribution, is bigger than some value. The math can get very complicated, but simulation makes it easy.
The "Train Waiting Time" Example
Imagine you commute to work. Trains run every 15 minutes. You arrive at the station at a random time, so each wait is equally likely to be anything from 0 to 15 minutes. What is the probability that you wait for more than 20 minutes in total for your morning and evening trains combined?
Theoretical approach: This involves integrating functions over a 2D plane. Yikes!
Simulation approach:
- Let \( X \) be the wait time in the morning: =15*RAND()
- Let \( Y \) be the wait time in the evening: =15*RAND()
- Calculate the total: \( T = X + Y \)
- Repeat this for 1,000 rows.
- Count how many rows have \( T > 20 \) and divide by 1,000.
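Here are the same steps as a Python loop, where each pass through the loop stands in for one spreadsheet row. As in the spreadsheet version, this assumes the morning and evening waits are independent and each is uniform between 0 and 15 minutes. For this two-train case, a geometric shortcut gives an exact answer you can check against. The region \( x + y > 20 \) inside the 15-by-15 square is a right-angled triangle with two sides of length 10. So \( P(T > 20) = \frac{\frac{1}{2} \times 10 \times 10}{15^2} = \frac{50}{225} = \frac{2}{9} \approx 0.222 \). The function name `simulate_train_waits` is hypothetical.

```python
import random

def simulate_train_waits(n_trials, rng):
    """Estimate P(T > 20), where T = X + Y and X, Y are independent waits,
    each uniform from 0 to 15 minutes (like =15*RAND())."""
    count = 0
    for _ in range(n_trials):      # one pass = one spreadsheet row
        x = 15 * rng.random()      # morning wait
        y = 15 * rng.random()      # evening wait
        if x + y > 20:             # total wait over 20 minutes?
            count += 1
    return count / n_trials        # relative frequency

rng = random.Random()
print(simulate_train_waits(1_000, rng))  # varies from run to run
print(2 / 9)                             # exact answer, for comparison
```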
Investigating the Central Limit Theorem (CLT)
The CLT says that if you take the mean of a large enough random sample, that mean will be approximately Normally distributed, even if the original data wasn't Normal. You can simulate this by:
- Generating 10 random numbers from a Uniform distribution.
- Calculating their average.
- Repeating this 500 times.
- Plotting a histogram of the 500 averages. You should see a roughly bell-shaped curve!
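Here is a minimal Python sketch of those four steps. It prints a rough text histogram instead of drawing a spreadsheet chart. The bin width of 0.05 and the scale of one # per two averages are arbitrary display choices. For comparison with the output, a single U[0, 1) value has mean 0.5 and variance 1/12. So an average of 10 of them has mean 0.5 and standard deviation \( \sqrt{1/120} \approx 0.091 \).

```python
import random
import statistics

def sample_means(n_samples, sample_size, rng):
    """Return n_samples averages, each of sample_size values from U[0, 1)."""
    return [statistics.mean(rng.random() for _ in range(sample_size))
            for _ in range(n_samples)]

rng = random.Random()
means = sample_means(500, 10, rng)  # 500 averages of 10 uniform numbers

# Rough text histogram: bins of width 0.05 from 0.20 to 0.80,
# one '#' per 2 averages (any averages outside this range are not drawn)
for i in range(12):
    lo = 0.20 + 0.05 * i
    hi = lo + 0.05
    count = sum(lo <= m < hi for m in means)
    print(f"{lo:.2f}-{hi:.2f} {'#' * (count // 2)}")

# Simulated mean and standard deviation of the 500 averages
print(statistics.mean(means), statistics.stdev(means))
```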
4. Interpreting Spreadsheet Output
In your exam, you might not have to use a spreadsheet, but you will have to interpret one. You might see a table of values or a summary of results.
Common Mistake to Avoid: Thinking that a simulation with 10 trials is "proof" of a probability. 10 trials is far too few! You need hundreds or thousands of trials to reduce the effect of random variation.
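To see why 10 trials is too few, this Python sketch repeats a whole simulation 20 times at each size. It then reports the smallest and largest estimate of a probability whose true value is 1/6. The helper `estimate_probability` and the choice of 20 repeats are illustrative assumptions.

```python
import random

def estimate_probability(n_trials, p, rng):
    """Relative frequency, over n_trials, of an event whose true probability is p."""
    return sum(rng.random() < p for _ in range(n_trials)) / n_trials

rng = random.Random()
p = 1 / 6  # for example, the chance of rolling a six
for n in (10, 1_000):
    # Run the whole simulation 20 times and see how much the estimate moves
    estimates = [estimate_probability(n, p, rng) for _ in range(20)]
    print(f"{n:>5} trials: estimates range from {min(estimates):.3f} to {max(estimates):.3f}")
```

With 10 trials, the estimates typically jump around a lot. With 1,000 trials, they cluster much more tightly around 1/6.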
Example Exam Task:
"A simulation of 1,000 trials for the sum of two dice resulted in a sum of '7' appearing 162 times. Compare this to the theoretical probability."
How to answer:
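1. Theoretical probability of a 7 is \( \frac{6}{36} = \frac{1}{6} \approx 0.1667 \), since 6 of the 36 equally likely outcomes give a sum of 7: (1, 6), (2, 5), (3, 4), (4, 3), (5, 2) and (6, 1).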
2. Simulation relative frequency is \( \frac{162}{1000} = 0.162 \).
3. Conclusion: The simulation result is very close to the theoretical value, but differs slightly due to random variation.
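The three-step answer can be checked in Python, using only the figures from the exam task. The standard-error line goes one step beyond what the exam answer needs. It uses the standard formula \( \sqrt{p(1-p)/n} \) for the typical spread of a relative frequency from \( n \) trials. This is one way to judge whether "very close" is a fair description.

```python
from fractions import Fraction
from itertools import product
import math

# Theoretical probability: count the (die1, die2) pairs that sum to 7
outcomes = list(product(range(1, 7), repeat=2))
p_seven = Fraction(sum(a + b == 7 for a, b in outcomes), len(outcomes))

n, observed = 1000, 162   # figures from the exam task
rel_freq = observed / n   # simulation relative frequency

# Typical size of random variation in a relative frequency from n trials
standard_error = math.sqrt(p_seven * (1 - p_seven) / n)

print(p_seven)   # 1/6
print(rel_freq)  # 0.162
print(abs(rel_freq - p_seven) / standard_error)  # gap measured in standard errors
```

The gap between 0.162 and 1/6 comes out at roughly 0.4 of a standard error. That is well within ordinary random variation.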
5. Summary and Key Takeaways
The Goal: To estimate probabilities and model behaviors when formulas are too hard.
The Tools:
- RAND() for continuous uniform \([0, 1)\).
- RANDBETWEEN() for discrete uniform.
- NORM.INV() for Normal distributions.
The Mindset:
- More trials = better estimates.
- Simulation is an approximation, not a perfect "truth."
- Variation is expected—if you refresh your spreadsheet, the numbers should change!
Don't worry if this seems a bit abstract! Just remember: Simulation is just "doing the thing" thousands of times on a computer to see what happens. If you can describe the steps to simulate a scenario, you've mastered the biggest part of this chapter!