Welcome to Sampling and Estimation!
Ever wondered how news channels predict election results before all the votes are counted? Or how a lightbulb factory knows how long their bulbs last without burning every single one of them out? The answer is Sampling and Estimation! In this chapter, we learn how to look at a small group (a sample) to make a very smart guess about a huge group (the population). Don't worry if this seems a bit abstract at first; we will break it down step-by-step.
1. Populations and Samples: The Big Picture
Before we dive into the math, let's get our definitions straight.
• Population: The entire group you are interested in (e.g., every student in the world taking Cambridge A-Levels).
• Sample: A smaller group chosen from that population (e.g., 50 students from your school).
• Parameter: A value that describes the whole population (like the true average height of everyone). Usually, we don't know this!
• Statistic: A value calculated from your sample (like the average height of those 50 students). We use this to guess the parameter.
Analogy: Think of a giant pot of soup. The whole pot is the population. You take one spoonful to taste—that’s your sample. If that spoonful is salty, you estimate that the whole pot is salty.
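To make the four definitions concrete, here is a small Python sketch. The population of 10,000 heights is invented purely for illustration (the values 170 and 10 are assumptions, not real data). In real life you would not have the whole population available, which is exactly why we sample.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up heights in cm.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: describes the whole population (normally unknown to us).
population_mean = statistics.fmean(population)

# Sample: 50 members chosen at random, without replacement.
sample = random.sample(population, 50)

# Statistic: calculated from the sample, used to estimate the parameter.
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values should be close but not identical: the statistic is only an estimate of the parameter.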
2. Unbiased Estimators
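Since we usually don't know the population mean \( \mu \) or the population variance \( \sigma^2 \), we have to estimate them using our sample data. An unbiased estimator is a formula whose results, averaged over many repeated random samples, equal the true population value. Any single estimate can still be too high or too low; "unbiased" only means there is no systematic error in one direction.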
Estimating the Population Mean (\( \mu \))
The best way to estimate the population mean is simply to use the sample mean \( \bar{x} \). It is an unbiased estimator of \( \mu \).
Formula: \( \bar{x} = \frac{\sum x}{n} \)
Estimating the Population Variance (\( \sigma^2 \))
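This is where it gets a little tricky! If you just use the standard variance formula from your earlier studies (dividing by \( n \)), you will, on average, underestimate the true population variance. This happens because the deviations are measured from the sample mean \( \bar{x} \), which sits closer to the sample values than the true mean \( \mu \) usually does. To fix this and make the estimate "unbiased," we divide by \( n-1 \) instead. We call this unbiased estimate \( s^2 \) or \( \hat{\sigma}^2 \).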
The Unbiased Variance Formula:
\( s^2 = \frac{1}{n-1} \left( \sum x^2 - \frac{(\sum x)^2}{n} \right) \)
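As a worked illustration of the two formulas, the following sketch computes \( \bar{x} \) and \( s^2 \) from \( n \), \( \sum x \) and \( \sum x^2 \). The five data values and the function name `unbiased_estimates` are made up for this example.

```python
import statistics

def unbiased_estimates(data):
    """Return (x_bar, s_squared) for a list of sample values."""
    n = len(data)
    sum_x = sum(data)                    # sum of x
    sum_x2 = sum(x * x for x in data)    # sum of x squared
    x_bar = sum_x / n                    # unbiased estimate of mu
    s_squared = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # unbiased estimate of sigma^2
    return x_bar, s_squared

data = [4, 7, 5, 9, 10]  # hypothetical sample: n = 5, sum x = 35, sum x^2 = 271
x_bar, s_squared = unbiased_estimates(data)
print(x_bar)      # 7.0
print(s_squared)  # 6.5

# Python's statistics.variance also divides by n - 1, so it should agree.
print(statistics.variance(data))
```

By hand: \( s^2 = \frac{1}{4}\left(271 - \frac{35^2}{5}\right) = \frac{26}{4} = 6.5 \). Dividing by \( n = 5 \) instead would give 5.2, which is smaller.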
Quick Review:
• To estimate the mean: Divide \( \sum x \) by \( n \).
• To estimate the variance: Divide \( \sum x^2 - \frac{(\sum x)^2}{n} \) by \( n-1 \).
• Common Mistake: Forgetting to subtract 1 from the denominator when calculating the variance estimate!
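If you want to see why the \( n-1 \) matters, a quick simulation can help. This is only a sketch under assumed settings: a Normal population with variance 4, samples of size 5, and 20,000 repetitions, all chosen for the demonstration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_VARIANCE = 4.0  # assumed population: Normal with standard deviation 2
n = 5                # small samples make the bias easy to see
trials = 20_000

total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]
    x_bar = sum(sample) / n
    sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)
    total_div_n += sum_sq_dev / n              # "old" formula
    total_div_n_minus_1 += sum_sq_dev / (n - 1)  # unbiased formula

avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials
print(f"true variance:            {TRUE_VARIANCE}")
print(f"average when dividing by n:     {avg_div_n:.3f}")
print(f"average when dividing by n - 1: {avg_div_n_minus_1:.3f}")
```

The divide-by-\( n \) average should land near \( \frac{n-1}{n} \times 4 = 3.2 \), while the divide-by-\( (n-1) \) average should land near the true value of 4.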
3. The Central Limit Theorem (CLT)
This is the "magic" part of statistics. Imagine you take many different random samples, each of the same size \( n \), from a population and calculate the mean of each sample. These sample means will form their own distribution, called the Sampling Distribution of the Mean.
The Rule: If your population has a mean \( \mu \) and a variance \( \sigma^2 \), then for random samples of size \( n \) the distribution of the sample mean \( \bar{X} \) will have:
1. The same mean: \( E(\bar{X}) = \mu \)
2. A smaller variance: \( Var(\bar{X}) = \frac{\sigma^2}{n} \)
What if the population isn't Normal?
This is where the Central Limit Theorem comes in. Even if the original population is shaped like a "U," a "J," or is totally wonky, as long as your sample size \( n \) is large (a common rule of thumb is \( n \ge 30 \)), the distribution of the sample means will be approximately Normal. The approximation gets better as \( n \) increases.
Key Takeaway: For large \( n \), \( \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \) approximately.
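Did you know? The larger your sample size, the thinner and "taller" the distribution of the mean becomes, because its standard deviation is \( \frac{\sigma}{\sqrt{n}} \). This means a larger sample gives you a more reliable estimate, although you need four times the sample size to halve the spread.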
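The Central Limit Theorem can be checked with a short simulation. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and variance 1, chosen only for illustration) and looks at the means of 5,000 samples of size 40.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

n = 40               # size of each sample
num_samples = 5_000  # how many samples we draw
mu = 1.0             # mean of the assumed exponential(rate 1) population
sigma = 1.0          # standard deviation of that population

# Draw many samples from the skewed population and record each sample mean.
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

# If the sample means are roughly Normal, about 95% of them should lie
# within 1.96 standard errors of mu.
standard_error = sigma / math.sqrt(n)
within = sum(abs(m - mu) < 1.96 * standard_error for m in sample_means) / num_samples

print(f"mean of sample means:     {mean_of_means:.4f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {sigma**2 / n})")
print(f"fraction within 1.96 SE:  {within:.3f} (Normal theory: 0.95)")
```

Even though individual values from this population are far from Normal, the three printed results should sit close to their theoretical values.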
4. Confidence Intervals for the Population Mean
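Instead of just giving one single number as an estimate (like "the average is 50"), it’s often better to give a range (like "we are 95% confident the average is between 48 and 52"). This range is called a Confidence Interval (C.I.). Here "95% confident" means that if we repeated the sampling many times, about 95% of the intervals built this way would contain the true population mean.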
How to Calculate a Confidence Interval
The formula for a Confidence Interval is:
\( \bar{x} \pm z \times \frac{\sigma}{\sqrt{n}} \)
Step-by-Step Process:
1. Find your sample mean (\( \bar{x} \)).
2. Identify the population standard deviation (\( \sigma \)). If you don't know it and the sample is large, use your unbiased estimate \( s \) in its place.
3. Choose your "Critical Value" (\( z \)) based on the confidence level:
• For a 95% C.I., use \( z = 1.96 \)
• For a 99% C.I., use \( z = 2.576 \)
4. Calculate the "Standard Error": \( \frac{\sigma}{\sqrt{n}} \).
5. Multiply \( z \) by the Standard Error to get the "Margin of Error."
6. Add and subtract this from \( \bar{x} \) to get your interval.
Example: If \( \bar{x} = 100 \), \( \sigma = 15 \), \( n = 36 \), and we want a 95% C.I.:
Standard Error = \( 15 / \sqrt{36} = 2.5 \)
Margin of Error = \( 1.96 \times 2.5 = 4.9 \)
Interval = \( (100 - 4.9) \) to \( (100 + 4.9) = [95.1, 104.9] \).
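The six steps can be sketched as a small Python function. The function name `mean_confidence_interval` is made up for this illustration; the numbers are the ones from the example above.

```python
import math

def mean_confidence_interval(x_bar, sigma, n, z):
    """Return (lower, upper) for the interval x_bar +/- z * sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)  # step 4
    margin_of_error = z * standard_error   # step 5
    return x_bar - margin_of_error, x_bar + margin_of_error  # step 6

lower, upper = mean_confidence_interval(x_bar=100, sigma=15, n=36, z=1.96)
print(round(lower, 1), round(upper, 1))  # 95.1 104.9
```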
Important Note: If the sample size increases, the width of the interval decreases (it gets more precise). If the confidence level increases (e.g., from 95% to 99%), the width increases (you need a wider net to be more sure).
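Both effects can be checked numerically. In this sketch the helper names `z_for` and `interval_width` are invented for the illustration, and the critical values are computed with `NormalDist().inv_cdf` from Python's standard `statistics` module rather than read from a table.

```python
import math
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value, e.g. 0.95 gives roughly 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def interval_width(sigma, n, confidence):
    """Full width of the interval: 2 * z * sigma / sqrt(n)."""
    return 2 * z_for(confidence) * sigma / math.sqrt(n)

# Same sigma and confidence level, larger sample: narrower interval.
print(interval_width(15, 36, 0.95))
print(interval_width(15, 144, 0.95))

# Same sigma and sample size, higher confidence level: wider interval.
print(interval_width(15, 36, 0.99))
```

Going from \( n = 36 \) to \( n = 144 \) multiplies the sample size by 4, so the width is halved.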
5. Confidence Intervals for Proportions
Sometimes we aren't measuring a value (like height), but a proportion (like the percentage of people who like chocolate). We call the sample proportion \( p_s \) or \( \hat{p} \).
The Formula:
\( \hat{p} \pm z \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)
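This works the exact same way as the mean interval, just with a different formula for the standard error! Like the mean interval, it relies on a Normal approximation, so it should only be used when the sample size \( n \) is large.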
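Here is a minimal sketch of the proportion interval in Python. The survey figures (120 chocolate lovers out of 200 people) and the function name `proportion_confidence_interval` are made up for the illustration.

```python
import math

def proportion_confidence_interval(successes, n, z):
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n  # sample proportion
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical survey: 120 out of 200 people like chocolate, 95% C.I.
lower, upper = proportion_confidence_interval(successes=120, n=200, z=1.96)
print(round(lower, 3), round(upper, 3))  # 0.532 0.668
```

Here \( \hat{p} = 0.6 \), the standard error is \( \sqrt{\frac{0.6 \times 0.4}{200}} \approx 0.0346 \), and the margin of error is about 0.068.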
Quick Review Box:
• Standard Error of the Mean: \( \frac{\sigma}{\sqrt{n}} \)
• Standard Error of the Proportion: \( \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)
• Use \( z = 1.96 \) for 95% confidence.
• Use \( z = 2.576 \) for 99% confidence.
Summary: Common Pitfalls to Avoid
• Mixing up \( n \) and \( \sqrt{n} \): The standard error divides by \( \sqrt{n} \), not \( n \). Always remember to take the square root of the sample size in the denominator of your error formulas!
• Confusing \( \sigma \) and \( \sigma^2 \): Read the question carefully to see if they gave you the standard deviation or the variance.
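• Misusing the CLT: If the population is already Normal, \( \bar{X} \) is exactly Normal for any sample size, so the CLT is not needed. If the population is not Normal (or its shape is unknown), you need a large sample (rule of thumb: \( n \ge 30 \)) before using the Normal distribution formulas, and the result is then only approximate.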
You've finished the notes for Sampling and Estimation! Take a deep breath. Start by practicing the unbiased variance calculations, then move on to the "magic" of the Central Limit Theorem. You've got this!