Welcome to the World of Confidence Intervals!
In your previous statistics work, you likely calculated a single number to estimate a population parameter, such as the average height of students or the mean weight of apples. We call this a point estimate. But let's be honest: a single number almost never hits the true value exactly.
In this chapter, we are moving from "guessing a single number" to "providing a range of likely values." This is what we call a Confidence Interval (CI). Think of it as a safety net. Instead of saying, "I think the average is exactly 50," you are saying, "I am 95% confident the true average is between 48 and 52."
By the end of these notes, you'll know how to build these "safety nets" for each situation covered in this chapter, whether you have a massive dataset or just a tiny sample.
1. The Foundation: Normal Distribution with Known Variance (SH1)
This is our starting point. We use this when we know the population is normally distributed and, crucially, we already know the population variance (\(\sigma^2\)).
The Formula
To find the confidence interval for the mean (\(\mu\)), we use:
\( \bar{x} \pm z \left( \frac{\sigma}{\sqrt{n}} \right) \)
Where:
• \(\bar{x}\) is your sample mean (the middle of your interval).
• \(z\) is the critical value (how many standard errors you go out from the sample mean to reach your desired confidence level).
• \(\sigma\) is the population standard deviation.
• \(n\) is the sample size.
• \(\frac{\sigma}{\sqrt{n}}\) is called the Standard Error.
Common \(z\)-values to Remember
You can find these in your formula booklet, but they are worth knowing:
• 90% Confidence: \(z = 1.645\)
• 95% Confidence: \(z = 1.960\)
• 99% Confidence: \(z = 2.576\)
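If you are curious where these numbers come from, here is a minimal Python sketch that recovers them from the standard Normal distribution. The helper name `z_critical` is made up for illustration; `NormalDist` is part of Python's standard `statistics` module.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value for a confidence level such as 0.95."""
    # Split the leftover probability equally between the two tails.
    upper_tail = (1 - confidence) / 2
    # inv_cdf gives the z-value with the stated probability to its left.
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

Notice that a 95% interval uses the 97.5th percentile, because the remaining 5% is shared between both tails.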
Analogy: Think of the sample mean (\(\bar{x}\)) as where you stand, and the \(z \left( \frac{\sigma}{\sqrt{n}} \right)\) part as how far you can reach your arms out to either side. The more confident you want to be, the further you have to reach!
Quick Review: To make an interval narrower (more precise), you can either increase the sample size (\(n\)) or decrease the confidence level.
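To see the formula and the Quick Review in action, here is a short Python sketch. The function name `z_interval` and the numbers (\(\bar{x} = 50\), \(\sigma = 4\), \(n = 16\)) are invented for illustration and do not come from any real dataset.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sigma / sqrt(n)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin

# Hypothetical values: x_bar = 50, sigma = 4, n = 16, 95% confidence.
low, high = z_interval(50, 4, 16)
print(round(low, 2), round(high, 2))    # 48.04 51.96

# A larger sample (n = 64) gives a narrower interval.
low, high = z_interval(50, 4, 64)
print(round(low, 2), round(high, 2))    # 49.02 50.98

# A lower confidence level (90%) also gives a narrower interval.
low, high = z_interval(50, 4, 16, confidence=0.90)
print(round(low, 2), round(high, 2))    # 48.36 51.64
```

Quadrupling the sample size halves the width, because the standard error depends on \(\sqrt{n}\).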
2. Dealing with the Real World: Large Samples with Unknown Variance (SH2)
In real life, we almost never know the true population variance (\(\sigma^2\)). If we have a large sample (usually \(n > 30\)), we can use a clever workaround.
The "Large Sample" Trick
Because the sample is large, the Central Limit Theorem tells us the distribution of the sample mean will be approximately normal. Since we don't know \(\sigma\), we use the sample standard deviation (\(s\)) as a substitute.
The formula looks almost identical:
\( \bar{x} \pm z \left( \frac{s}{\sqrt{n}} \right) \)
Don't worry if this seems like cheating! When \(n\) is large, the sample standard deviation is a very reliable "stand-in" for the population version.
Key Takeaway: Large sample (\(n > 30\)) + Unknown variance = Use \(z\) and replace \(\sigma\) with \(s\).
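Here is a sketch of the large-sample method in Python. The data are made up (40 values between 48 and 52), the name `large_sample_interval` is hypothetical, and the code assumes \(s\) is computed with the \(n - 1\) divisor, which is what `statistics.stdev` does.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def large_sample_interval(data, confidence=0.95):
    """Approximate CI for the mean: large n, sigma unknown, s used instead."""
    n = len(data)
    if n <= 30:
        raise ValueError("these notes use this method only when n > 30")
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (assumed n - 1 divisor)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Invented data: the values 48, 49, 50, 51, 52 repeated 8 times each.
data = [48 + (i % 5) for i in range(40)]
low, high = large_sample_interval(data)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The only change from the known-variance case is that \(s\) is calculated from the data rather than being given.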
3. The "Cautious" Approach: Small Samples and the \(t\)-distribution (SH4)
What if your sample is tiny (e.g., \(n = 10\)) and you don't know the population variance? Using \(z\) would make the interval too narrow, because it ignores the extra uncertainty that comes from estimating \(\sigma\) from only a few observations.
This is where we use the Student’s \(t\)-distribution.
What is the \(t\)-distribution?
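The \(t\)-distribution looks like the Normal distribution but has "fatter tails." This reflects the extra uncertainty that comes from estimating \(\sigma\) with \(s\): in a small sample, \(s\) can be some way off the true value, so the critical values are larger than the matching \(z\)-values. As the sample size grows, the \(t\)-distribution gets closer and closer to the Normal.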
How to use it
1. Check the Degrees of Freedom (\(\nu\)): This is simply \(n - 1\).
2. Find the \(t\)-value: Look this up in your tables using your confidence level and your degrees of freedom.
3. Use the formula:
\( \bar{x} \pm t_{\nu} \left( \frac{s}{\sqrt{n}} \right) \)
Mnemonic: "T for Tiny." If the sample is Tiny and you don't know the variance, use the T-distribution.
Common Mistake to Avoid: Always remember to use \(n - 1\) for the degrees of freedom. If you have 10 data points, look up \(\nu = 9\) in the table!
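The three steps above can be sketched in Python. Python's standard library has no \(t\)-distribution, so this sketch copies a few two-sided 95% values from a standard \(t\)-table into a small dictionary, just as you would look them up by hand. The sample data and the names `T_95` and `t_interval_95` are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values from a standard t-table, keyed by nu.
T_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045}

def t_interval_95(data):
    """95% CI for the mean: small sample, normal population, sigma unknown."""
    n = len(data)
    nu = n - 1                # degrees of freedom, NOT n
    t_crit = T_95[nu]         # KeyError if nu is not in this mini-table
    x_bar = mean(data)
    s = stdev(data)           # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 10 measurements, so nu = 9 and t = 2.262.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
low, high = t_interval_95(sample)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With \(\nu = 9\) the critical value is 2.262 rather than 1.960, so the interval is wider than the \(z\) version would be for the same data.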
4. Making Sense of it All: Making Inferences (SH3)
Constructing the interval is only half the battle. You need to explain what it means.
Scenario A: Does the interval support a claim?
If a manufacturer claims their batteries last 50 hours, and your 95% Confidence Interval is \([42, 48]\), the value 50 is outside the interval. This suggests their claim might be wrong!
Scenario B: Comparing two groups.
If you have a CI for the height of Group A \([160, 170]\) and Group B \([172, 180]\), notice they do not overlap. This gives us strong evidence that Group B is taller on average.
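Both scenarios come down to simple comparisons of endpoints. A minimal sketch, using the intervals from the two scenarios and two made-up helper names, `contains` and `overlaps`:

```python
def contains(interval, value):
    """True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high

def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Scenario A: claimed battery life of 50 hours vs. the CI [42, 48].
print(contains((42, 48), 50))              # False: 50 is outside

# Scenario B: Group A [160, 170] vs. Group B [172, 180].
print(overlaps((160, 170), (172, 180)))    # False: no overlap
```

A result of `False` in either case is what points towards doubting the claim or concluding that the groups differ. The code only checks the endpoints; the interpretation is still up to you.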
Did you know? A 95% Confidence Interval doesn't mean there is a 95% probability that the mean is in *that specific* interval. It means that if we took 100 different samples and made 100 intervals, we expect about 95 of them to contain the true mean.
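You can check this "repeated sampling" idea with a small simulation. The sketch below assumes a normal population with known \(\sigma\) (the Section 1 setting), and the population values (\(\mu = 50\), \(\sigma = 4\)) are invented. The function name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(true_mu=50.0, sigma=4.0, n=16, trials=2000,
             confidence=0.95, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build its interval around x_bar.
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= true_mu <= x_bar + margin:
            hits += 1
    return hits / trials

print(coverage())  # should come out close to 0.95
```

The true mean never moves; it is the interval that changes from sample to sample, and roughly 95% of those intervals catch it.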
Summary Checklist
Before you start a problem, ask yourself these three questions:
1. Do I know the population variance (\(\sigma^2\))?
• Yes: Use \(z\).
• No: Go to question 2.
2. Is my sample size (\(n\)) large?
• Yes (\(n > 30\)): Use \(z\) and substitute \(s\) for \(\sigma\).
• No: Use the \(t\)-distribution.
3. Is the underlying population normal?
• This is a requirement for small samples (\(t\)-distribution). If the question says "the population is normal," you are good to go!
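The checklist can be written as a tiny decision function. This sketch simply mirrors the three questions above, using the \(n > 30\) rule of thumb from these notes. The name `choose_method` and the returned strings are made up for illustration.

```python
def choose_method(sigma_known, n, population_normal):
    """Mirror the summary checklist and name the formula to use."""
    # Question 1: do we know the population variance?
    if sigma_known:
        return "z with sigma"
    # Question 2: is the sample large (rule of thumb: n > 30)?
    if n > 30:
        return "z with s"
    # Question 3: small sample, so the population must be normal.
    if population_normal:
        return "t with s, nu = n - 1"
    return "not covered by these notes"

print(choose_method(sigma_known=True, n=16, population_normal=True))
print(choose_method(sigma_known=False, n=50, population_normal=False))
print(choose_method(sigma_known=False, n=10, population_normal=True))
```

Running through the questions in this fixed order stops you from reaching for \(t\) when \(z\) is the expected method, and vice versa.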
Final Tip: Always write out your values for \(\bar{x}\), \(n\), and your chosen critical value (\(z\) or \(t\)) before plugging them into the formula. It prevents simple calculator errors!