Introduction to Sampling

Welcome to the chapter on Sampling! Have you ever wondered how news channels predict election results before all the votes are counted? Or how a chef knows if a giant pot of soup is seasoned correctly just by tasting one small spoonful? That, in a nutshell, is the power of sampling.

In this chapter, we will learn how to look at a small part of a group (a sample) to make smart guesses about the whole group (the population). This is a foundational skill in statistics that helps us handle huge amounts of data efficiently.

1. Population and Simple Random Samples

Before we dive into the math, let's get our definitions straight. These are the building blocks of everything that follows.

What is a Population?

The population is the entire set of items or individuals that we are interested in studying. For example, if we want to know the average height of students in your school, every single student in that school is part of the population.

What is a Sample?

A sample is a subset or a small part of the population. Since it is often too expensive, time-consuming, or impossible to measure everyone in a population, we take a sample instead.

The Simple Random Sample (SRS)

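For our statistics to be fair, every member of the population must have an equal chance of being selected. More precisely, every possible group of the chosen size must be equally likely to be picked. A sample chosen this way is called a Simple Random Sample. Think of it like drawing names out of a well-shaken hat!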
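To make the "names in a hat" idea concrete, here is a minimal Python sketch. The population of 500 student ID numbers, the sample size of 10, and the seed are all made up for illustration.

```python
import random

# Hypothetical population: student ID numbers 1 to 500.
population = list(range(1, 501))

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# random.sample draws without replacement, and every possible
# group of 10 IDs is equally likely to be chosen.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Changing the seed gives a different sample, which is exactly why sample statistics vary from one sample to the next.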

Did you know? If you sample a soup but don't stir it first, you might only taste the cream on top. In statistics, "stirring the soup" is like ensuring your sample is truly random and representative!

Key Takeaway:

A population is the "whole," and a sample is the "part." For the sample to be useful, it must be randomly selected.

2. The Sample Mean (\(\bar{X}\)) as a Random Variable

This is where things get interesting! Suppose you take a random sample of 10 students and calculate their average height. Then, your friend takes a different random sample of 10 students. Will your averages be the same? Probably not!

Because the value of the sample mean changes depending on which specific individuals end up in the sample, we treat the sample mean (\(\bar{X}\)) as a random variable.

The Mean and Variance of \(\bar{X}\)

Even though the sample mean varies, it follows some very specific rules. If the original population has a mean of \(\mu\) and a variance of \(\sigma^2\):

1. Expectation of the Sample Mean: \(E(\bar{X}) = \mu\)
(On average, your sample mean will be equal to the true population mean.)

2. Variance of the Sample Mean: \(Var(\bar{X}) = \frac{\sigma^2}{n}\)
(As your sample size \(n\) gets bigger, the "spread" or uncertainty of your sample mean gets smaller. This makes sense: a bigger sample is more reliable!)
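One way to check both rules is to list every possible sample for a tiny example. The sketch below assumes the population is a fair six-sided die (an illustrative choice, not part of the syllabus) and looks at all 36 equally likely samples of size 2, using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Assumed population: a fair six-sided die, each face equally likely.
faces = [1, 2, 3, 4, 5, 6]

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    # Population-style variance: average squared distance from the mean.
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)

mu = mean(faces)            # population mean
sigma2 = variance(faces)    # population variance

n = 2
# All 36 equally likely samples of size 2 (independent rolls).
sample_means = [mean(s) for s in product(faces, repeat=n)]

print(mu, mean(sample_means))               # both equal 7/2
print(sigma2 / n, variance(sample_means))   # both equal 35/24
```

The mean of the sample means matches \(\mu\) exactly, and their variance matches \(\frac{\sigma^2}{n}\) exactly.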

Key Takeaway:

The average of all possible sample means is the population mean, but the spread of those means shrinks as the sample size increases.

3. The Distribution of the Sample Mean

How do we know the "shape" of the distribution for \(\bar{X}\)? It depends on the population.

Case 1: Sampling from a Normal Population

If the original population is already normally distributed, represented as \(X \sim N(\mu, \sigma^2)\), then the sample mean is always normally distributed, regardless of the sample size.

We write this as: \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\)
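Here is a short sketch of how this result is used to find a probability. The figures (mean height 170 cm, standard deviation 8 cm, sample size 16) are assumed purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Assumed figures for illustration only.
mu = 170.0      # population mean height (cm)
sigma = 8.0     # population standard deviation (cm)
n = 16          # sample size

# X-bar ~ N(mu, sigma^2 / n), so its standard deviation is sigma / sqrt(n).
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# P(X-bar > 172) = 1 - P(X-bar <= 172)
p = 1 - xbar_dist.cdf(172)
print(round(p, 4))  # 0.1587
```

Note that `NormalDist` takes the standard deviation, not the variance, so the square root is needed.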

Case 2: The Central Limit Theorem (CLT)

What if the original population is not normal? (Maybe it's skewed or just weirdly shaped). Don't worry! This is where the "magic" of statistics happens.

The Central Limit Theorem states that if your sample size \(n\) is sufficiently large (a common rule of thumb is \(n \ge 30\)), the distribution of the sample mean \(\bar{X}\) will be approximately normal, even if the population isn't!

Condition: \(n \ge 30\)
Result: \(\bar{X} \approx N(\mu, \frac{\sigma^2}{n})\)

Analogy: Imagine many people throwing handfuls of random colored sand. Even if the individual grains are scattered randomly, if you throw enough of them, they tend to form a nice, smooth "bell curve" pile in the middle.
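You can watch the CLT at work with a small simulation. This sketch assumes a strongly right-skewed population (exponential with mean 1 and variance 1). The seed and the 5000 repetitions are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed population: exponential with rate 1, which is strongly
# right-skewed and has mean 1 and variance 1.
def draw_sample_mean(n):
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

n = 30
sample_means = [draw_sample_mean(n) for _ in range(5000)]

avg = sum(sample_means) / len(sample_means)
var = sum((m - avg) ** 2 for m in sample_means) / len(sample_means)
print(round(avg, 2))   # close to mu = 1
print(round(var, 3))   # close to sigma^2 / n = 1/30

# Under a normal model, about 68% of sample means fall within one
# standard deviation (sigma / sqrt(n)) of mu.
sd = (1 / n) ** 0.5
within = sum(abs(m - 1) <= sd for m in sample_means) / len(sample_means)
print(round(within, 2))
```

Even though individual values are far from bell-shaped, the sample means behave roughly as \(N(\mu, \frac{\sigma^2}{n})\) predicts.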

Quick Review:

- If population is Normal \(\rightarrow \bar{X}\) is Normal (any \(n\)).
- If population is NOT Normal \(\rightarrow \bar{X}\) is approximately Normal if \(n \ge 30\) (CLT).

4. Unbiased Estimates of Population Parameters

In the real world, we often don't know the true population mean (\(\mu\)) or variance (\(\sigma^2\)). We have to estimate them using our sample data.

Unbiased Estimate of the Population Mean (\(\mu\))

The best estimate for the population mean is simply the mean of your sample.

\(\text{Unbiased estimate } \hat{\mu} = \bar{x} = \frac{\sum x}{n}\)

Unbiased Estimate of the Population Variance (\(\sigma^2\))

This one is a bit tricky! You might think you can just use the sample variance formula (dividing by \(n\)), but on average that underestimates the true population variance. To fix this, we use \(n-1\) instead of \(n\) in the denominator. We call this unbiased estimate \(s^2\).

Formula from raw data:
\(s^2 = \frac{1}{n-1} \left( \sum x^2 - \frac{(\sum x)^2}{n} \right)\)
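The raw-data formula can be checked with a few lines of Python. The five data values and the function name `unbiased_variance_from_sums` are hypothetical, and the result is compared with the standard library's `statistics.variance`, which also divides by \(n-1\).

```python
from statistics import variance

def unbiased_variance_from_sums(n, sum_x, sum_x2):
    """s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Hypothetical raw data.
data = [4, 7, 7, 9, 13]
n = len(data)
sum_x = sum(data)                    # 40
sum_x2 = sum(x * x for x in data)    # 364

s2 = unbiased_variance_from_sums(n, sum_x, sum_x2)
print(s2)               # 11.0
print(variance(data))   # statistics.variance also divides by n - 1
```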

Formula from summarized data (using a constant \(a\)):
Sometimes the exam gives you data shifted by a value \(a\), for example the totals \(\sum (x-a)\) and \(\sum (x-a)^2\) instead of \(\sum x\) and \(\sum x^2\). Don't panic! The variance doesn't change when you shift data. Use this version:
\(s^2 = \frac{1}{n-1} \left( \sum (x-a)^2 - \frac{(\sum (x-a))^2}{n} \right)\)
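A quick sketch shows that shifting really does leave the variance alone. The data and the shift \(a = 100\) are hypothetical, and so is the function name.

```python
from statistics import variance

def unbiased_variance_coded(n, sum_d, sum_d2):
    """s^2 from sums of d = x - a; same form as the raw-data formula."""
    return (sum_d2 - sum_d ** 2 / n) / (n - 1)

# Hypothetical data and an arbitrary shift a.
data = [104, 107, 107, 109, 113]
a = 100
coded = [x - a for x in data]        # 4, 7, 7, 9, 13

n = len(data)
s2 = unbiased_variance_coded(n, sum(coded), sum(d * d for d in coded))
mean_estimate = a + sum(coded) / n   # the mean DOES shift back by a

print(s2)             # 11.0
print(mean_estimate)  # 108.0
```

The variance estimate needs no adjustment, but the estimate of the mean must have \(a\) added back: \(\bar{x} = a + \frac{\sum (x-a)}{n}\).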

Common Mistake to Avoid: Don't forget the \(n-1\)! If you divide by \(n\), you are finding the sample variance. If you divide by \(n-1\), you are finding the unbiased estimate of the population variance. For H1 Math, we almost always want the unbiased estimate!

Key Takeaway:

To estimate the population variance correctly, we use the \(s^2\) formula with the \(n-1\) "correction factor."

5. Summary and Tips for Success

Sampling might feel abstract, but it's just about using small pieces of info to see the big picture. Here is a quick checklist for solving problems:

  • Identify the Population: Is it Normal? If not, is \(n \ge 30\)? (If yes, use CLT).
  • Check your Variance: Are you given the population variance \(\sigma^2\), or do you need to calculate the unbiased estimate \(s^2\)?
  • Watch the formulas: Remember that the variance of the mean is \(\frac{\sigma^2}{n}\). People often forget to divide by \(n\)!
  • Read the question carefully: Does it ask for the distribution of one item (\(X\)) or the mean of many items (\(\bar{X}\))?
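The checklist can be walked through on a made-up problem. This sketch assumes a sample of 50 from a non-normal population with \(\sum x = 2600\) and \(\sum x^2 = 137650\) (hypothetical figures), and uses \(s^2\) in place of the unknown \(\sigma^2\), which is a reasonable approximation only because \(n\) is large.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary of a sample from a non-normal population.
n, sum_x, sum_x2 = 50, 2600.0, 137650.0

# n >= 30, so the CLT lets us treat X-bar as approximately normal.
assert n >= 30

x_bar = sum_x / n                           # estimate of mu
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)    # estimate of sigma^2
var_of_mean = s2 / n                        # remember to divide by n

# P(|X-bar - mu| < 2), using s^2 in place of the unknown sigma^2.
z = 2 / sqrt(var_of_mean)
p = 2 * NormalDist().cdf(z) - 1

print(x_bar, s2, var_of_mean)  # 52.0 50.0 1.0
print(round(p, 4))             # 0.9545
```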

Don't worry if this seems tricky at first! The more you practice identifying whether to use the population variance \(\sigma^2\) or the variance of the sample mean \(\frac{\sigma^2}{n}\), the more natural it will feel. You've got this!