Welcome to Statistical Sampling!

Ever wondered how news channels predict election results before all the votes are counted? Or how scientists decide if a new medicine works without testing it on every single person on Earth? The answer is Statistical Sampling.

In this chapter, we are going to learn how to pick a small group of people (or things) to represent a much larger group. Think of it like tasting a spoonful of soup to see if the whole pot needs more salt—you don't need to eat the whole pot to find out!

Don't worry if some of the terms feel new; we’ll break them down piece by piece.


1. Population vs. Sample

Before we can start picking groups, we need to understand what we are picking from.

Key Terms:

  • Population: The entire group of items or people that you are interested in. Example: All the students in your college.
  • Census: When you collect data from every single member of the population.
  • Sample: A subset (a smaller part) of the population used to find out information about the whole group. Example: Picking 50 students from your college to interview.
  • Sampling Unit: An individual member of the population.
  • Sampling Frame: A list of all the sampling units in the population (like a school register or a list of addresses).

The Great Debate: Census vs. Sample

Why don't we just ask everyone all the time? Here is why:

Advantages of a Census:
- It should give a completely accurate result, since there is no sampling error (it gives you the full picture).
- No one is left out.

Disadvantages of a Census:
- Time and Cost: It takes a long time and costs a lot of money to reach everyone.
- Destructive Testing: If you were testing how much pressure a glass bottle could take before breaking, you wouldn't want to test every bottle—you'd have nothing left to sell!
- Hard to process: Huge amounts of data are difficult to organize.

Advantages of a Sample:
- Quick and Cheap: Much faster to collect and analyze.
- Fewer People Needed: You don't need a massive team to gather the data.

Disadvantages of a Sample:
- Sampling Error: The data might not be perfectly representative of the whole population.
- Bias: If the sample isn't picked carefully, it could lead to the wrong conclusions.

Quick Review:

Use a Census for small groups where accuracy is vital. Use a Sample for large groups where you want to save time and money.


2. Random Sampling Techniques

If we want our sample to be fair, we often use Random Sampling. This means every member of the population has an equal chance of being picked.

Method A: Simple Random Sampling

This is the most basic form of sampling. Think of it like pulling names out of a hat.

  1. Assign a unique number to every item in the sampling frame.
  2. Use a random number generator (on your calculator or computer) to pick the required number of unique numbers, ignoring any repeats. The items with those numbers form your sample.
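The two steps above can be sketched in Python. This is a minimal illustration, assuming the sampling frame is just a Python list; the function name `simple_random_sample` and the 100-student register are hypothetical, made up for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return n different units from the frame; every unit is equally likely."""
    rng = random.Random(seed)
    # Step 1: give every unit in the sampling frame a unique number 1..N.
    numbered = {number: unit for number, unit in enumerate(frame, start=1)}
    # Step 2: draw n random numbers without repeats (random.sample never
    # returns the same number twice) and look up the matching units.
    chosen = rng.sample(range(1, len(frame) + 1), n)
    return [numbered[number] for number in chosen]

# Hypothetical sampling frame: a register of 100 students.
register = [f"Student {i}" for i in range(1, 101)]
print(simple_random_sample(register, 10, seed=1))
```

Passing a `seed` only makes the draw repeatable for checking; leave it out to get a fresh random sample each time.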

Pros: It's completely unbiased.
Cons: You need a full list of the population (sampling frame), and it can be impractical for very large populations.

Method B: Systematic Sampling

Instead of picking names randomly, you pick them at regular intervals.

Example: You want a sample of 20 people from a list of 100. The interval is \(100 \div 20 = 5\), so you pick a random starting point between 1 and 5 and then pick every \(5^{th}\) person on the list after it.
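The example above can be sketched as a short Python function. This is a hedged illustration, assuming the list is a Python list and that, when the population size is not an exact multiple of the sample size, the interval is rounded down and the result trimmed to n items (one common convention, assumed here). The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start, where k = N // n."""
    N = len(frame)
    k = N // n                    # sampling interval: 100 // 20 = 5
    rng = random.Random(seed)
    start = rng.randint(1, k)     # random starting position from 1 to k
    # 1-based positions start, start + k, start + 2k, ... up to N
    positions = range(start, N + 1, k)
    return [frame[pos - 1] for pos in positions][:n]

# Hypothetical list of 100 people, numbered by their position in the list.
people = list(range(1, 101))
sample = systematic_sample(people, 20, seed=4)
print(sample)
```

Whatever the random start turns out to be, the chosen positions are always exactly 5 apart, which is what makes the method quick to carry out by hand.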

Pros: Very simple and quick to use.
Cons: If there is a hidden pattern in the list (e.g., every \(5^{th}\) person happens to be a manager), the sample will be biased.
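To see why a hidden pattern is dangerous, here is a small sketch using a made-up staff list in which every 5th person is a manager. The list and the helper `every_kth` are hypothetical, chosen only to match the example in the text.

```python
# Hypothetical staff list of 100 people: positions 5, 10, 15, ... are managers.
staff = ["manager" if pos % 5 == 0 else "worker" for pos in range(1, 101)]

def every_kth(frame, start, k):
    """Systematic selection: 1-based positions start, start + k, start + 2k, ..."""
    return [frame[pos - 1] for pos in range(start, len(frame) + 1, k)]

for start in range(1, 6):
    picked = every_kth(staff, start, 5)
    print(f"start {start}: {picked.count('manager')} managers out of {len(picked)}")
# Starts 1 to 4 give 0 managers out of 20; start 5 gives 20 out of 20,
# even though managers make up 20% of the whole list.
```

No starting point gives a sample that matches the real proportion of managers, because the interval lines up exactly with the pattern in the list.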

Method C: Stratified Sampling

This is a very clever way to make sure different groups within a population are represented fairly. We divide the population into groups called strata (e.g., Age, Gender, or Year Group) and take a random sample from each.

To keep it fair, the number of people we pick from each group must be proportional to the size of that group in the real population.

The Formula:
\( \text{Number in sample} = \frac{\text{Number in stratum}}{\text{Number in population}} \times \text{Total sample size} \)
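The formula can be checked with a short Python sketch. The college with 150, 90 and 60 students in three year groups is a hypothetical example, and the names `allocation` and `stratified_sample` are made up for illustration. The sketch multiplies before dividing so the whole-number answers come out exactly, then takes a simple random sample inside each stratum.

```python
import random

def allocation(strata_sizes, total_sample):
    """Number in sample = (number in stratum / number in population) x total sample size."""
    population = sum(strata_sizes.values())
    return {name: size * total_sample / population for name, size in strata_sizes.items()}

def stratified_sample(strata, total_sample, seed=None):
    """Round each allocation to a whole number, then sample randomly within each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    alloc = allocation(sizes, total_sample)
    return {name: rng.sample(units, round(alloc[name])) for name, units in strata.items()}

# Hypothetical college: 150 Year 1, 90 Year 2 and 60 Year 3 students (300 in total).
counts = {"Year 1": 150, "Year 2": 90, "Year 3": 60}
print(allocation(counts, 40))  # {'Year 1': 20.0, 'Year 2': 12.0, 'Year 3': 8.0}

strata = {name: [f"{name} student {i}" for i in range(1, size + 1)] for name, size in counts.items()}
sample = stratified_sample(strata, 40, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})  # {'Year 1': 20, 'Year 2': 12, 'Year 3': 8}
```

In real questions the allocations are often not whole numbers. This sketch uses Python's `round`, which sends exact halves to the nearest even number, so it is worth checking that the rounded group sizes still add up to the total sample size you wanted.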

Pros: It guarantees that all groups are represented.
Cons: It's more complex and requires you to know the exact sizes of the sub-groups beforehand.


3. Non-Random Sampling Techniques

Sometimes, we can't get a full list of the population, or we are in a hurry. That's when we use non-random methods.

Method D: Quota Sampling

An interviewer is given a "target" number of people to find from different groups. Once the "quota" for a group is full, they stop asking those people.

Example: A researcher stands in a shopping center and is told to interview 20 men and 20 women. Once they have 20 women, they stop approaching women and only talk to men until the men's quota is full too.
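The quota rule in this example can be sketched in Python. This is a simplified model, assuming the interviewer simply approaches shoppers in the order they pass; the stream of shoppers and the function name `quota_sample` are hypothetical. A real interviewer also chooses *who* to approach, which this sketch does not capture.

```python
def quota_sample(passers_by, quotas):
    """Interview people in the order they pass until every quota is full."""
    remaining = dict(quotas)          # how many more are still needed per group
    interviewed = []
    for person_id, group in passers_by:
        if remaining.get(group, 0) > 0:
            interviewed.append(person_id)   # quota for this group not yet full
            remaining[group] -= 1
        if all(left == 0 for left in remaining.values()):
            break                           # every quota is full: stop
    return interviewed, remaining

# Hypothetical shoppers in the order they pass: every 3rd one is a man.
shoppers = [(i, "man" if i % 3 == 0 else "woman") for i in range(1, 101)]
interviewed, remaining = quota_sample(shoppers, {"woman": 20, "man": 20})
print(len(interviewed), remaining)   # 40 {'woman': 0, 'man': 0}
```

Notice that no list of names is needed: the quotas are filled from whoever turns up, which is exactly why the method is quick but open to bias.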

Pros: You don't need a sampling frame (no list of names). It's fast and easy.
Cons: It can be biased because the interviewer chooses who to talk to (they might avoid people who look busy or grumpy!).

Method E: Opportunity (Convenience) Sampling

This is simply picking the people who are available at the time and are easy to reach.

Example: You ask the first 10 people you see in the library about their study habits.
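For contrast with the random methods, here is a tiny sketch of the library example. The list of visitors and the function name `opportunity_sample` are hypothetical; the point is only that no random choice happens at any stage.

```python
def opportunity_sample(arrivals, n):
    """Take whoever is available first: the first n people in arrival order."""
    return arrivals[:n]

# Hypothetical library visitors, listed in the order they walked in.
arrivals = [f"Visitor {i}" for i in range(1, 61)]
sample = opportunity_sample(arrivals, 10)
print(sample)
# No random number is ever drawn: Visitor 11 onwards can never be chosen,
# so anyone who arrives later is automatically left out.
```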

Pros: Extremely easy and inexpensive.
Cons: Highly unlikely to be representative of the whole population.

Memory Aid: "RS-SSQO"

To remember the 5 methods, try: Random Sam Sings Songs, Quietly Often.
(Random Sam = Simple Random, Sings = Systematic, Songs = Stratified, Quietly = Quota, Often = Opportunity)


4. Critique and Inference

In your exam, you might be asked to critique a sampling method. This just means "spot the flaws."

Common Mistakes to Avoid:

  • Small Sample Size: If the sample is too small, it won't represent the population well.
  • Bias: If you only interview people at a gym about their health, your results will be biased because they don't represent the general public.
  • Sampling Frame Errors: If your list is out of date, you’re starting with bad data!
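The first point, about small samples, can be illustrated with a quick simulation. This is a sketch under stated assumptions: the population of 10,000 heights is simulated from a normal distribution (mean 170 cm, standard deviation 10 cm) rather than taken from real data, and the helper `spread_of_sample_means` is hypothetical. It repeatedly draws random samples and measures how much their averages jump around.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of 10,000 heights (cm), simulated from a normal
# distribution with mean 170 and standard deviation 10.
population = [rng.gauss(170, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the means of many simple random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(5)
large = spread_of_sample_means(100)
print(f"n = 5:   sample means typically vary by about {small:.1f} cm")
print(f"n = 100: sample means typically vary by about {large:.1f} cm")
```

The averages from samples of 5 scatter far more widely than those from samples of 100, so any single small sample is much more likely to be a long way from the true population average.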

Informal Inferences

When we look at our sample results, we make an inference. This is a "best guess" about the whole population based on the sample. However, always remember: different samples can lead to different conclusions. If you take two different random samples of 50 students, their average heights will likely be slightly different. This is called natural variation.
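The two-samples idea can be tried directly. This is a minimal sketch, assuming a hypothetical college of 2,000 students whose heights are simulated (mean 168 cm, standard deviation 9 cm) rather than measured; only Python's standard `random` and `statistics` modules are used.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical college of 2,000 students with simulated heights (cm).
heights = [rng.gauss(168, 9) for _ in range(2000)]

sample_a = rng.sample(heights, 50)   # first random sample of 50 students
sample_b = rng.sample(heights, 50)   # second, independent random sample of 50
mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

print(f"Sample A average: {mean_a:.1f} cm")
print(f"Sample B average: {mean_b:.1f} cm")
print(f"Whole college:    {statistics.mean(heights):.1f} cm")
```

Both samples were picked fairly, yet their averages differ from each other and from the whole college. That difference is the natural variation described above, not a mistake in the sampling.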

Did you know? In the 1936 US Election, the Literary Digest magazine collected about 2.4 million responses to its poll and predicted a landslide win for Alf Landon. They were wrong! Their "sampling frame" was drawn largely from telephone directories and car registration lists, and in 1936 those lists heavily over-represented wealthier households. Many poorer voters, who backed Roosevelt, were simply left out.


Summary: Key Takeaways

1. Population is everyone; Sample is a small part.
2. Census is accurate but slow/expensive; Sampling is fast but carries a risk of error.
3. Random methods (Simple Random, Systematic, Stratified) are generally fairer but need a list of names.
4. Non-random methods (Quota, Opportunity) are faster but more likely to be biased.
5. Always check if the sample size is big enough and if the group chosen truly represents the whole population.

Don't worry if this seems like a lot of definitions. The more you practice identifying these in "real-life" scenarios (like exam questions), the more natural it will become!