Welcome to Population and Samples!

Ever wondered how pollsters predict election results by talking to only a few thousand people, or how a factory knows a batch of biscuits is perfect without eating every single one? That is the power of sampling! In this chapter, we will learn how to pick the right "spoonful" from the "pot" so that our data is accurate, fair, and useful. Don't worry if this seems like a lot of definitions at first—once you see the real-world logic, it all clicks together.

1. The Big Picture: Population vs. Sample

Before we dive into how to sample, we need to know what we are talking about. In Statistics, we use two very important words:

1. Population: The entire group of items or people you are interested in (e.g., "Every student in the UK").
2. Sample: A smaller group picked from that population (e.g., "50 students from your school").

Key Terms to Remember:

Parameter: A numerical property of a Population (like the true average height of everyone in the UK).
Statistic: A numerical property of a Sample (like the average height of the 50 students you measured). We use statistics to estimate parameters.

Quick Memory Aid (match the first letters):
Population goes with Parameter (both start with P)
Sample goes with Statistic (both start with S)

Quick Review Box:
A statistic is a function of the sample values only. It must not contain any unknown parameters: it is based purely on the data you've collected!
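
To see the difference between a parameter and a statistic in action, here is a minimal Python sketch. The population is simulated purely for illustration (the heights, the seed and the variable names are all made-up assumptions), so that we can compute the "true" mean, something you normally cannot do in real life.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated heights in cm (illustration only).
rng = random.Random(1)
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: calculated from the WHOLE population (usually unknown in practice).
population_mean = statistics.mean(population)

# Statistic: calculated from the sample values only.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.1f} cm")
print(f"Statistic (sample mean):     {sample_mean:.1f} cm")
```

The two numbers should be close but not identical: the sample mean is only an estimate of the population mean.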


2. What Makes a Sample "Random"?

In the Pearson Edexcel syllabus, a random sample of size \(n\) must meet specific rules to be fair. For a sample to be truly random:

● Every member of the population must be equally likely to be included.
● All possible subsets (groups) of size \(n\) must be possible.
● Every possible sample of size \(n\) must be equally likely to occur.
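
For a tiny population you can check these rules by listing every possible sample. The sketch below assumes a made-up population of four members (A, B, C, D) and a sample size of 2, chosen without replacement.

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D"]  # hypothetical population of 4 members
n = 2                              # sample size

# Every possible subset of size n (order does not matter).
all_samples = list(combinations(population, n))
print(len(all_samples))  # 6, which equals comb(4, 2)

# In a random sample, each of these subsets is equally likely.
prob_each = 1 / len(all_samples)

# Fraction of the possible samples that contain each member.
inclusion = {m: sum(m in s for s in all_samples) / len(all_samples)
             for m in population}
print(inclusion)  # {'A': 0.5, 'B': 0.5, 'C': 0.5, 'D': 0.5}
```

Because all 6 samples are equally likely, every member ends up with the same chance (here \(n/N = 2/4\)) of being included.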

Two Ways to Sample:

1. Simple Random Sampling (Without Replacement): Once you pick a person, you don't put them back in the "hat." This is the most common method in school projects.
2. Unrestricted Random Sampling (With Replacement): You pick someone, record the data, and "put them back." This means the same person could technically be picked twice!
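
The two approaches can be sketched with Python's standard `random` module. The "hat" of ten numbered members and the seed are assumptions made for illustration.

```python
import random

rng = random.Random(42)       # seeded only so the run is repeatable
hat = list(range(1, 11))      # hypothetical members numbered 1 to 10

# Simple random sampling: without replacement, so no member appears twice.
without_replacement = rng.sample(hat, 5)

# Unrestricted random sampling: with replacement, so repeats are possible.
with_replacement = rng.choices(hat, k=5)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```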

How to actually get the numbers:

You can't just "think" of random numbers (humans are actually quite bad at being random!). Instead, use:
Random Number Tables: Grids of numbers generated by machines.
Calculator: Use the Ran# or RanInt function on your scientific calculator.

Key Takeaway: A random sample is the "Gold Standard" because it helps avoid bias (systematically favouring some members of the population over others).


3. Random Sampling Techniques

Sometimes, just picking names out of a hat isn't practical. Here are the common random methods you need to know:

Simple Random Sampling

Every individual is assigned a number, and a random number generator picks the winners.
Pros: Totally unbiased; every member has an equal chance.
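Cons: Needs a complete, numbered list of the population (a sampling frame), which can be impractical or impossible to obtain if the population is huge (like the whole world).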

Systematic Sampling

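You pick a starting point at random and then take every \(k\)th member of an ordered list (e.g., every 10th person). To get a sample of size \(n\) from a population of size \(N\), use \(k = N/n\) and choose the random start from the first \(k\) members.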
Example: Testing every 50th lightbulb on an assembly line.
Pros: Very simple and quick to use.
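Cons: If the list has a repeating "pattern" that lines up with your interval, the sample will be biased (e.g., if a machine fault affects every 50th lightbulb and you test every 50th lightbulb, you will either catch all of the faulty ones or none of them).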
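
Here is a minimal sketch of systematic sampling in Python. The function name `systematic_sample` and the list of 500 numbered lightbulbs are made up for illustration; the interval is taken as the population size divided by the sample size, rounded down.

```python
import random

def systematic_sample(items, n, rng=random):
    """Take every k-th item after a random start, where k = len(items) // n."""
    k = len(items) // n
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = rng.randrange(k)      # random starting position within the first k items
    return items[start::k][:n]    # every k-th item, trimmed to n items

lightbulbs = list(range(1, 501))  # hypothetical bulbs numbered 1 to 500
sample = systematic_sample(lightbulbs, 10, random.Random(0))
print(sample)                     # 10 bulbs, each 50 places after the previous one
```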

Stratified Sampling

The population is divided into groups called strata (e.g., Year 12 and Year 13). You then take a random sample from each group.
Proportional Stratification: If Year 12 is twice as big as Year 13, your sample will have twice as many Year 12s. This is very representative!
Disproportional Stratification: You might take more people from a tiny group just to make sure you have enough data to talk about them.
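
Proportional stratification is really just a bit of arithmetic: each stratum gets (stratum size ÷ population size) × sample size. The sketch below assumes a made-up school with 200 Year 12s and 100 Year 13s and a total sample of 30; the function name is hypothetical.

```python
import random

def proportional_stratified_sample(strata, n, rng=random):
    """Randomly sample from each stratum in proportion to its size.

    Sizes are rounded, so they may not add up to exactly n in every case.
    """
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)   # this stratum's share of n
        sample[name] = rng.sample(members, size) # random within the stratum
    return sample

strata = {
    "Year 12": [f"Y12-{i}" for i in range(200)],  # hypothetical student IDs
    "Year 13": [f"Y13-{i}" for i in range(100)],
}
sample = proportional_stratified_sample(strata, 30, random.Random(0))
print({name: len(chosen) for name, chosen in sample.items()})
# {'Year 12': 20, 'Year 13': 10}
```

Year 12 is twice as big as Year 13, so it gets twice as many places in the sample.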

Cluster Sampling

The population is divided into groups (clusters) that are similar to each other (e.g., different streets in a town). You pick a few clusters at random and sample everyone inside them.
Analogy: Imagine a box of KitKats. Each bar is a "cluster." To taste the recipe, you pick two whole bars at random and eat all the fingers in those bars.
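
The KitKat analogy can be sketched in Python as follows. The box of six bars with four fingers each, and the function name `cluster_sample`, are assumptions for illustration.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Pick whole clusters at random and keep every member inside them."""
    chosen = rng.sample(sorted(clusters), num_clusters)  # random choice of cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical box: 6 bars (clusters), each with 4 fingers (members).
box = {f"bar {b}": [f"bar {b}, finger {f}" for f in range(1, 5)]
       for b in range(1, 7)}

picked = cluster_sample(box, 2, random.Random(0))
for bar, fingers in picked.items():
    print(bar, "->", fingers)
```

Notice that the randomness is in choosing the bars, not the fingers: once a bar is chosen, every finger in it is sampled.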


4. Non-Random Sampling Techniques

Sometimes random sampling is impossible or too expensive. Here are the alternatives:

Judgmental Sampling

The researcher uses their own "expert judgment" to pick who should be in the sample.
Risk: It is highly likely to be biased because it depends on one person's opinion.

Snowball Sampling

You find one person, and they "refer" you to their friends, who refer you to their friends.
Did you know? This is used for hard-to-reach populations. For example, if you wanted to study illegal drug users or people with a very rare hobby, you wouldn't have a list of names. You'd find one person and ask them to introduce you to others.

Quota Sampling

Similar to stratified sampling, because the population is split into groups, but the people within each group are not picked randomly. You just go out and find people until you hit your "quota" (e.g., "I need 20 men and 20 women"); you might simply stop people on the street who look like they fit the description.
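
To see why quota sampling is not random, here is a small sketch. The list of passers-by, the group labels and the function name `quota_sample` are all made up for illustration. People are taken in the order they happen to walk past, and nobody is chosen by chance.

```python
def quota_sample(passers_by, quotas):
    """Accept people in arrival order until every group's quota is full."""
    remaining = dict(quotas)
    chosen = []
    for person, group in passers_by:
        if remaining.get(group, 0) > 0:   # still need someone from this group
            chosen.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):   # all quotas filled, so stop
            break
    return chosen

# Hypothetical people in the order they walk past the interviewer.
passers_by = [("P1", "M"), ("P2", "M"), ("P3", "F"),
              ("P4", "M"), ("P5", "F"), ("P6", "F")]

chosen = quota_sample(passers_by, {"M": 2, "F": 2})
print(chosen)  # [('P1', 'M'), ('P2', 'M'), ('P3', 'F'), ('P5', 'F')]
```

P4 and P6 never had a chance of being included, simply because of when they arrived. That is exactly where the bias can creep in.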

Quick Review Box:
Random = No human choice involved (the "hat" decides).
Non-Random = Human choice or circumstances decide.


5. Choosing the Right Method and Practical Constraints

In your exam, you might be asked why a researcher chose a specific method. Always think about these three constraints:

1. Cost: Is it too expensive to travel all over the country?
2. Time: Do we need the results today (e.g., an Exit Poll during an election)?
3. The Sampling Frame: Do we even have a list of everyone? If there is no list, you cannot do Simple Random Sampling.

Common Pitfalls to Avoid:

Selection Bias: If you only sample people at a gym, you can't claim your results represent the "whole town's fitness."
Non-Response: If you send 100 surveys and only 5 people reply, those 5 people might have very extreme opinions, so they are unlikely to represent the other 95. This is called non-response bias.

Example Scenarios:

Market Research: Often uses Quota or Stratified sampling to ensure that different ages and genders are represented.
Quality Assurance: Often uses Systematic sampling (every 100th item) on a production line.
Exit Polls: Often use Cluster sampling (picking specific polling stations) to get quick results on election day.

Key Takeaway: No sampling method is perfect. The goal is to choose the one that provides the least bias for the lowest cost and time.


Summary Checklist

● Can I explain the difference between a population and a sample?
● Do I know the difference between a parameter and a statistic?
● Can I list the three requirements for a random sample?
● Do I understand the "Special Use" for Snowball sampling?
● Can I explain why Stratified sampling is usually more representative than Simple Random sampling?

Don't worry if you need to read this a few times! Sampling is about logic. Just keep asking yourself: "If I did this in real life, would it be fair?"