Welcome to the World of Statistics!
Ever wondered how companies decide which new chocolate bar to launch, or how scientists know if a new medicine works? It all starts with collecting data. In this chapter, you’ll learn how to plan an investigation, identify different types of data, and choose the best ways to pick people to study. Think of this as the "detective work" phase of statistics!
1. Planning Your Investigation
Before you start counting or measuring things, you need a plan. The first step is creating a hypothesis. This is just a fancy word for a statement that you can test to see if it’s true.
Example: "As motorcycles get older, their value is likely to go down."
Challenges in the Real World
Testing a hypothesis isn't always easy. Scientists and researchers face constraints (limits) like:
• Time: You might not have 10 years to watch a motorcycle lose value.
• Cost: Traveling across the country to interview people is expensive!
• Ethics & Confidentiality: You must keep people's personal information safe and be fair to everyone involved.
• Convenience: Sometimes you have to use the data that is easiest to find, even if it’s not perfect.
Quick Review: To avoid problems later, always have a strategy for "what ifs." For example, what will you do if half the people you give a survey to just don't answer? This is called a non-response issue.
Key Takeaway: A good statistical enquiry starts with a clear, testable hypothesis while considering the time, cost, and ethics involved.
2. Understanding Different Types of Data
Statistics uses specific words to describe data. Knowing these is like learning the secret code of the subject!
Qualitative vs. Quantitative
• Qualitative Data: Described by words or labels (non-numeric). Example: Eye color (Blue, Brown, Green).
• Quantitative Data: Described by numbers. Example: Height or Weight.
Discrete vs. Continuous
Don't worry if these sound similar; here is a simple trick:
• Discrete Data: Things you count. It can only take specific values (like whole numbers). You can't have 2.5 siblings!
• Continuous Data: Things you measure. It can take any value on a scale. Example: A person’s height could be 165.23 cm.
Other Important Terms
• Categorical Data: Data that can be put into groups (e.g., Year 7, Year 8, Year 9).
• Ordinal Data: Data that has a natural order. Example: Exam grades (A, B, C) or a "Star Rating" for a movie.
• Bivariate Data: Data that involves two variables to see if there is a link between them. Example: Comparing hours spent studying with exam scores.
• Raw Data: The original data exactly as it was collected, before being organized.
Grouping Data
Sometimes we merge data into class intervals (groups like 0-10, 11-20) to make it easier to read.
Warning: While grouping makes data easier to present, you lose accuracy because you no longer know the exact original values!
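Here is a small sketch of grouping in action. The scores and the class intervals are made up for illustration, and find_interval is just a helper name chosen for this example. Notice that once the data is in the frequency table, you can no longer tell whether someone in the 11-20 group scored 11 or 20.

```python
from collections import Counter

# Hypothetical raw data: ten test scores out of 40 (made up for illustration).
raw_scores = [3, 7, 10, 11, 15, 18, 20, 22, 29, 35]

# Class intervals as (lower, upper) pairs; both ends are included.
class_intervals = [(0, 10), (11, 20), (21, 30), (31, 40)]


def find_interval(value, intervals):
    """Return the label of the class interval that contains value."""
    for lower, upper in intervals:
        if lower <= value <= upper:
            return f"{lower}-{upper}"
    raise ValueError(f"{value} does not fit any class interval")


# Count how many raw values fall into each class interval.
frequency_table = Counter(find_interval(score, class_intervals) for score in raw_scores)

for lower, upper in class_intervals:
    label = f"{lower}-{upper}"
    print(label, frequency_table[label])
# 0-10 3
# 11-20 4
# 21-30 2
# 31-40 1
```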
Key Takeaway: Data is either Qualitative (words) or Quantitative (numbers). Quantitative data is either Discrete (counted) or Continuous (measured).
3. Explanatory and Response Variables
When looking at two variables (bivariate data), we give them special names:
1. Explanatory Variable (Independent): This is the one you think might cause a change. This goes on the 'x' axis (the horizontal one).
2. Response Variable (Dependent): This is the one that responds to the change. This goes on the 'y' axis (the vertical one).
Analogy: Think of a plant. The amount of water you give it is the explanatory variable. How tall it grows is the response.
4. Where Does Data Come From?
Primary vs. Secondary Data
• Primary Data: Collected by you (or your team) for your specific purpose.
Pros: You know exactly how it was collected; it's up-to-date.
Cons: Takes a lot of time and money.
• Secondary Data: Collected by someone else (like the government or a website).
Pros: Fast and often free.
Cons: Might be out of date or have errors you don't know about.
Did you know? When using secondary data, you must always acknowledge the source (say where you got it from)!
5. Population and Sampling
You usually can't ask everyone in the world a question. That’s where sampling comes in.
• Population: The entire group you are interested in (e.g., "All students in the UK").
• Sample Frame: A list of the population that you can actually pick from (e.g., "The school register").
• Sample: The small group you actually pick to study.
Sampling Methods
1. Simple Random Sampling: Every person has an equal chance of being picked. You can use a computer, a hat, or dice.
2. Systematic Sampling: Picking every \( n^{th} \) person from a list, starting at a randomly chosen point (e.g., every 10th person on a list).
3. Quota Sampling: Picking a certain number of people from different groups (e.g., "I need 20 boys and 20 girls").
4. Opportunity (Convenience) Sampling: Picking the people who are there at the time (e.g., asking the first 10 people you see at the park). Risk: This is often biased because it doesn't represent everyone.
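The first two methods can be tried out with a few lines of Python. This is only a sketch: the sample frame of 50 numbered students is invented, and the fixed seed is there purely so the example gives the same result each time you run it. The systematic sample starts at a random position within the first 10 names, then takes every 10th name after that.

```python
import random

# Hypothetical sample frame: 50 students, numbered 1 to 50 (made up for illustration).
sample_frame = [f"Student {i}" for i in range(1, 51)]

# A fixed seed makes the example repeatable; a real investigation would not need one.
rng = random.Random(1)

# Simple random sampling: pick 5 different students, each equally likely to be chosen.
simple_random_sample = rng.sample(sample_frame, 5)

# Systematic sampling: choose a random starting position among the first 10,
# then take every 10th student from there.
step = 10
start = rng.randrange(step)  # an index from 0 to 9
systematic_sample = sample_frame[start::step]

print("Simple random:", simple_random_sample)
print("Systematic:", systematic_sample)
```

Both samples contain 5 students, but the systematic sample is always evenly spread through the list.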
Stratified Sampling
This is a very fair way to sample. You divide the population into groups (strata) like Year Groups, then pick people from each group in proportion to that group's share of the population.
Example: If 60% of a school is girls, 60% of your sample should be girls.
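To work out how many people to pick from each stratum, a common approach is: (size of stratum ÷ size of population) × sample size. The sketch below uses made-up year group sizes and a made-up sample size of 30. The numbers were chosen so the answers come out whole; with other numbers you would need to round, and the rounded values might not add up to exactly the sample size you wanted.

```python
# Hypothetical school: number of students in each year group (made up for illustration).
strata_sizes = {"Year 7": 120, "Year 8": 100, "Year 9": 80}
sample_size = 30

population_size = sum(strata_sizes.values())  # 300 students in total

# For each stratum: (stratum size x sample size) / population size, rounded to a whole person.
sample_per_stratum = {
    group: round(size * sample_size / population_size)
    for group, size in strata_sizes.items()
}

for group, count in sample_per_stratum.items():
    print(group, count)
# Year 7 12
# Year 8 10
# Year 9 8
```

Year 7 makes up 40% of the school (120 out of 300), and it also makes up 40% of the sample (12 out of 30).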
Key Takeaway: A good sample must avoid bias so that it accurately reflects the whole population.
6. Reliability and Validity
These two words are vital for your exam:
• Reliability: If you did the test again, would you get the same results? (Think: Is the measurement consistent?)
• Validity: Does the test actually measure what it’s supposed to? (Think: Is it the right tool for the job?)
Common Mistake: A broken set of scales that always shows you as 5 kg lighter is reliable (it gives the same wrong answer every time) but it is NOT valid (it doesn't show your true weight).
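The broken scales idea can be put into numbers. In this sketch the true weight and the readings are invented. The spread of the readings (largest minus smallest) is used here as a rough check on reliability, and the gap between the average reading and the true weight as a rough check on validity. These are simple illustrations, not formal tests.

```python
from statistics import mean

# Hypothetical values, made up for illustration.
true_weight_kg = 60.0
broken_scale_readings = [55.0, 55.0, 55.0, 55.0]

# Reliability check: do repeated readings agree with each other?
spread = max(broken_scale_readings) - min(broken_scale_readings)

# Validity check: is the average reading close to the true value?
average_error = mean(broken_scale_readings) - true_weight_kg

print("Spread of readings:", spread)    # 0.0, so the readings are consistent
print("Average error:", average_error)  # -5.0, so the readings are 5 kg too low
```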
7. Designing Great Questions
When making a questionnaire, keep these points in mind:
• Leading Questions: "Don't you agree that school dinners are great?" (This pushes people to say yes).
• Open vs. Closed Questions: Closed questions give options (Tick boxes), making data easy to analyze. Open questions let people write anything, which gives more detail but is hard to count.
Pro-tip: Always run a pilot study. This is a "mini-test" with a few people to see if your questions make sense before you send them to everyone!
8. Cleaning the Data
Before you analyze data, you must "clean" it. This means looking for:
• Outliers: Values that are much bigger or smaller than the rest (and might be a mistake).
• Missing Data: Someone skipped a question.
• Incorrect Formats: Someone wrote "ten" instead of "10".
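Here is a sketch of what cleaning might look like for one survey question, "How many hours did you sleep last night?". The answers are invented, WORD_TO_NUMBER is a hypothetical lookup that only covers the word "ten", and the outlier check is a simple "is this value even possible?" test (nobody can sleep more than 24 hours in a night). Other ways of spotting outliers exist; this is just one easy option.

```python
# Hypothetical raw answers, exactly as people typed them (made up for illustration).
raw_answers = ["8", "7", "", "ten", "9", "6", "80", "7"]

# Hypothetical lookup for answers written as words instead of digits.
WORD_TO_NUMBER = {"ten": 10}

cleaned = []
missing_count = 0

for answer in raw_answers:
    text = answer.strip().lower()
    if text == "":
        # Missing data: the person skipped the question.
        missing_count += 1
    elif text in WORD_TO_NUMBER:
        # Incorrect format: convert the word into a number.
        cleaned.append(WORD_TO_NUMBER[text])
    else:
        cleaned.append(int(text))

# Outliers: flag values outside the possible range of 0 to 24 hours.
outliers = [value for value in cleaned if value < 0 or value > 24]
usable = [value for value in cleaned if 0 <= value <= 24]

print("Missing answers:", missing_count)  # 1
print("Outliers to check:", outliers)     # [80]
print("Usable values:", usable)           # [8, 7, 10, 9, 6, 7]
```

The value 80 is flagged rather than silently deleted, so you can decide whether it was a typing mistake (perhaps meant to be 8) before throwing it away.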
Summary Takeaway: Collecting data is about planning carefully, sampling fairly, and cleaning your results to make sure they are reliable and valid.