Lesson: Statistics for Grade 12 Students

Hello, Grade 12 students! Welcome to our lesson on Statistics. If you've ever wondered how election polls can forecast results from a small sample of voters or how music streaming apps guess what kind of songs you like, the answers lie largely in statistics.

In this chapter, we won’t just be crunching numbers; we’ll learn how to "summarize data" to uncover the meanings hidden within. If this subject feels difficult at first, don't worry! We will take it step-by-step with real-life examples that you can easily relate to.

1. Getting to Know Data

Before we start calculating, we need to know that there are two main types of data:

1. Qualitative Data: Describes characteristics, such as favorite color, gender, car brand, or opinion (good/bad). You can't perform arithmetic operations on this type of data.
2. Quantitative Data: Expressed as numbers that can be used for calculation, such as height, weight, test scores, or food prices.

Key Point: In the Grade 12 curriculum, we will focus mainly on analyzing Quantitative Data.

2. Measures of Central Tendency

A measure of central tendency is a single value that acts as the "representative" of a data set. There are three main values you need to know:

(1) Arithmetic Mean: This is the sum of all data values divided by the number of values.
Formula: \( \bar{x} = \frac{\sum x}{n} \)
Example: If three friends have 10, 20, and 60 Baht, the mean is \( (10+20+60)/3 = 30 \) Baht.

(2) Median: The "middle person" when you line everyone up from smallest to largest.
Caution: You must always sort the data first! If the number of data points is even, take the two middle values, add them together, and divide by 2.

(3) Mode: The most "popular" data point or the one that appears most frequently. A data set can have more than one mode if several values tie for the highest frequency.
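
To see the three definitions above in action, here is a minimal Python sketch. The function names and the sample lists (other than the 10, 20, 60 Baht example) are made up for illustration; they are not part of the curriculum.

```python
from collections import Counter

def mean(data):
    # Sum of all values divided by the number of values.
    return sum(data) / len(data)

def median(data):
    # Always sort first, then pick the middle position.
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(data):
    # Return every value that ties for the highest frequency.
    counts = Counter(data)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

money = [10, 20, 60]                 # the three friends from the example above
print(mean(money))                   # 30.0
print(median(money))                 # 20
print(median([8, 4, 10, 6]))         # sorted: 4, 6, 8, 10 -> (6 + 8) / 2 = 7.0
print(modes([2, 3, 3, 5, 3, 2]))     # [3]
```

Notice that the median function sorts the data itself, so you cannot forget that step.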

Did you know?
If there is an "Outlier" in the group (for example, everyone earns 20,000 but one person earns 1 million), the Mean gets pulled toward the extreme value and no longer reflects a typical member. In this case, the Median provides a better representation of the group!
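
Here is a quick check of that idea in Python. The group size of ten people is an assumption made only for this illustration; the incomes come from the example above.

```python
from statistics import mean, median

# Assumed group: nine people earning 20,000 and one earning 1 million.
incomes = [20_000] * 9 + [1_000_000]

mean_income = mean(incomes)
median_income = median(incomes)

print(f"mean   = {mean_income:,.0f}")    # mean   = 118,000
print(f"median = {median_income:,.0f}")  # median = 20,000
```

The mean of 118,000 describes nobody in the group, while the median of 20,000 matches what nine out of ten people actually earn.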

3. Measures of Position

When we have a large amount of data, we want to know, "Where do I stand compared to others?" For this, we use Percentile (\( P_r \)).

A percentile divides sorted data into 100 equal parts.
- If your score is at \( P_{80} \), it means about 80% of people scored lower than you, and about 20% performed better.

Steps to find the position:
1. Sort the data from lowest to highest (never forget this!).
2. Find the position using the formula: \( L = \frac{r}{100}(n + 1) \)
(Where \( r \) is the desired percentile and \( n \) is the total number of data points.)
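
The two steps above can be sketched in Python as follows. One assumption to flag: when \( L \) is not a whole number, this sketch interpolates linearly between the two neighbouring values, which is a common textbook convention but is not stated in this lesson. The function name and the scores are hypothetical.

```python
def percentile(data, r):
    ordered = sorted(data)               # Step 1: sort from lowest to highest
    n = len(ordered)
    position = r * (n + 1) / 100         # Step 2: L = (r / 100)(n + 1), counted from 1
    position = min(max(position, 1), n)  # assumption: keep L inside 1..n
    lower = int(position)                # whole-number part of L
    fraction = position - lower          # decimal part of L
    if fraction == 0:
        return ordered[lower - 1]        # L is a whole number: read the value directly
    # Assumption: interpolate between the values at positions lower and lower + 1.
    gap = ordered[lower] - ordered[lower - 1]
    return ordered[lower - 1] + fraction * gap

scores = [22, 12, 35, 18, 30, 15, 28, 20, 25]   # 9 hypothetical test scores
print(percentile(scores, 80))   # L = 8, so the 8th sorted value: 30
print(percentile(scores, 25))   # L = 2.5, halfway between 15 and 18: 16.5
```

The position \( L \) and the answer are different things: for \( P_{80} \) here, \( L = 8 \) but the percentile value is 30.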

4. Measures of Dispersion

While the central tendency tells us "where the middle is," measures of dispersion tell us "how clustered or spread out the data is."

(1) Range: The simplest one; it's the Maximum value - Minimum value.
(2) Standard Deviation (S.D.): Tells us roughly how far the data points lie from the mean on average.
- Low S.D.: Data is closely clustered (everyone has similar scores).
- High S.D.: Data is widely spread out (there are both very high and very low scorers).

Key Point: The formulas for population S.D. (\( \sigma \)) and sample S.D. (\( s \)) differ by the divisor (\( n \) vs. \( n-1 \)). Pay close attention to what the question is asking for!
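
To make the difference in divisors concrete, here is a small Python sketch that computes both versions by hand: \( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} \) for a population and \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \) for a sample. The function names and the data are made up for illustration.

```python
from math import sqrt

def population_sd(data):
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return sqrt(squared_deviations / n)         # divide by n

def sample_sd(data):
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return sqrt(squared_deviations / (n - 1))   # divide by n - 1

scores = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data, mean = 5
print(population_sd(scores))            # 2.0
print(round(sample_sd(scores), 3))      # 2.138
```

The sample S.D. is always slightly larger than the population S.D. for the same numbers, because it divides by the smaller value \( n-1 \).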

5. Box Plot

This is a highlight for Grade 12! A box plot helps us visualize the "shape" of the data using the Five-Number Summary:
1. Minimum (Min)
2. First Quartile (\( Q_1 \))
3. Median (\( Q_2 \))
4. Third Quartile (\( Q_3 \))
5. Maximum (Max)

How to read a Box Plot:
- If the box is wide, the data in that range is highly spread out.
- If the box is narrow, the data in that range is dense or closely clustered.
- If you see points outside the whiskers, those are Outliers. A value counts as an outlier if it is greater than \( Q_3 + 1.5(IQR) \) or less than \( Q_1 - 1.5(IQR) \)
(Where \( IQR = Q_3 - Q_1 \))
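
As a rough sketch, the five-number summary and the outlier check can be computed like this in Python. It assumes the quartiles are found as \( P_{25} \), \( P_{50} \), and \( P_{75} \) with the position formula \( \frac{r}{100}(n+1) \), interpolating when the position is not a whole number; other textbooks and software may use slightly different quartile rules. All names and data are hypothetical.

```python
def value_at_percentile(ordered, r):
    # 'ordered' must already be sorted; position is counted from 1.
    n = len(ordered)
    position = min(max(r * (n + 1) / 100, 1), n)
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def five_number_summary(data):
    ordered = sorted(data)
    q1 = value_at_percentile(ordered, 25)
    q2 = value_at_percentile(ordered, 50)
    q3 = value_at_percentile(ordered, 75)
    return (ordered[0], q1, q2, q3, ordered[-1])

def find_outliers(data):
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Anything beyond either fence is flagged as an outlier.
    return [x for x in data if x < lower_fence or x > upper_fence]

scores = [12, 9, 40, 7, 15, 10, 5, 13, 8, 14, 11]   # 11 hypothetical scores
print(five_number_summary(scores))   # (5, 8, 11, 14, 40)
print(find_outliers(scores))         # [40]
```

Here \( IQR = 14 - 8 = 6 \), so the fences sit at \( 8 - 9 = -1 \) and \( 14 + 9 = 23 \). Only the score of 40 falls outside them.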

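Quick Summary: A box plot allows us to immediately see if the data is left-skewed, right-skewed, or symmetric. For example, a longer right whisker with the median sitting closer to \( Q_1 \) suggests right-skewed data, while a centered median with whiskers of similar length suggests symmetry.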

Common Mistakes

1. Forgetting to sort the data: Before finding the median or percentile, students often rush into calculations and forget to order the numbers from smallest to largest.
2. Confusing position with the data value: The formula \( \frac{r}{100}(n+1) \) only tells you "which position the answer is in," not the answer itself. You still need to count through the sorted data to that position and read off the value there.
3. Using the Mean with outliers: If a question asks which central tendency is most appropriate and there is an extreme outlier, never choose the Mean!

Final Thoughts

Statistics isn't just about memorizing formulas; it's about trying to "understand the nature of the data."
- Central tendency represents the group.
- Position tells you where you stand.
- Dispersion tells you about consistency.
- Box plots help you see the big picture.

If you practice regularly, you’ll start to see the story these numbers are telling. It is definitely not beyond your ability. I’m cheering for you! Keep it up!