【Math B】 Statistical Inference 〜The Magical Tool to Understand the Whole〜

Hello! Today, let's explore one of the biggest milestones in Math B: "Statistical Inference" together.
When you hear "statistics," many people might think, "The calculations look tedious..." or "It seems difficult..." But actually, this field is the math closest to our daily lives!
For example, television ratings, exit polls for elections, and checks on the weight of packaged snacks all use the same statistical idea: "examine a part to estimate the whole."
It might take a little time to get used to the terminology at first, but if you take it one step at a time, you will definitely understand it. Let’s relax and get started!

1. Random Variables and Probability Distributions: The Basics First!

Before we start with statistical inference, we need to know the rules for handling data.

① What is a Random Variable \( X \)?

Think of the number you get when rolling a die. A "variable that takes each of its values with a definite probability" is called a random variable. We usually represent it with an uppercase letter, \( X \).
For example, if \( X \) is the result of rolling a die once, the probability that \( X \) is 1 is \( \frac{1}{6} \). We write this as \( P(X=1) = \frac{1}{6} \).
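To make "a probability for each value" concrete, here is a small Python sketch. It assumes a fair die and stores the probability distribution of \( X \) as a dictionary that maps each value to its probability. Exact fractions are used so that \( \frac{1}{6} \) stays exact.

```python
from fractions import Fraction

# Probability distribution of X = result of one roll of a fair die:
# each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# The probabilities of all possible values must add up to 1.
assert sum(die.values()) == 1

print("P(X=1) =", die[1])  # P(X=1) = 1/6
```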

② Expected Value (Mean), Variance, and Standard Deviation

These are the key indicators for the "center" and "spread" of data.

  • Expected Value \( E(X) \): This is the "mean" (average). You multiply each value by its probability and sum them all up.
    \( E(X) = x_1p_1 + x_2p_2 + \dots + x_np_n \)
  • Variance \( V(X) \): This represents how much the data is "spread out." It is the average of the squared distances from the mean: \( V(X) = E\left((X - m)^2\right) \), where \( m = E(X) \).
    Useful calculation formula: \( V(X) = E(X^2) - \{E(X)\}^2 \) (Remember it as: "Mean of the squares" minus "square of the mean"!)
  • Standard Deviation \( \sigma(X) \): The square root of the variance. Because the variance uses squared distances, taking the square root brings the measure of spread back to the same units as the original data.
    \( \sigma(X) = \sqrt{V(X)} \)
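As a check on these formulas, the following Python sketch (again assuming a fair die) computes \( E(X) \), computes \( V(X) \) both from the definition and from the shortcut "mean of the squares minus square of the mean," and then takes the square root to get \( \sigma(X) \). The function names are only for illustration.

```python
import math
from fractions import Fraction

# Fair die: value -> probability
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(dist):
    """E(X) = x1*p1 + x2*p2 + ... + xn*pn"""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Shortcut: V(X) = E(X^2) - {E(X)}^2 (mean of the squares minus square of the mean)."""
    mean_of_squares = sum(x * x * p for x, p in dist.items())
    return mean_of_squares - expected_value(dist) ** 2

def variance_by_definition(dist):
    """Definition: V(X) = E((X - m)^2), the average squared distance from the mean m."""
    m = expected_value(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

E = expected_value(die)     # 7/2
V = variance(die)           # 35/12 (same as variance_by_definition(die))
sigma = math.sqrt(float(V)) # square root of the variance

print(E, V, round(sigma, 3))  # 7/2 35/12 1.708
```

Both variance functions return \( \frac{35}{12} \) here. On tests, the shortcut is usually faster because you do not need to subtract the mean from every value first.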

【Pro-Tip】
Keep this image in mind: "Large variance = data is scattered," and "Small variance = data is clustered around the mean!"

2. Binomial Distribution: A World of Success or Failure

Suppose a trial has only two possible outcomes, such as "a shot goes in or misses" or "a coin lands on heads or tails." If you repeat it independently exactly \( n \) times, with the same probability of success each time, the number of successes \( X \) follows a binomial distribution, written as \( B(n, p) \).

Formulas for Binomial Distribution \( B(n, p) \)

Let \( n \) be the number of trials and \( p \) be the probability of success. Calculations become surprisingly simple:
● Expected value: \( E(X) = np \)
● Variance: \( V(X) = np(1-p) \)
● Standard deviation: \( \sigma(X) = \sqrt{np(1-p)} \)
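To see that these shortcuts agree with the general definitions, this Python sketch builds every probability \( P(X=k) = {}_n\mathrm{C}_k \, p^k (1-p)^{n-k} \) and then computes the mean and variance directly from them. The values \( n = 100 \) and \( p = 0.1 \) are borrowed from the lottery example in this section; any other values would work the same way.

```python
from math import comb, isclose, sqrt

def binomial_pmf(n, p):
    """P(X = k) = nCk * p^k * (1-p)^(n-k) for k = 0, 1, ..., n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

n, p = 100, 0.1   # 100 draws, 10% chance of winning each time
dist = binomial_pmf(n, p)

# Mean and variance computed directly from the distribution
mean = sum(k * q for k, q in dist.items())
var = sum(k * k * q for k, q in dist.items()) - mean**2

# Compare with the shortcut formulas E(X) = np and V(X) = np(1-p)
print(mean, n * p)            # both about 10
print(var, n * p * (1 - p))   # both about 9
print(sqrt(var))              # about 3, the standard deviation
```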

【Trivia】
It makes intuitive sense that the expected value is \( np \), right? For example, if you draw a lottery ticket with a 10% success rate (\( p=0.1 \)) 100 times (\( n=100 \)), you would expect to win an average of 10 times (\( 100 \times 0.1 \))!

3. Normal Distribution: The King of Statistics

Much real-world data (heights, test scores, etc.) roughly forms a symmetric, bell-shaped curve centered on the mean. The distribution with this shape is called the normal distribution \( N(m, \sigma^2) \).
※ \( m \) is the mean, and \( \sigma^2 \) is the variance.

Standardization (This shows up on tests a lot!)

Any normal distribution can be transformed into a "standard normal distribution \( N(0, 1) \)" with a mean of 0 and a variance of 1. This is called standardization.
Here is the transformation spell (formula):
\( Z = \frac{X - m}{\sigma} \)

【Steps: How to use standardization】
1. Identify the mean \( m \) and standard deviation \( \sigma \) from the problem.
2. Substitute them into the formula above to convert \( X \) to \( Z \).
3. Find the probability using the "Normal Distribution Table" at the back of your textbook.
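The three steps can be sketched in Python as follows. In place of the printed table, the sketch computes the table value \( P(0 \le Z \le u) \) with `math.erf`, using the standard identity \( P(0 \le Z \le u) = \frac{1}{2}\operatorname{erf}(u/\sqrt{2}) \). The example distribution \( N(60, 10^2) \) for test scores is hypothetical.

```python
import math

def standardize(x, m, sigma):
    """Z = (X - m) / sigma. Divide by the standard deviation, NOT the variance."""
    return (x - m) / sigma

def table_value(u):
    """P(0 <= Z <= u) for Z ~ N(0, 1), the quantity listed in a typical
    normal distribution table (computed with math.erf instead of a lookup)."""
    return 0.5 * math.erf(u / math.sqrt(2))

# Hypothetical example: scores follow N(60, 10^2). What is P(X >= 75)?
m, sigma = 60, 10                   # step 1: mean and standard deviation
z = standardize(75, m, sigma)       # step 2: z = (75 - 60) / 10 = 1.5
p = 0.5 - table_value(z)            # step 3: P(Z >= 1.5) = 0.5 - P(0 <= Z <= 1.5)
print(round(z, 2), round(p, 4))     # 1.5 0.0668
```

If you divided by the variance 100 instead, you would get \( z = 0.15 \) and a completely different (wrong) probability.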

【Common Mistake】
Many students mistakenly use the "variance \( \sigma^2 \)" in the denominator! Always remember to divide by the "standard deviation \( \sigma \)".

4. Population and Sample: Inferring the Whole from a Part

This is where "statistical inference" really begins!

  • Population: The entire group you want to study (e.g., all high school students in Japan).
  • Sample: The part you actually examine (e.g., 100 randomly selected high school students).

Properties of the Sample Mean \( \bar{X} \)

Let \( m \) be the population mean and \( \sigma \) be the population standard deviation.
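When we randomly draw a sample of size \( n \), the sample mean \( \bar{X} \) has these properties:
● Expected value: \( E(\bar{X}) = m \) (On average, the sample mean equals the population mean!)
● Standard deviation: \( \sigma(\bar{X}) = \frac{\sigma}{\sqrt{n}} \) (The more people you survey, the smaller the spread!)
● When \( n \) is large, \( \bar{X} \) approximately follows the normal distribution \( N\left(m, \frac{\sigma^2}{n}\right) \). This is what allows us to use the normal distribution table for estimation.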
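Here is one way to watch \( \frac{\sigma}{\sqrt{n}} \) appear, as a simulation sketch in Python. It assumes the population is a single fair-die roll (so \( m = 3.5 \) and \( \sigma = \sqrt{35/12} \approx 1.708 \)). It draws many samples of size \( n = 25 \), records each sample mean, and then looks at the mean and spread of those sample means. The sample size, the number of repetitions, and the seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Population: one roll of a fair die.
m = 3.5                      # population mean
sigma = (35 / 12) ** 0.5     # population standard deviation, about 1.708
n = 25                       # sample size (illustrative choice)
trials = 10_000              # number of repeated samples

# Each entry is the mean of one random sample of size n
sample_means = [
    statistics.fmean(random.randint(1, 6) for _ in range(n))
    for _ in range(trials)
]

print(statistics.fmean(sample_means), m)                # close to 3.5
print(statistics.pstdev(sample_means), sigma / n**0.5)  # both close to 0.34
```

Increasing `n` makes the second pair of numbers shrink, just like the "larger ladle" in the soup analogy.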

【Analogy】
Imagine tasting a soup. The whole pot is the "population," and one ladle is your "sample." If you stir it well (random sampling), the taste of the ladle (sample mean) will tell you the taste of the entire pot (population mean). The larger the ladle (the larger \( n \) is), the more accurate your judgment of the taste will be.

5. Estimation: Predicting with Confidence

Using sample data to predict that an unknown population mean \( m \) is "likely within this range!" is called estimation.

Formula for 95% Confidence Interval

The most commonly used range is the 95% confidence interval, computed from the sample mean \( \bar{X} \):
\( [ \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}, \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} ] \)

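Here \( \sigma \) is the population standard deviation. When \( \sigma \) is unknown and \( n \) is large, the sample standard deviation is used in its place. The number 1.96 comes from the normal distribution table: \( P(0 \le Z \le 1.96) = 0.475 \), so \( P(-1.96 \le Z \le 1.96) = 0.95 \). In other words, \( \pm 1.96 \) covers the middle 95% of the standard normal distribution. Make sure to memorize this number before your test!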
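This short Python sketch first confirms where 1.96 comes from, using \( P(-u \le Z \le u) = \operatorname{erf}(u/\sqrt{2}) \). It then evaluates the interval formula for a hypothetical survey with sample mean 50, population standard deviation 10, and sample size 100. The function name `confidence_interval_95` is just an illustrative choice.

```python
import math

def confidence_interval_95(x_bar, sigma, n):
    """95% confidence interval for the population mean m:
    [x_bar - 1.96*sigma/sqrt(n), x_bar + 1.96*sigma/sqrt(n)]"""
    half_width = 1.96 * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Where 1.96 comes from: P(-1.96 <= Z <= 1.96) for Z ~ N(0, 1)
coverage = math.erf(1.96 / math.sqrt(2))
print(round(coverage, 3))               # 0.95

# Hypothetical survey: sample mean 50, population sd 10, sample size 100
low, high = confidence_interval_95(50, 10, 100)
print(round(low, 2), round(high, 2))    # 48.04 51.96
```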

【Point】
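A "95% confidence interval" means this: if you repeated the same survey 100 times and computed an interval each time, about 95 of those intervals would contain the true answer (the population mean). The population mean itself does not move. It is the interval that changes from survey to survey.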
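You can check this interpretation with a simulation. The Python sketch below pretends we know the population: a hypothetical height distribution \( N(170, 6^2) \). It runs 1000 surveys of 100 people each, builds a 95% interval from every survey, and counts how often the interval captures the true mean 170. The population values, the sample size, and the seed are all assumptions made for illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population N(m, sigma^2). The true m is known here only
# because this is a simulation; in a real survey it is unknown.
m, sigma, n = 170.0, 6.0, 100
surveys = 1000

hits = 0
for _ in range(surveys):
    sample = [random.gauss(m, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    half_width = 1.96 * sigma / math.sqrt(n)
    if x_bar - half_width <= m <= x_bar + half_width:
        hits += 1   # this survey's interval captured the true mean

print(hits / surveys)   # roughly 0.95
```

The exact count varies with the seed, but it stays close to 95%.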

Final Thoughts: Advice for Studying

At first, you might feel overwhelmed by symbols like \( \bar{X} \) ("X-bar") and \( \sigma \) ("sigma").
But the goal remains the same throughout: "organize scattered data to smartly predict the whole."
It’s perfectly fine to keep the formulas in front of you while you study. Start by solving one example problem at a time, following the standardization steps.
Once you start getting the calculations right, it becomes as fun as solving a puzzle!

Summary of today's lesson:
1. Expected value is the "mean"; variance is the "spread."
2. Standardize a normal distribution using \( Z = \frac{X-m}{\sigma} \) to use the table.
3. For a sample mean, the standard deviation is the population standard deviation divided by \( \sqrt{n} \).
4. The "1.96" appears for a 95% confidence interval.

Once you master "statistical inference," you'll start looking at numbers in the news in a whole new way. Keep it up! I’m rooting for you!