[Grade 1] Data Analysis and Utilization — Become a Master of Interpreting Data!

Hello! Today, we’re going to learn about the unit "Data Analysis and Utilization" together.
The word "data" might sound a bit intimidating, but it’s actually all around us. For example, your class test scores, sports statistics, and the number of views on your favorite videos are all examples of "data."

In this chapter, we will learn how to organize scattered numbers to reveal the truth behind them—"What is this data actually telling us?" It might seem like a lot of terminology at first, but if we take it one step at a time, you'll be just fine!

1. Organizing Data (Frequency Distribution Tables and Histograms)

It’s hard to see the big picture when you just have a long list of numbers, right? That’s why we start by grouping and organizing the data.

① Frequency Distribution Table

This is a table that divides data into several intervals and summarizes how many data points fall into each interval.

[Key Terms to Remember]
Class: Each of the intervals used to group the data. (e.g., at least 10 points but less than 20 points)
Class Width: The size of each interval. (e.g., if a class runs from 10 to 20, the width is 20 - 10 = 10)
Frequency: The number of data points (or people) that fall into a specific class.
Class Value (Midpoint): The exact middle value of a class. You can calculate it using: \( (\text{Start of class} + \text{End of class}) \div 2 \).
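
If you'd like to try this on a computer, here is a minimal Python sketch of building a frequency distribution table. The scores are made up for illustration, and the function name `frequency_table` and the class settings are our own assumptions, not something fixed by the textbook.

```python
# Hypothetical test scores, invented for this example only.
scores = [12, 25, 37, 18, 42, 29, 33, 15, 48, 21]

CLASS_WIDTH = 10   # size of each class
START = 10         # lower edge of the first class
NUM_CLASSES = 4    # classes: 10-20, 20-30, 30-40, 40-50


def frequency_table(data, start, width, num_classes):
    """Return a list of (lower, upper, class_value, frequency) rows.

    Each class includes its lower edge and excludes its upper edge,
    matching "at least 10 but less than 20".
    """
    rows = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        class_value = (lower + upper) / 2          # midpoint of the class
        frequency = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, class_value, frequency))
    return rows


table = frequency_table(scores, START, CLASS_WIDTH, NUM_CLASSES)
for lower, upper, class_value, frequency in table:
    print(f"{lower} to under {upper}: class value {class_value}, frequency {frequency}")
```

A handy check: the frequencies should add up to the total number of data points (10 in this example).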

② Histograms and Frequency Polygons

Turning a table into a graph makes it much easier to understand!

Histogram: A graph made of rectangles, with classes on the horizontal axis and frequencies on the vertical axis. It looks like a bar graph, but the rectangles touch with no gaps between them, because neighboring classes share a boundary.
Frequency Polygon: A line graph created by connecting the midpoints of the top sides of the histogram rectangles with line segments. This makes the overall shape of the data easier to see, and it is handy when you want to compare two data sets on the same axes.
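
To get a feel for both graphs, here is a small Python sketch that draws a rough histogram out of text characters and lists the points a frequency polygon would connect. The table is hypothetical, the function names are our own, and the picture is turned sideways (classes run down the page instead of along the horizontal axis) just to keep the code short.

```python
# Hypothetical frequency table: (class lower edge, class upper edge, frequency).
table = [(10, 20, 3), (20, 30, 6), (30, 40, 4), (40, 50, 1)]


def text_histogram(rows):
    """Build one line per class, drawing one '#' per data point."""
    lines = []
    for lower, upper, frequency in rows:
        lines.append(f"{lower:>3} to under {upper:<3} | {'#' * frequency}")
    return lines


def polygon_points(rows):
    """Return the (class value, frequency) points a frequency polygon connects."""
    return [((lower + upper) / 2, frequency) for lower, upper, frequency in rows]


for line in text_histogram(table):
    print(line)

print(polygon_points(table))
```

The longest row of `#` marks is the tallest rectangle of the histogram, and each point pairs a class value with its frequency.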

[Pro-Tip!]
Looking at a histogram lets you see at a glance which classes most of the data falls into!

[Fun Fact]
The name "histogram" is often said to come from the Greek words "histos" (something set upright, like a ship's mast) and "gramma" (a drawing). It’s like the bars are standing up to compare their heights!

★Summary So Far★
Take scattered data, organize it into a frequency distribution table, and then visualize it using a histogram!

2. Relative Frequency

"Class A had 5 people score 80 or above, and Class B had 10. Which class had a higher percentage of students scoring 80 or above?"
Actually, you can't compare this using the number of people alone because the total number of students in each class might be different.

That’s where relative frequency comes in!

Formula for Relative Frequency

\( \text{Relative Frequency} = \frac{\text{Frequency of that class}}{\text{Total frequency}} \)

・Relative frequency represents the proportion as a fraction or decimal, treating the whole as 1.
・If you add up the relative frequencies of all classes, the total is always 1. (You can use this to check your calculations! If you rounded the decimals, the total may be off very slightly.)
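
Here is a minimal Python sketch of the formula. The frequencies and the class sizes for Class A and Class B (20 and 50 students) are assumptions made up for illustration, since the totals are not given in the question above.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total frequency."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


# Hypothetical frequencies for four classes of a frequency table.
frequencies = [3, 3, 2, 2]
rel = relative_frequencies(frequencies)
print(rel)
print(round(sum(rel), 10))  # check: the relative frequencies should total 1

# Assumed class sizes: Class A has 20 students, Class B has 50.
class_a = 5 / 20    # 5 students scored 80 or above
class_b = 10 / 50   # 10 students scored 80 or above
print(class_a, class_b)
```

With these assumed totals, Class A's relative frequency (0.25) is higher than Class B's (0.2), even though Class B has more students who scored 80 or above.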

[Analogy]
Think about it: 1 person out of 10 having a snack is very different from 1 person out of 100 having one. Relative frequency is the tool we use to compare these "ratios."

★Summary So Far★
When comparing groups with different total numbers of data points, always use relative frequency!

3. Measures of Central Tendency

Numbers that express the overall characteristics of a data set in a single value are called "measures of central tendency." There are three main "celebrities" in this category:

① Mean (Average)

The "average" you know well.
\( \text{Mean} = \frac{\text{Sum of data values}}{\text{Number of data points}} \)

② Median

When you line up the data in order (either smallest to largest or vice-versa), this is the value that falls exactly in the middle.
*Note! If the number of data points is even, take the mean of the two middle values.
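
The mean and the median can be sketched in a few lines of Python. The data sets are invented, and the function names are our own. Python's built-in `statistics` module also provides `mean` and `median`, but writing them out shows each step.

```python
def mean(values):
    """Sum of the data values divided by the number of data points."""
    return sum(values) / len(values)


def median(values):
    """Middle value after sorting; for an even count, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_data = [2, 4, 9, 5, 10]   # sorted: 2, 4, 5, 9, 10
even_data = [4, 8, 6, 14]     # sorted: 4, 6, 8, 14

print(mean(odd_data), median(odd_data))     # 6.0 and 5
print(mean(even_data), median(even_data))   # 8.0 and 7.0
```

In the even case there is no single middle value, so the median is the mean of 6 and 8.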

③ Mode

The value that appears most frequently in the data (the most popular value).
In a frequency distribution table, you answer with the class value of the class with the highest frequency.

[Common Mistake!]
Many people accidentally give the frequency (the number of people) as the answer! Remember, the mode is the class value (points, time, etc.) of that class, not how many people are in it!
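
The difference between the frequency and the mode is easy to see in a short Python sketch. The table is hypothetical and the function name `mode_from_table` is our own.

```python
# Hypothetical frequency table: (class lower edge, class upper edge, frequency).
table = [(10, 20, 3), (20, 30, 6), (30, 40, 4), (40, 50, 1)]


def mode_from_table(rows):
    """Return the class value of the class with the highest frequency.

    If two classes tie for the highest frequency, this simple sketch
    returns the first one it finds.
    """
    lower, upper, _frequency = max(rows, key=lambda row: row[2])
    return (lower + upper) / 2


highest_frequency = max(frequency for _, _, frequency in table)
mode = mode_from_table(table)

print(highest_frequency)  # 6: the number of people, which is NOT the mode
print(mode)               # 25.0: the class value, which IS the mode
```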

[Fun Fact: Tips for choosing the right one]
・The mean uses every data value and is the most familiar, so it works well when the data has no extreme values!
・If there are extreme values (far larger or smaller than the rest), the mean gets pulled toward them, so use the median!
・If you want to know what's "most popular" in a survey, use the mode!
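
To see how one extreme value affects the mean but barely moves the median, here is a small Python sketch using the standard `statistics` module. The numbers are made up for illustration.

```python
from statistics import mean, median

# Hypothetical data: five similar values, then the same data plus one extreme value.
typical = [48, 50, 52, 49, 51]
with_outlier = typical + [500]

print(mean(typical), median(typical))            # 50 and 50
print(mean(with_outlier), median(with_outlier))  # 125 and 50.5
```

A single value of 500 drags the mean from 50 up to 125, while the median only shifts from 50 to 50.5.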

★Summary So Far★
To truly understand your data, decide whether the mean, median, or mode is the best tool for the job!

4. Spread of Data and Range

Even if two classes have the same average score, the picture is very different if everyone in one class scored near the average while the scores in the other class run all the way from 0 to 100. One indicator we use to see this "spread" is the range.

How to Calculate the Range

\( \text{Range} = \text{Maximum value} - \text{Minimum value} \)

The larger the range, the more spread out the data is.
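
Here is a minimal Python sketch comparing two classes with the same mean but different ranges. The scores are invented, and the function name `data_range` is our own.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


# Hypothetical scores: both classes have a mean of 60.
class_x = [58, 60, 62, 59, 61]
class_y = [20, 100, 60, 35, 85]

print(sum(class_x) / len(class_x), data_range(class_x))  # 60.0 and 4
print(sum(class_y) / len(class_y), data_range(class_y))  # 60.0 and 80
```

The means match, but the range shows that the second class is far more spread out.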

[Pro-Tip!]
The range only uses the "largest" and "smallest" values, so the calculation is super simple! The flip side is that a single extreme value can change it a lot.

5. Conclusion: Tips for Reading Graphs

The most important part of data analysis isn't just the calculation. It’s thinking, "What can I infer from this data?"

・Is the "peak" leaning toward the left or the right?
・Are there two peaks? (This might be a sign that two different groups are mixed together!)
・Compared to last year, have the values shifted higher overall?

It might feel difficult at first, but once you start paying attention to the weather reports on the news or sports statistics and ask yourself, "Which measure of central tendency is this?", you'll find yourself getting better at it in no time!

I'm rooting for you. You've got this!