Welcome to Data Presentation!

Ever wondered how companies decide which flavors of crisps to stock more of, or how doctors track the growth of babies? It all starts with data presentation. In this chapter, we aren’t just looking at piles of numbers; we are learning how to turn those numbers into a story that anyone can understand.

Don’t worry if you’ve found Statistics a bit "dry" before—we’re going to break this down into simple, visual steps that make the patterns jump off the page!


1. The Building Blocks: Types of Data

Before we can draw anything, we need to know what kind of "bricks" we are building with. In Mathematics B (MEI), we categorize data into four main types:

  • Categorical Data: These are non-numerical labels. Example: Eye color, car brands, or your favorite pizza topping.
  • Discrete Data: Numerical data that can only take specific values (usually whole numbers). You can count these. Example: The number of students in a class or the number of goals scored in a match.
  • Continuous Data: Numerical data that can take any value within a range. You measure these. Example: Your height, the weight of an apple, or the time it takes to run 100m.
  • Ranked Data: Data that has been put in an order or given a position. Example: 1st, 2nd, and 3rd place in a race.

Quick Review: If you can count it, it’s usually discrete. If you have to measure it with a tool (like a ruler or a stopwatch), it’s continuous.

Summary: Identifying your data type is the first step in choosing the right graph. You wouldn't use a histogram for your favorite colors!


2. Visualizing Ungrouped Data

When our data hasn't been grouped into ranges (classes), we use several standard diagrams. Here are the ones you need to recognize:

Bar Charts and Vertical Line Charts

These are great for categorical or discrete data. In a vertical line chart, the height of the line represents the frequency. Analogy: Think of a bar chart like a row of buildings; the taller the building, the more people (data points) live there!

Dot Plots

A dot plot is similar to a bar chart but uses a stack of dots to represent frequency. It’s very useful for seeing the "shape" of small datasets quickly.
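
To see how a dot plot reveals shape, here is a minimal Python sketch that prints one sideways, as text. The function name `dot_plot` and the goals data are made up for illustration, and it assumes a non-empty list of whole numbers.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot: one row per value, one '*' per occurrence."""
    counts = Counter(values)
    lines = []
    # Include every whole number from min to max so that gaps stay visible.
    for value in range(min(values), max(values) + 1):
        dots = " ".join("*" * counts[value])
        lines.append(f"{value} | {dots}".rstrip())
    return lines

# Hypothetical goals scored in 8 matches (made-up numbers).
goals = [0, 1, 1, 2, 2, 2, 3, 5]
for line in dot_plot(goals):
    print(line)
```

The row for 4 comes out empty, which makes the isolated value at 5 easy to spot.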

Pie Charts

Used to show proportions of a whole. Did you know? To find the angle for each slice, use the formula: \( \text{Angle} = \frac{\text{Frequency}}{\text{Total Frequency}} \times 360^\circ \).
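
As a quick check of the angle formula, the sketch below applies it to a small frequency table. The helper name `pie_angles` and the survey numbers are invented for illustration.

```python
def pie_angles(frequencies):
    """Return the slice angle (in degrees) for each category."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("total frequency must be positive")
    # Angle = Frequency / Total Frequency x 360
    return {label: freq * 360 / total for label, freq in frequencies.items()}

# Hypothetical survey of 40 students' favourite crisp flavours (made-up numbers).
flavours = {
    "Ready salted": 16,
    "Cheese and onion": 12,
    "Salt and vinegar": 8,
    "Prawn cocktail": 4,
}
angles = pie_angles(flavours)
for label, angle in angles.items():
    print(f"{label}: {angle:.0f} degrees")
```

A useful self-check when doing this by hand: the angles should always add up to 360 degrees.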

Stem-and-Leaf Diagrams

This is a clever way to show every single piece of data while still seeing the overall shape. Memory Aid: Think of a plant. The "stem" is the main part (e.g., the "tens" digit), and the "leaves" are the bits that grow off it (the "units" digits). Always remember to include a key (e.g., 2 | 1 means 21)!
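
If it helps to see the idea mechanically, here is a rough Python sketch that builds a stem-and-leaf diagram. It assumes non-negative whole numbers with tens as stems and units as leaves. The function name and the data are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf diagram (tens | units), plus a key."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(v, 10)    # e.g. 21 -> stem 2, leaf 1
        stems[stem].append(leaf)
    lines = []
    # Show every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    lines.append("Key: 2 | 1 means 21")
    return lines

# Hypothetical test scores (made-up numbers).
scores = [21, 25, 34, 38, 38, 42, 47, 21, 53]
for line in stem_and_leaf(scores):
    print(line)
```

Notice that repeated values (21 and 38 here) each get their own leaf, so no data is lost.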

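Summary: Stem-and-leaf diagrams and dot plots keep every original value visible, while bar charts, vertical line charts, and pie charts show frequencies or proportions. All of them organize the data so we can see which values are most common.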


3. Histograms: The Big Picture

Histograms are used for grouped continuous data. They look like bar charts, but there is one massive difference: The area of the bar represents the frequency, not the height!

This is the part that trips most students up, but here is the secret formula:

\[ \text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}} \]

Step-by-Step: How to handle Histograms
1. Find the Class Width (the difference between the upper and lower boundaries of the group).
2. Calculate the Frequency Density for each group using the formula above.
3. Draw your axes: The x-axis is your data (e.g., Weight), and the y-axis must be labeled Frequency Density.
4. Draw your bars. Because the data is continuous, there should be no gaps between the bars!
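
Steps 1 and 2 can be sketched in a few lines of Python. The function name `frequency_densities` and the weights table are assumptions made up for this example, not real data.

```python
def frequency_densities(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency) tuples.

    Returns the frequency density (bar height) for each class.
    """
    densities = []
    for lower, upper, freq in classes:
        width = upper - lower                 # Step 1: class width
        if width <= 0:
            raise ValueError("upper boundary must be greater than lower boundary")
        densities.append(freq / width)        # Step 2: frequency / class width
    return densities

# Hypothetical grouped weights in kg (made-up numbers).
weights = [(40, 50, 8), (50, 55, 10), (55, 60, 12), (60, 80, 8)]
densities = frequency_densities(weights)
for (lower, upper, freq), fd in zip(weights, densities):
    print(f"{lower}-{upper} kg: frequency {freq}, frequency density {fd}")
```

In this made-up table the 40-50 and 60-80 classes both contain 8 items, but the second is twice as wide, so its bar is half as tall (0.4 compared with 0.8).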

Common Mistake to Avoid: Never just plot frequency on the y-axis for a histogram if the class widths are different. If you do, the "wide" bars will look much more important than they actually are!

Summary: In a histogram, Area = Frequency. If one bar is twice as wide as another but represents the same number of people, it must be half as tall.


4. Cumulative Frequency and Box Plots

Sometimes we want to see the "running total" of our data. This is cumulative frequency.

Cumulative Frequency Diagrams

To draw this, keep a running total of the frequencies as you move up through the classes. Always plot each cumulative frequency at the upper class boundary of its interval, then join the points with a smooth curve. The curve is typically S-shaped and is sometimes called an ogive.
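
Here is a small Python sketch of the running total, giving the points you would plot. The function name and the grouped data are hypothetical. The extra starting point at the first lower boundary (cumulative frequency 0) is included because no values lie below it.

```python
from itertools import accumulate

def cumulative_frequency_points(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency), in order.

    Returns the (x, cumulative frequency) points to plot.
    """
    running_totals = accumulate(freq for _, _, freq in classes)
    # Starting point: nothing lies below the first lower boundary.
    points = [(classes[0][0], 0)]
    for (_, upper, _), total in zip(classes, running_totals):
        points.append((upper, total))   # plot at the UPPER boundary
    return points

# Hypothetical grouped weights in kg (made-up numbers).
weights = [(40, 50, 8), (50, 55, 10), (55, 60, 12), (60, 80, 8)]
points = cumulative_frequency_points(weights)
print(points)
```

The final cumulative frequency should always equal the total number of data values (38 in this made-up table).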

Box-and-Whisker Diagrams (Box Plots)

These are perfect for seeing the spread of your data. A box plot shows five key values:

  • Minimum: The lowest value.
  • Lower Quartile (LQ): The 25% mark.
  • Median: The middle value (50% mark).
  • Upper Quartile (UQ): The 75% mark.
  • Maximum: The highest value.

The "box" represents the middle 50% of the data, and the width of the box is the Interquartile Range (IQR), calculated as \( \text{UQ} - \text{LQ} \).
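
The five values and the IQR can be found with a short Python sketch like the one below. Different textbooks and calculators use slightly different quartile conventions. This example assumes the "median of each half" method, leaving out the middle value when the number of data points is odd, so check which method your course expects. The function name and data are made up.

```python
from statistics import median

def five_number_summary(values):
    """Min, LQ, median, UQ, max using the 'median of each half' convention."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower_half = data[:half]
    # For odd n, skip the middle value; for even n, split exactly in two.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return {
        "min": data[0],
        "LQ": median(lower_half),
        "median": median(data),
        "UQ": median(upper_half),
        "max": data[-1],
    }

# Hypothetical data set (made-up numbers).
summary = five_number_summary([3, 5, 7, 8, 9, 11, 13, 15, 20])
iqr = summary["UQ"] - summary["LQ"]   # IQR = UQ - LQ
print(summary, "IQR =", iqr)
```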

Summary: Cumulative frequency helps us find the median and quartiles, which we then use to draw a box plot to compare different sets of data easily.


5. Describing the Shape (Distributions)

Once your diagram is drawn, you need to describe it using specific mathematical words:

  • Unimodal: The data has one clear "peak" (one mode).
  • Bimodal: The data has two clear peaks. Analogy: Like the two humps on a camel!
  • Symmetrical: The left side is a mirror image of the right side.
  • Skewed: The data is "leaning" to one side.

How to remember Skewness:

Think of where the "tail" of the graph is pointing:
- Positive Skew: The "tail" points to the right (towards the positive numbers). Most data is bunched at the low end.
- Negative Skew: The "tail" points to the left (towards the negative/lower numbers). Most data is bunched at the high end.

Don't worry if this seems tricky! Just look at the "tail." If the graph trails off towards the right, it's positive skew. If it trails off towards the left, it's negative skew.

Summary: Describing the distribution helps us understand if the data is balanced or if it is "weighted" heavily towards one end.


6. Outliers and Cleaning Data

Sometimes, data is just plain weird. An outlier is a data point that is inconsistent with the rest of the set. It might be a measurement error or just a very rare event.

How to identify an outlier (The 1.5 × IQR Rule):
A value is usually considered an outlier if it is:
- More than \( 1.5 \times \text{IQR} \) above the Upper Quartile.
- More than \( 1.5 \times \text{IQR} \) below the Lower Quartile.
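
A minimal Python sketch of this rule is shown below. It takes the quartiles as inputs, so it works with whichever quartile method you have been taught. The function name and the data are hypothetical.

```python
def iqr_outliers(values, lq, uq):
    """Return the values lying more than 1.5 x IQR outside the quartiles."""
    iqr = uq - lq
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data (made-up numbers) with LQ = 6 and UQ = 14, so IQR = 8.
data = [3, 5, 7, 8, 9, 11, 13, 15, 30]
outliers = iqr_outliers(data, lq=6, uq=14)
print(outliers)   # fences are 6 - 12 = -6 and 14 + 12 = 26
```

Here only 30 lies beyond a fence (it is above 26), so it is the one value flagged.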

Alternatively, you might be told an outlier is more than 2 standard deviations from the mean.
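
The standard deviation version can be sketched the same way. This example assumes the sample standard deviation (divisor n - 1), which is what Python's `statistics.stdev` computes. If your question uses a different divisor, the cut-off will shift slightly. The function name and data are made up.

```python
from statistics import mean, stdev

def sd_outliers(values, k=2):
    """Return the values more than k standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)   # assumption: sample standard deviation (divisor n - 1)
    return [v for v in values if abs(v - m) > k * s]

# Hypothetical data (made-up numbers).
data = [3, 5, 7, 8, 9, 11, 13, 15, 30]
flagged = sd_outliers(data)
print(flagged)
```

For this made-up data the mean is about 11.2 and the standard deviation about 8.0, so anything further than roughly 16 from the mean is flagged. Only 30 qualifies.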

Cleaning Data: This is the process of dealing with missing values, errors, or deciding whether to keep or remove outliers before you start your final analysis.

Takeaway: Always look for values that don't fit the pattern. They might be the most interesting part of your data, or they might be a mistake that needs "cleaning"!