Welcome to Data Storage and Compression!

In this chapter, we are going to explore how we measure the "digital weight" of our files and how we make those files smaller so they don't take up too much room. Whether you are downloading a 50GB game or sending a quick 2MB photo to a friend, you are dealing with data storage and compression.

Don't worry if the numbers look big at first. Once you see the pattern, it's as simple as moving a decimal point (well, a binary one!).

Section 1: Measuring Data

Just like we measure weight in grams and kilograms, we measure digital data in bits and bytes. Because computers use binary (1s and 0s), these measurements are based on powers of 2.

The Building Blocks

  • Bit: The smallest unit of data. A single 1 or 0.
  • Nibble: 4 bits (half a byte).
  • Byte: 8 bits. One byte is enough to store a single character of text (for example, one ASCII character).

The Binary Multiples

In the past, "kilobyte" was used to mean either 1000 or 1024 bytes, which caused confusion. To be precise, your Edexcel course uses the IEC (International Electrotechnical Commission) binary units. Each unit is 1024 times bigger than the one before it.

Memory Aid: "Kitchens Make Great Toast"
  1. Kibibyte (KiB): \(1024\) bytes
  2. Mebibyte (MiB): \(1024\) kibibytes
  3. Gibibyte (GiB): \(1024\) mebibytes
  4. Tebibyte (TiB): \(1024\) gibibytes

Did you know? We use "kibi" and "mebi" instead of "kilo" and "mega" to show we are working with 1024 (\(2^{10}\)) instead of 1000 (\(10^{3}\)).

Quick Review Box:
8 bits = 1 byte
1024 bytes = 1 KiB
1024 KiB = 1 MiB
1024 MiB = 1 GiB
1024 GiB = 1 TiB
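
If you like to check your conversions with code, here is a minimal Python sketch of the review box. The names `UNITS`, `to_bytes` and `to_bits` are made up for this example and are not part of the course.

```python
# Hypothetical lookup table: how many bytes are in one of each unit.
UNITS = {
    "byte": 1,
    "KiB": 1024,
    "MiB": 1024 ** 2,
    "GiB": 1024 ** 3,
    "TiB": 1024 ** 4,
}

def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * UNITS[unit]

def to_bits(amount, unit):
    """Convert an amount in the given unit to bits (8 bits per byte)."""
    return to_bytes(amount, unit) * 8

print(to_bytes(1, "KiB"))  # 1024
print(to_bytes(1, "MiB"))  # 1048576
print(to_bits(1, "byte"))  # 8
```

Notice that each unit is just the previous one multiplied by 1024, which is why the table uses powers of 1024.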

Section 2: Calculating File Sizes

You need to be able to "construct expressions" (write the maths) to calculate how big a file is or whether it will fit on a disk. You usually don't need to work out the final big number in the exam, but you must show how you got there!

Calculating Image File Size

To find the size of a bitmap image, use this formula:
\(\text{File size in bits} = \text{Resolution (width} \times \text{height)} \times \text{Colour Depth}\)

Example: An image is 100 pixels wide, 100 pixels high, and has a 24-bit colour depth.
Expression: \(100 \times 100 \times 24\) bits.
To get the answer in bytes, divide by 8: \(\frac{100 \times 100 \times 24}{8}\)
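
The same example can be worked through in Python as a quick check. This is only a sketch: the function name `image_size_bits` is an assumption for illustration, and it ignores any extra information (such as metadata) that a real image file might store.

```python
def image_size_bits(width, height, colour_depth):
    """Bitmap size in bits: width x height x colour depth."""
    return width * height * colour_depth

bits = image_size_bits(100, 100, 24)
size_bytes = bits / 8          # 8 bits in a byte
size_kib = size_bytes / 1024   # 1024 bytes in a KiB

print(bits)        # 240000
print(size_bytes)  # 30000.0
print(size_kib)    # 29.296875
```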

Calculating Sound File Size

To find the size of a sound file, use this formula:
\(\text{File size in bits} = \text{Sample Rate (Hz)} \times \text{Duration (seconds)} \times \text{Bit Depth}\)

Example: A 10-second recording with a sample rate of 44,100Hz and a 16-bit depth.
Expression: \(44{,}100 \times 10 \times 16\) bits.
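
Here is the same recording worked out in Python. Treat it as a sketch: the function name `sound_size_bits` is made up for this example, and it assumes a single (mono) channel because the formula above does not mention channels.

```python
def sound_size_bits(sample_rate_hz, duration_s, bit_depth):
    """Sound size in bits: samples per second x seconds x bits per sample.

    Assumes one channel (mono).
    """
    return sample_rate_hz * duration_s * bit_depth

bits = sound_size_bits(44_100, 10, 16)

print(bits)      # 7056000
print(bits / 8)  # 882000.0 (bytes)
```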

Capacity Requirements

To find out how many 2 MiB files fit on a 10 GiB USB stick:
1. Convert everything to the same unit (convert GiB to MiB).
2. \(10 \times 1024 = 10{,}240\) MiB.
3. Divide capacity by file size: \(10{,}240 / 2 = 5{,}120\) files.
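
The three steps above can be sketched in Python. The function name `files_that_fit` is an assumption for this example, and it pretends the whole stick is free for your files (real devices reserve a little space for the file system).

```python
def files_that_fit(capacity_gib, file_size_mib):
    """How many whole files of a given size (MiB) fit in a capacity (GiB)."""
    capacity_mib = capacity_gib * 1024     # step 1 and 2: convert GiB to MiB
    return capacity_mib // file_size_mib   # step 3: whole files only

print(files_that_fit(10, 2))  # 5120
```

The `//` operator rounds down, because you cannot store part of a file.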

Common Mistake: Forgetting to divide by 8! Exams often ask for the answer in bytes, but your initial calculation is usually in bits.

Key Takeaway: Always check your units! If the question asks for KiB, you'll need to divide your total bytes by 1024.

Section 3: Data Compression

Compression is the process of making a file smaller. Why do we do it?
1. To take up less storage space.
2. To make files transfer faster over the internet (streaming/downloading).

1. Lossy Compression

Lossy compression makes a file smaller by permanently removing some of the data. It targets detail that the human eye or ear is unlikely to notice.

  • Examples: JPEG (images), MP3 (sound), MP4 (video).
  • Pros: Massive reduction in file size.
  • Cons: Some quality is lost; the file cannot be turned back into the original.
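
To see why lossy data cannot be turned back into the original, here is a deliberately simple Python sketch. It throws away the lowest 4 bits of each 8-bit value. This is an assumption chosen for illustration only: real formats such as JPEG and MP3 use far cleverer methods, and the function names are made up.

```python
def reduce_bit_depth(samples):
    """Keep only the top 4 bits of each 8-bit value (data is discarded)."""
    return [s >> 4 for s in samples]

def restore(reduced):
    """Scale 4-bit values back up to the 8-bit range.

    The discarded bits cannot be recovered, so the result is only close
    to the original.
    """
    return [r << 4 for r in reduced]

original = [200, 201, 203, 90, 95]
reduced = reduce_bit_depth(original)
restored = restore(reduced)

print(reduced)               # [12, 12, 12, 5, 5]
print(restored)              # [192, 192, 192, 80, 80]
print(restored == original)  # False
```

The restored values are near the originals but not the same, which is the "some quality is lost" trade-off.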

2. Lossless Compression

Lossless compression makes a file smaller without losing any information. It finds patterns in the data and records them more efficiently. When you "unzip" the file, it is identical to the original.

  • Examples: PNG (images), ZIP (files), FLAC (sound).
  • Pros: No loss of quality.
  • Cons: The file size isn't reduced as much as with lossy compression.
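
One simple way to "find patterns and record them more efficiently" is run-length encoding (RLE), which stores each repeated run as a character and a count. RLE is used here only as an illustration and is an assumption of this example: ZIP, PNG and FLAC use different, more advanced methods, and the function names are made up.

```python
def rle_encode(text):
    """Return a list of (character, run_length) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            # Same character as the current run: increase its count.
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # A different character starts a new run.
            runs.append((ch, 1))
    return runs

def rle_decode(runs):
    """Rebuild the original text from (character, run_length) pairs."""
    return "".join(ch * count for ch, count in runs)

original = "WWWWWWBBBWWWW"
encoded = rle_encode(original)

print(encoded)                          # [('W', 6), ('B', 3), ('W', 4)]
print(rle_decode(encoded) == original)  # True
```

Decoding gives back exactly the same text, which is what makes the method lossless.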

An Analogy for Compression

Imagine you are sending a text.
Lossless is like using "don't" instead of "do not". It's shorter, but the meaning is 100% perfectly preserved.
Lossy is like sending "c u ltr". You've removed letters (data), and while the friend understands it, it’s not the original formal sentence anymore.

Quick Review Box:
Lossy: Smaller files, lower quality, data is gone forever.
Lossless: Smaller files (but not as small as lossy), perfect quality, data is all there.

Summary: What have we learned?

- Computers measure data in binary multiples (each unit is 1024 times the one before).
- We use bits, bytes, KiB, MiB, GiB, and TiB to measure storage.
- File size expressions help us plan how much storage we need.
- Lossy compression shrinks files by deleting data (good for photos/music).
- Lossless compression shrinks files by recording patterns in the data more efficiently (good for text/code).