Welcome to Data Representation!
Ever wondered how your computer knows the difference between a selfie, a song, and a text message? At its heart, a computer is just a collection of tiny switches that can either be OFF or ON. We represent these states as 0 and 1. This is called Binary.
In this chapter, we will learn how computers "speak" using only these two numbers to represent everything from simple integers to complex characters. Don't worry if it seems like a lot of math at first. We'll break it down step by step!
1. Binary Magnitudes and Prefixes
In everyday life, we use decimal prefixes (Base 10). For example, 1 kilometer is 1,000 meters. However, computers work in binary (Base 2), so memory and file sizes are often measured with binary prefixes instead.
Decimal Prefixes (Base 10)
These are the ones you use in science class. Each prefix is 1,000 times (\(10^3\)) larger than the one before it.
- kilo (k): \(10^3 = 1,000\)
- mega (M): \(10^6 = 1,000,000\)
- giga (G): \(10^9 = 1,000,000,000\)
- tera (T): \(10^{12} = 1,000,000,000,000\)
Binary Prefixes (Base 2)
These are more accurate for computers because each prefix is 1,024 times (\(2^{10}\)) larger than the one before it, which matches the way binary memory is built.
- kibi (Ki): \(2^{10} = 1,024\)
- mebi (Mi): \(2^{20} = 1,048,576\)
- gibi (Gi): \(2^{30} = 1,073,741,824\)
- tebi (Ti): \(2^{40} = 1,099,511,627,776\)
Quick Review: If you buy a "1 Terabyte" hard drive, you are getting \(10^{12}\) bytes. But your computer might report it as roughly 0.91 (about 9% less) because it is really measuring in "Tebibytes" (\(2^{40}\) bytes each).
Key Takeaway: Decimal prefixes are based on 1,000; Binary prefixes are based on 1,024.
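To see the gap between the two prefix families for yourself, here is a minimal Python sketch. The names `TERA`, `TEBI`, and `bytes_to_tebibytes` are made up for this illustration.

```python
# Sizes in bytes for two of the prefixes listed above.
TERA = 10**12   # decimal prefix: 1 terabyte (TB)
TEBI = 2**40    # binary prefix: 1 tebibyte (TiB)


def bytes_to_tebibytes(num_bytes: int) -> float:
    """Convert a byte count into tebibytes (TiB)."""
    return num_bytes / TEBI


# A "1 Terabyte" drive, measured in tebibytes.
drive_in_tib = bytes_to_tebibytes(TERA)
print(f"1 TB = {drive_in_tib:.3f} TiB")  # prints "1 TB = 0.909 TiB"
```

The drive has not lost any bytes; the same number of bytes is simply being divided by a bigger unit.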
2. Number Systems
We need to be able to convert between three main systems: Denary (our normal numbers), Binary (computer numbers), and Hexadecimal (shorthand for binary).
Denary (Base 10)
Uses digits 0-9. The place values are units, tens, hundreds, etc.
Binary (Base 2)
Uses only 0 and 1. Each place value is double the one to its right. For an 8-bit number, the place values are 128, 64, 32, 16, 8, 4, 2, 1.
Example: 1010 in binary is (1 × 8) + (0 × 4) + (1 × 2) + (0 × 1) = 10 in denary.
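The place-value method in the example can be sketched in a few lines of Python. The function name `binary_to_denary` is just a label chosen for this illustration, and the input is assumed to be a string containing only 0s and 1s.

```python
def binary_to_denary(bits: str) -> int:
    """Add up the place values of every 1 in a binary string."""
    total = 0
    place_value = 1  # the rightmost bit is worth 1
    for bit in reversed(bits):
        if bit == "1":
            total += place_value
        place_value *= 2  # each step to the left doubles the value
    return total


print(binary_to_denary("1010"))  # prints 10
```

Python's built-in `int("1010", 2)` gives the same answer; the loop is written out here only to show the place values at work.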
Hexadecimal (Base 16)
Uses digits 0-9 and letters A-F.
A = 10, B = 11, C = 12, D = 13, E = 14, F = 15.
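Hex is used because it is much shorter than binary and easier for humans to read without making mistakes. Each hex digit stands for exactly four bits (one nibble), so the 8-bit value 1111 1111 shrinks to just FF.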
Real-World Use of Hex: You will see Hex used in HTML Color Codes (e.g., #FFFFFF is white) and MAC Addresses for network devices.
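Because one hex digit covers exactly four bits, converting binary to hex is a matter of splitting the bits into groups of four. Here is a minimal sketch of that idea; `HEX_DIGITS` and `binary_to_hex` are names invented for this example.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hex, one 4-bit group at a time."""
    # Pad on the left with zeros so the length is a multiple of 4.
    padded_length = (len(bits) + 3) // 4 * 4
    padded = bits.zfill(padded_length)

    hex_digits = []
    for start in range(0, len(padded), 4):
        nibble = padded[start:start + 4]
        hex_digits.append(HEX_DIGITS[int(nibble, 2)])
    return "".join(hex_digits)


print(binary_to_hex("11111111"))  # prints FF
print(binary_to_hex("101101"))    # prints 2D
```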
Binary Coded Decimal (BCD)
BCD is a special way of representing denary numbers. Each denary digit is converted into its own 4-bit binary nibble.
Example: To represent 12 in BCD:
1 = 0001
2 = 0010
So, 12 in BCD is 0001 0010.
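The same digit-by-digit idea can be sketched in Python. The function name `to_bcd` is hypothetical, and the sketch assumes the number is a non-negative whole number.

```python
def to_bcd(number: int) -> str:
    """Encode each denary digit of a non-negative integer as a 4-bit nibble."""
    nibbles = [format(int(digit), "04b") for digit in str(number)]
    return " ".join(nibbles)


print(to_bcd(12))  # prints 0001 0010
```

Notice that this is not the same as ordinary binary: 12 in plain binary is 1100, but in BCD each digit keeps its own nibble.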
Real-World Use of BCD: Used in digital clocks and calculators where individual digits need to be displayed on a screen.
Key Takeaway: Binary is for computers, Hex is for humans to read binary easily, and BCD is for displaying individual digits.
3. Signed Integers (Negative Numbers)
How do we tell a computer a number is negative? We use two main methods: One's Complement and Two's Complement.
One's Complement
To get the one's complement of a binary number, simply flip all the bits (change 0s to 1s and 1s to 0s).
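Flipping every bit is easy to sketch in code. The function name `ones_complement` below is chosen just for this illustration, and the input is assumed to be a string of 0s and 1s.

```python
def ones_complement(bits: str) -> str:
    """Flip every bit: 0 becomes 1 and 1 becomes 0."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


print(ones_complement("00000101"))  # prints 11111010
```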
Two's Complement (The Industry Standard)
This is the most common way to represent negative numbers.
Step 1: Start with the positive binary version of the number.
Step 2: Flip all the bits (One's Complement).
Step 3: Add 1 to the result.
Memory Aid: "Flip and Add One!"
Did you know? In Two's Complement, the furthest bit on the left (the Most Significant Bit) has a negative value. In an 8-bit number, the place values are: -128, 64, 32, 16, 8, 4, 2, 1.
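Here is a minimal sketch of "Flip and Add One", plus a check that uses the negative place value of the leftmost bit. Both function names are made up for this example, and an 8-bit width is assumed by default.

```python
def twos_complement_negate(value: int, width: int = 8) -> str:
    """Return the bit pattern for -value using 'flip and add one'."""
    positive = format(value, f"0{width}b")                         # Step 1
    flipped = "".join("1" if b == "0" else "0" for b in positive)  # Step 2
    result = (int(flipped, 2) + 1) % (2 ** width)                  # Step 3
    return format(result, f"0{width}b")


def twos_complement_to_denary(bits: str) -> int:
    """Read a two's complement pattern; the leftmost bit has a negative value."""
    width = len(bits)
    total = -int(bits[0]) * 2 ** (width - 1)  # e.g. -128 for 8 bits
    for position, bit in enumerate(bits[1:], start=1):
        total += int(bit) * 2 ** (width - 1 - position)
    return total


pattern = twos_complement_negate(5)
print(pattern)                             # prints 11111011
print(twos_complement_to_denary(pattern))  # prints -5
```

Reading 11111011 with the place values -128, 64, 32, 16, 8, 4, 2, 1 gives -128 + 64 + 32 + 16 + 8 + 2 + 1 = -5, which matches.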
Key Takeaway: Two's Complement is used because it allows the computer to perform subtraction using the same hardware it uses for addition.
4. Binary Arithmetic and Overflow
Adding binary is just like adding denary, but you "carry over" much sooner!
Rules for Binary Addition:
- \(0 + 0 = 0\)
- \(0 + 1 = 1\)
- \(1 + 1 = 0\) (carry 1)
- \(1 + 1 + 1 = 1\) (carry 1)
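The four rules above are enough to add any two binary numbers, working column by column from the right. This is a rough sketch; `add_binary` is a name invented for the example, and both inputs are assumed to be strings of 0s and 1s.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, carrying as needed."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column = int(bit_a) + int(bit_b) + carry
        result.append(str(column % 2))  # the digit written in this column
        carry = column // 2             # 1 if the column added up to 2 or 3
    if carry:
        result.append("1")              # a final carry needs an extra bit
    return "".join(reversed(result))


print(add_binary("0110", "0011"))  # prints 1001 (6 + 3 = 9)
```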
What is Overflow?
Imagine you have an 8-bit register (a "storage box" that holds 8 bits). If you add two numbers and the result needs 9 bits, the extra bit has nowhere to go. This is called Overflow.
Analogy: It’s like a car’s odometer reaching 999,999 miles and then flipping back to 000,000. The computer might give a completely wrong answer because it "lost" the overflow bit.
Key Takeaway: Overflow occurs when the result of a calculation is too large to fit in the allocated number of bits.
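To watch overflow happen, here is a small sketch that pretends to be an 8-bit register. It assumes unsigned numbers (0 to 255), and `add_8bit` is a hypothetical helper, not a real CPU instruction.

```python
from typing import Tuple


def add_8bit(a: int, b: int) -> Tuple[int, bool]:
    """Add two unsigned values and keep only the lowest 8 bits."""
    full_result = a + b
    stored = full_result & 0xFF        # the register keeps only 8 bits
    overflowed = full_result > 0xFF    # True if a 9th bit was needed
    return stored, overflowed


print(add_8bit(200, 100))  # prints (44, True)
print(add_8bit(100, 27))   # prints (127, False)
```

200 + 100 should be 300, but 300 needs 9 bits. Once the extra bit is lost, the register is left holding 44, just like the odometer rolling over.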
5. Representing Characters
Computers don't know what the letter "A" is. They only know numbers. A Character Set is a look-up table that tells the computer which number represents which character.
ASCII (American Standard Code for Information Interchange)
- Standard ASCII: Uses 7 bits, allowing for 128 characters (English letters, numbers, and basic symbols).
- Extended ASCII: Uses 8 bits, allowing for 256 characters (includes special symbols and some accented characters).
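Python's built-in `ord` and `chr` functions let you peek at this look-up table. The helper names `ascii_code` and `ascii_bits` are made up for this sketch, which covers Standard (7-bit) ASCII only.

```python
def ascii_code(character: str) -> int:
    """Return the Standard ASCII code of a single character."""
    code = ord(character)
    if code > 127:
        raise ValueError("not a Standard ASCII character")
    return code


def ascii_bits(character: str) -> str:
    """Return the 7-bit binary pattern for a Standard ASCII character."""
    return format(ascii_code(character), "07b")


print(ascii_code("A"))  # prints 65
print(ascii_bits("A"))  # prints 1000001
print(chr(97))          # prints a
```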
Unicode
ASCII was great, but it couldn't represent languages like Chinese, Arabic, or even Emojis! Unicode was created to solve this.
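- It gives every character its own number (a code point), and its encodings (UTF-8, UTF-16, UTF-32) store that number using between 8 and 32 bits per character.
- It aims to cover the writing systems of virtually every language in the world, plus symbols and emoji.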
- The first 128 codes of Unicode are exactly the same as ASCII to stay compatible.
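The points above can be explored with a short sketch. It uses UTF-8, one common Unicode encoding that these notes do not cover in detail, so treat it as an illustration only. The function name `describe` is hypothetical.

```python
from typing import Tuple


def describe(character: str) -> Tuple[int, int]:
    """Return (Unicode code point, number of bytes in UTF-8)."""
    code_point = ord(character)
    utf8_length = len(character.encode("utf-8"))
    return code_point, utf8_length


# An English letter, an accented letter, a Chinese character, and an emoji.
for character in ["A", "é", "中", "😀"]:
    code_point, utf8_length = describe(character)
    print(f"U+{code_point:04X} needs {utf8_length} byte(s) in UTF-8")
```

The letter "A" still has the code 65, exactly as in ASCII, and fits in a single byte. The other characters have larger code points and need more bytes.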
Common Mistake: Students often think ASCII and Unicode are the "files" themselves. They aren't! They are just the coding systems used to translate bits into letters.
Key Takeaway: ASCII is small and limited to English; Unicode is huge and covers the characters of nearly every written language, plus emoji.
Final Quick Review Box
1. Binary Prefixes: kibi, mebi, gibi, tebi (Multiples of 1,024).
2. Hexadecimal: Base 16 (0-9, A-F). Used for colors and MAC addresses.
3. BCD: Each digit gets 4 bits (e.g., 5 = 0101). Used for digital displays.
4. Two's Complement: Flip the bits and add 1. Used for negative numbers.
5. Overflow: When a result is too big for the CPU register.
6. Unicode: The global standard for characters. It extends the limited ASCII and keeps the same first 128 codes.
You've reached the end of the Data Representation notes! Don't worry if the conversions take a few tries to master. Practice makes perfect. You're doing great!