Welcome to Character Encoding!

Ever wondered how a computer, which only understands binary (0s and 1s), can display a text message, an email, or even a pizza emoji? It all comes down to a clever system called character encoding.

In this chapter, we are going to learn how computers turn numbers into letters and why we need different systems to make sure everyone around the world can communicate. Don't worry if binary seems a bit "maths-heavy"—this part is all about the secret codes computers use!

1. What is a Character Set?

A character set is basically a "translation book" or a giant list. It contains all the characters (letters, numbers, and symbols) that a computer can recognize, and it assigns a unique binary number to every single one.

Think of it like a menu in a restaurant where every dish has a number. If you tell the waiter you want "Number 42," they know exactly which meal you mean. In a computer, if the "menu" says 65 is the letter 'A', every time the computer sees 65, it shows an 'A' on your screen.
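To make the menu idea concrete, here is a minimal Python sketch that treats a character set as a lookup table. The name `MINI_CHARSET` and its four entries are made up for illustration. Python's built-in `chr` function does the same kind of lookup for the full Unicode character set.

```python
# A tiny, hypothetical "menu": a character set cut down to four entries.
# The codes match the ones real ASCII uses for these characters.
MINI_CHARSET = {65: "A", 66: "B", 67: "C", 97: "a"}

def decode(codes):
    """Look up each number in the table and join the characters together."""
    return "".join(MINI_CHARSET[code] for code in codes)

message = decode([67, 65, 66])
print(message)  # CAB
```

Each number is looked up independently, exactly like ordering dishes by number: the table, not the number itself, decides what appears on screen.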

Quick Review: The Basics

• Computers only process binary (0s and 1s).
• Every time you press a key on your keyboard, a binary signal is sent.
• The character set tells the computer which letter that signal represents.
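The round trip described in the list above can be traced with Python's built-in `ord` (character to code), `format` (code to binary pattern) and `chr` (code back to character). This is a simplified model of the idea, not a description of how real keyboard hardware works.

```python
# Follow one character: character -> code -> binary pattern
char = "H"
code = ord(char)            # the character set lookup: 'H' -> 72
bits = format(code, "07b")  # the 7-bit binary pattern the computer handles
print(char, code, bits)     # H 72 1001000

# Reading the bits back: binary pattern -> code -> character
back = chr(int(bits, 2))
print(back)                 # H
```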

Key Takeaway: A character set is a defined list of characters recognized by computer hardware and software, where each character is represented by a unique number.

2. ASCII: The Original Code

ASCII (pronounced 'as-kee') stands for American Standard Code for Information Interchange. It was one of the first major character sets created.

The standard version of ASCII uses 7 bits. Since \(2^7 = 128\), this means it can represent 128 different characters. These include:
• Uppercase letters (A-Z)
• Lowercase letters (a-z)
• Numbers (0-9)
• Punctuation (! , . ?)
• Special "control" characters (like the codes sent by the 'Enter' and 'Tab' keys), plus the space character
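The arithmetic behind the 128 limit can be checked in Python. The standard `string` module provides ready-made lists of the ASCII letters, digits and punctuation, and the sketch below confirms that every one of them has a code small enough to fit in 7 bits.

```python
import string

# 7 bits give 2**7 possible patterns, from 0000000 to 1111111
print(2 ** 7)                        # 128
print(int("1111111", 2))             # 127, the highest 7-bit code

# Count some of the groups from the list above
print(len(string.ascii_uppercase))   # 26
print(len(string.ascii_lowercase))   # 26
print(len(string.digits))            # 10

# Every one of these characters has a code below 128, so it fits in 7 bits
printable = string.ascii_letters + string.digits + string.punctuation + " "
print(all(ord(c) < 128 for c in printable))  # True
```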

The Problem with ASCII:
While 128 characters are plenty for the English language, they are not enough for the whole world! ASCII has no room for most mathematical symbols (like '÷' or '≠'), accented letters (like 'é'), or characters from other writing systems such as Greek, Arabic or Chinese. And it definitely has no room for emojis!
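You can see this limit directly by asking Python to encode text as ASCII: any character without an ASCII code causes a `UnicodeEncodeError`. The helper name `fits_in_ascii` is hypothetical, chosen for this sketch.

```python
def fits_in_ascii(text):
    """Return True if every character in text has an ASCII code (hypothetical helper)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_ascii("cafe"))  # True
print(fits_in_ascii("café"))  # False: no code for 'é'
print(fits_in_ascii("αβγ"))   # False: Greek letters
print(fits_in_ascii("🍕"))    # False: emoji
```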

Did you know? Even though ASCII uses 7 bits for each code, computers usually store it in a full 8-bit byte, with the extra (leftmost) bit set to 0.
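Here is a short Python check of that padding. Formatting a code with `"08b"` writes it as 8 binary digits, and for every ASCII code (0 to 127) the leftmost digit comes out as 0.

```python
# Pad the 7-bit code for 'A' out to a full 8-bit byte
code = ord("A")              # 65
byte = format(code, "08b")
print(byte)                  # 01000001

# For every ASCII character, the extra (leftmost) bit is 0
print(all(format(c, "08b")[0] == "0" for c in range(128)))  # True
```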

Key Takeaway: 7-bit ASCII is a simple character set that can store 128 characters. It is great for English but too small for global use.

3. Unicode: The Global Solution

To solve the "not enough room" problem of ASCII, Unicode was created. The goal of Unicode is to represent every character in every language in the world.

Advantages of Unicode over ASCII:

1. Global Range: It can represent writing systems from all over the world (Chinese characters, Cyrillic, Hebrew, etc.).
2. Symbols & Emojis: It includes thousands of scientific symbols and all your favorite emojis.
3. Compatibility: Unicode was designed to be "backwards compatible" with ASCII.

Important Point: Unicode uses the same codes as ASCII for the first 128 characters (codes 0 to 127). This means that if the letter 'A' is 65 in ASCII, it is also 65 in Unicode! This makes it easy for old systems to work with new ones.
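Python strings are Unicode, so its built-in `ord` function returns each character's Unicode number. The sketch below shows that 'A' keeps its ASCII code, while characters ASCII could not store get numbers above 127.

```python
print(ord("A"))   # 65, the same as in ASCII
print(ord("é"))   # 233, beyond ASCII's range of 0-127
print(ord("🍕"))  # 127829, the pizza emoji

# Every ASCII code 0-127 stands for the same character in Unicode
print(all(bytes([c]).decode("ascii") == chr(c) for c in range(128)))  # True
```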

Key Takeaway: Unicode is a much larger character set that can represent well over 100,000 characters, making it suitable for global communication and modern symbols.

4. Working with Character Tables

In your exam, you might be given a table and asked to convert between characters and their codes. There is a very important "trick" you need to know: Character codes run in sequence.

If you know the code for 'A', you can figure out the code for 'D' just by counting forward!

Example Step-by-Step:

Imagine the exam tells you that 'A' = 65. What is the code for 'D'?

1. A = 65
2. B = 66
3. C = 67
4. D = 68
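The counting steps above can be written as a small Python function: find how many places the target letter is from the letter you know, then add that to the known code. The function name `code_from_start` is hypothetical, made up for this sketch.

```python
import string

def code_from_start(start_letter, start_code, target_letter):
    """Count through the alphabet from a letter whose code we know (hypothetical helper)."""
    alphabet = string.ascii_uppercase
    steps = alphabet.index(target_letter) - alphabet.index(start_letter)
    return start_code + steps

print(code_from_start("A", 65, "D"))  # 68
print(code_from_start("A", 65, "Z"))  # 90
```

Because `steps` can be negative, the same function also counts backwards, for example from 'D' = 68 back to 'A'.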

This works for lowercase letters and numbers too! If 'a' is 97, then 'b' is 98. If '0' is 48, then '1' is 49.
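The same trick can decode a whole word when you are only told the code for 'a'. This is a minimal sketch; `decode_lowercase` is a hypothetical helper name, and it assumes every code belongs to a lowercase letter.

```python
def decode_lowercase(codes, a_code=97):
    """Turn codes back into letters, knowing only the code for 'a' (hypothetical helper)."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return "".join(alphabet[code - a_code] for code in codes)

print(decode_lowercase([99, 97, 116]))  # cat

# The same idea works for digit characters, starting from '0' = 48
digits = "0123456789"
print(digits[55 - 48])  # 7
```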

Common Mistake to Avoid:

Don't confuse the number with its character code. The character '5' is not stored as the binary for 5. In ASCII, the character '5' actually has the code 53! Always check your provided table.
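A quick Python comparison makes the difference visible: the character '5' and the number 5 produce completely different bit patterns.

```python
char = "5"
print(ord(char))                 # 53, the character code for '5'
print(format(ord(char), "08b"))  # 00110101, what is stored for the character
print(format(5, "08b"))          # 00000101, the plain number 5

# Digit codes run in order from '0' = 48, so subtracting 48 recovers the value
print(ord(char) - 48)            # 5
```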

Key Takeaway: Characters are grouped and ordered logically in encoding tables. You can calculate the code of a nearby character by following the alphabetical or numerical sequence.

5. Summary Quick Review

Character Set: A list of characters and their unique binary codes.
7-bit ASCII: Stores 128 characters. Limited to English letters (no accents), digits and common symbols.
Unicode: A massive character set for all languages and emojis. The first 128 codes (0 to 127) match ASCII exactly.
Ordering: Codes run in alphabetical/numerical order (A, B, C... or 0, 1, 2...).

Encouraging Note: Don't worry about memorizing the actual numbers (like A=65). If you need them in the exam, the question will provide the table or the starting point. You just need to know how the system works!