Character encoding

Computer Science 8525 (Current) · AQA GCSE

Welcome to Character Encoding!

Ever wondered how a computer, which only understands binary (0s and 1s), can display a text message, an email, or even a pizza emoji? It all comes down to a clever system called character encoding.

In this chapter, we are going to learn how computers turn numbers into letters and why we need different systems so that everyone around the world can communicate. Don't worry if binary seems a bit "maths-heavy": this part is all about the secret codes computers use!

1. What is a Character Set?

A character set is basically a "translation book" or a giant list. It contains all the characters (letters, numbers, and symbols) that a computer can recognize, and it assigns a unique binary number to every single one.

Think of it like a menu in a restaurant where every dish has a number. If you tell the waiter you want "Number 42," they know exactly which meal you mean. In a computer, if the "menu" says 65 is the letter 'A', every time the computer sees 65, it shows an 'A' on your screen.
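The "menu" idea can be sketched in Python as a lookup table that works in both directions. This is a toy character set, assuming just five characters that borrow their real ASCII codes; the names `TINY_CHARSET`, `encode` and `decode` are hypothetical, made up for this example.

```python
# A toy character set: each character maps to a unique number.
# Only five characters are included (a hypothetical mini "menu");
# the numbers match the ASCII codes for these characters.
TINY_CHARSET = {"A": 65, "B": 66, "C": 67, "H": 72, "I": 73}

# Reverse lookup: number -> character, used when displaying text.
CODE_TO_CHAR = {code: char for char, code in TINY_CHARSET.items()}


def encode(text):
    """Turn each character into its number from the character set."""
    return [TINY_CHARSET[ch] for ch in text]


def decode(codes):
    """Turn each number back into the character it stands for."""
    return "".join(CODE_TO_CHAR[code] for code in codes)


codes = encode("HI")
print(codes)          # [72, 73]
print(decode(codes))  # HI
```

A character missing from the table raises a `KeyError`, which mirrors the idea that a computer can only show characters its character set defines.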

Quick Review: The Basics

• Computers only process binary (0s and 1s).
• Every time you press a key on your keyboard, a binary signal is sent.
• The character set tells the computer which letter that signal represents.
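The last step in that list can be sketched in Python, assuming the key press arrives as the 7-bit pattern `1000001` (a made-up signal for illustration). The built-in `int(..., 2)` reads the binary as a denary number, and `chr` looks that number up in Python's own character set, which agrees with ASCII for codes like this one.

```python
# A pretend key press: the binary pattern the computer receives.
signal = "1000001"

code = int(signal, 2)    # binary -> denary: 1000001 is 65
character = chr(code)    # look the code up in the character set

print(code)       # 65
print(character)  # A
```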

Key Takeaway: A character set is a defined list of characters recognized by computer hardware and software, where each character is represented by a unique number.

2. ASCII: The Original Code

ASCII (pronounced 'as-kee') stands for American Standard Code for Information Interchange. It was one of the first major character sets created.

The standard version of ASCII uses 7 bits. Since \(2^7 = 128\), this means it can represent 128 different characters. These include:
• Uppercase letters (A-Z)
• Lowercase letters (a-z)
• Numbers (0-9)
• Punctuation (! , . ?)
• The space character and special "control" characters (like the new-line code produced by the 'Enter' key)
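A quick Python sketch of the 7-bit range: it confirms that 7 bits give 128 codes and then splits them into visible characters and control codes. Using Python's `str.isprintable` as the test for "not a control code" is an assumption made for this example, not part of the AQA specification. Python treats the space (code 32) as printable, so the 33 control codes are 0 to 31 plus 127.

```python
BITS = 7
total = 2 ** BITS
print(total)  # 128

# Codes 0..127 are the whole 7-bit ASCII range.
printable = [chr(c) for c in range(total) if chr(c).isprintable()]
control = [c for c in range(total) if not chr(c).isprintable()]

print(len(printable))  # 95  (letters, digits, symbols and the space)
print(len(control))    # 33  (control codes 0-31 and 127)
print(printable[:5])   # [' ', '!', '"', '#', '$']
```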

The Problem with ASCII:
While 128 characters are plenty for English, they aren't enough for the whole world! ASCII has no room for most mathematical symbols (such as ÷ or π), accented letters (like 'é'), or characters from other languages like Greek, Arabic or Chinese. And it definitely doesn't have room for emojis!
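You can see this limit in Python by asking it to store a word with an accented letter as ASCII. This is a sketch using the built-in `ord` and `str.encode("ascii")`; the word "café" is just an example.

```python
word = "café"

# ord gives each character's code; 'é' is beyond the 0-127 ASCII range.
print([ord(ch) for ch in word])  # [99, 97, 102, 233]

try:
    word.encode("ascii")             # ask Python to store the text as ASCII
except UnicodeEncodeError as err:
    print("Not ASCII:", err.reason)  # Not ASCII: ordinal not in range(128)
```

The code 233 needs 8 bits, so it simply does not fit in the 7-bit ASCII table.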

Did you know? Even though ASCII uses 7 bits for the code, computers usually store it in a full 8-bit byte, leaving the 8th bit as a 0.
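Here is a short Python sketch of that padding, using the built-in `format` with a fixed width to show the same code as 7 bits and as a full byte.

```python
code = ord("A")                    # 65

seven_bits = format(code, "07b")   # the 7-bit ASCII pattern
stored_byte = format(code, "08b")  # padded to a full 8-bit byte

print(seven_bits)   # 1000001
print(stored_byte)  # 01000001  (the extra 8th bit, on the left, is 0)
```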

Key Takeaway: 7-bit ASCII is a simple character set that can store 128 characters. It is great for English but too small for global use.

3. Unicode: The Global Solution

To solve the "not enough room" problem of ASCII, Unicode was created. The goal of Unicode is to represent every character in every language in the world.

Advantages of Unicode over ASCII:

1. Global Range: It can represent writing systems from all over the world (Chinese, Cyrillic, Hebrew, etc.).
2. Symbols & Emojis: It includes thousands of scientific symbols and all your favorite emojis.
3. Compatibility: Unicode was designed to be "backwards compatible" with ASCII.

Important Point: Unicode uses the same codes as ASCII for the first 128 characters (codes 0 to 127). This means that if the letter 'A' is 65 in ASCII, it is also 65 in Unicode! This makes it easy for old systems to work with new ones.
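Python stores text as Unicode, so its built-in `ord` function returns a character's Unicode code. This sketch checks that codes 0 to 127 mean the same characters as in ASCII, then shows codes for characters ASCII has no room for. The escapes `"\u03b1"` and `"\U0001F355"` are just Python's way of writing α and the pizza emoji.

```python
# Python's ord() returns a character's Unicode code.
ascii_examples = {ch: ord(ch) for ch in "Aa0"}
print(ascii_examples)  # {'A': 65, 'a': 97, '0': 48}  (same as ASCII)

# Check every ASCII code (0-127) means the same character in Unicode.
same_for_ascii = all(bytes([c]).decode("ascii") == chr(c) for c in range(128))
print(same_for_ascii)  # True

# Characters that ASCII has no room for get codes above 127.
print(ord("é"))           # 233
print(ord("\u03b1"))      # 945     (Greek small letter alpha)
print(ord("\U0001F355"))  # 127829  (pizza emoji)
```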

Key Takeaway: Unicode is a much larger character set that can represent well over 100,000 characters, making it suitable for global communication and modern symbols.

4. Working with Character Tables

In your exam, you might be given a table and asked to convert between characters and their codes. There is a very important "trick" you need to know: Character codes run in sequence.

If you know the code for 'A', you can figure out the code for 'D' just by counting forward!

Example Step-by-Step:

Imagine the exam tells you that 'A' = 65. What is the code for 'D'?

1. A = 65
2. B = 66
3. C = 67
4. D = 68

This works for lowercase letters and numbers too! If 'a' is 97, then 'b' is 98. If '0' is 48, then '1' is 49.
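The counting trick is just "known code plus the number of steps". A minimal sketch, assuming uppercase letters, lowercase letters and digits each sit in one unbroken run (as they do in ASCII and Unicode); `code_by_counting` is a hypothetical helper written for this example, not a built-in.

```python
# Runs of characters whose codes are consecutive.
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"
DIGITS = "0123456789"


def code_by_counting(known_char, known_code, target):
    """Find target's code by counting forward/backward from a known code."""
    for run in (UPPER, LOWER, DIGITS):
        if known_char in run and target in run:
            steps = run.index(target) - run.index(known_char)
            return known_code + steps
    raise ValueError("characters are not in the same run")


print(code_by_counting("A", 65, "D"))  # 68
print(code_by_counting("a", 97, "b"))  # 98
print(code_by_counting("0", 48, "1"))  # 49
print(code_by_counting("D", 68, "A"))  # 65  (counting backward)
```

The helper refuses to count between runs on purpose: in ASCII the codes between 'Z' (90) and 'a' (97) belong to other symbols, so you cannot count straight on from 'Z' to 'a'.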

Common Mistake to Avoid:

Don't confuse the number with its character code. The character '5' is not stored as the binary for 5. In ASCII, the character '5' actually has the code 53! Always check your provided table.
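The difference shows up clearly in binary. This Python sketch compares the character '5' with the number 5, using the built-in `ord` and `format`.

```python
digit_char = "5"  # the character, as typed on a keyboard
number = 5        # the actual number five

print(ord(digit_char))                 # 53
print(format(ord(digit_char), "08b"))  # 00110101  (how the character '5' is stored)
print(format(number, "08b"))           # 00000101  (the binary for the number 5)

# Because digit codes run in sequence from '0' (48), subtracting gets the value back.
print(ord(digit_char) - ord("0"))      # 5
```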

Key Takeaway: Characters are grouped and ordered logically in encoding tables. You can calculate the code of a nearby character by following the alphabetical or numerical sequence.

5. Summary Quick Review

Character Set: A list of characters and their unique binary codes.
7-bit ASCII: Stores 128 characters. Limited to English/Latin characters.
Unicode: A massive character set for all languages and emojis. The first 128 codes (0 to 127) match ASCII exactly.
Ordering: Codes run in alphabetical/numerical order (A, B, C... or 1, 2, 3...).

Encouraging Note: Don't worry about memorizing the actual numbers (like A=65). If you need them in the exam, the question will provide the table or the starting point. You just need to know how the system works!

Quick check

Can you answer these now?


What is a character set?

A character set is a defined list of characters recognized by a computer, where each character is assigned a unique binary number.

How does a computer handle keyboard inputs using a character set?

When a key is pressed, a binary signal is sent to the computer, and the character set is used to identify which specific character that signal represents.

What is 7-bit ASCII and how many characters can it represent?

ASCII is a character set that uses 7 bits for each code, allowing it to represent a total of 128 different characters (\(2^7 = 128\)).

What is the main limitation of the ASCII character set?

ASCII only supports 128 characters, which is insufficient for representing non-English languages, most mathematical symbols, or emojis.

How is an ASCII character typically stored in computer memory?

Although the code only requires 7 bits, it is usually stored in a full 8-bit byte, with the eighth bit typically set to 0.

What is Unicode and why was it developed?

Unicode is a character set designed to represent every character and symbol in every language in the world, overcoming the size limitations of ASCII.

How does Unicode maintain compatibility with ASCII?

Unicode is backwards compatible because the first 128 character codes (0 to 127) in Unicode are identical to the 7-bit ASCII codes.

How can you determine the code for a character if you know the code for a preceding character in the set?

Because character codes run in logical sequences, you can count forward or backward in alphabetical or numerical order from a known code.


* The content provided by thinka is generated by AI and may not always be accurate or up-to-date. Please use it as a supplementary resource and verify with official materials.
