Welcome to the World of Hash Tables!
In our previous sections, we looked at searching through lists and arrays. Remember how the Binary Search was fast, but required the data to be sorted? And Linear Search? Well, that takes ages if the list is long!
What if there was a way to find exactly what you are looking for instantly, without searching through every item? That is the magic of Hash Tables. In this chapter, we will learn how they work, how they handle "traffic jams" (collisions), and why they are one of the most powerful tools in a programmer's toolkit. Don't worry if it sounds complex at first—we'll break it down step-by-step!
1. What is a Hash Table?
A Hash Table is a data structure that stores data in an associative way. This means data is stored in a way that allows for very fast retrieval. Think of it like a giant wall of lockers at a gym.
The Analogy:
Imagine you have 100 lockers. Instead of searching every locker to find your bag (Linear Search), you are given a locker number based on your name. Whenever you return, you just look at your name, calculate your number, and go straight to that locker. No searching required!
The Key Components:
1. The Key: This is the unique input you provide (like a Username or a Product ID).
2. The Hash Function: A special formula that turns your Key into a specific number (index).
3. The Hash Table (Array): The actual list or "slots" where the data is kept.
Quick Review: The main goal of a Hash Table is to provide constant time access to data, meaning it takes the same amount of time to find an item whether you have 10 items or 10,000 items!
2. How Hashing Works (The Hash Function)
The "secret sauce" of a Hash Table is the Hash Function. This is a mathematical algorithm that takes a Key and maps it to a specific index in the array.
A very common method used in exams is the Modulo Method. The formula looks like this:
\( \text{index} = \text{key} \mod \text{table\_size} \)
Example:
Suppose our Hash Table has 10 slots (size 10). We want to store the ID 147.
\( 147 \mod 10 = 7 \)
The ID 147 will be stored at index 7.
What makes a "Good" Hash Function?
To keep things efficient, a good hash function should:
• Be very fast to compute.
• Distribute keys evenly across the table (so we don't end up with everyone trying to use the same locker!).
• Minimize collisions (which we will talk about next).
Key Takeaway: The Hash Function tells the computer exactly where to put the data so it can find it later without looking anywhere else.
3. Handling Collisions (The "Traffic Jams")
Sometimes, two different keys will result in the same index. This is called a Collision.
Example:
If our table size is 10:
ID 147 \( \mod 10 = 7 \)
ID 257 \( \mod 10 = 7 \)
Uh oh! Both IDs want to sit in index 7. We can't just delete the old one, so we need a strategy to handle this.
The Oxford AQA syllabus requires you to know two main ways to fix this:
Method A: Linear Probing (Open Addressing)
This is the "look for the next empty seat" method. If a collision occurs at index 7, the computer checks index 8. If that's full, it checks index 9, and so on, until it finds an empty slot.
Pros: Very simple to implement.
Cons: Can lead to "clustering," where long chains of data bunch up together, slowing things down.
Method B: Chaining (Closed Addressing)
In this method, each slot in the Hash Table isn't just a single box; it's a pointer to a Linked List. If a collision occurs, the new item is simply added to the end of the list at that index.
Pros: The table never actually "fills up" in the same way; you just keep adding to the chains.
Cons: If the chains get too long, it starts to feel like a Linear Search again, which is slow!
Did you know? Most modern software uses a mix of these techniques to keep things running at lightning speed!
4. Common Pitfalls and Tips
Mistake to Avoid: Don't forget that when you search for an item in a table using Linear Probing, you must follow the same "next slot" logic. If you don't find the item at the calculated index, you have to keep looking at the following slots until you find it or hit an empty space!
Memory Aid:
• Linear Probing = Look for the next Location.
• Chaining = Connect a Chain (Linked List).
5. Chapter Summary Checklist
Check your understanding before moving on:
• Can I explain what a Hash Function does? (It turns a key into an index)
• Do I know the formula for the Modulo Method? ( \( \text{key} \mod \text{size} \) )
• Can I define a Collision? (When two keys produce the same index)
• Can I describe Linear Probing and Chaining? (Next available slot vs. Linked List)
Don't worry if this seems tricky at first! Try drawing a small array of 5 slots on a piece of paper and try "hashing" the numbers 5, 11, and 15 into it using \( \mod 5 \). Seeing the collisions happen in your own drawing is the best way to learn!