Hash tables are a powerful data structure used for fast data storage and retrieval, with an average time complexity of ( O(1) ) for insertions, deletions, and lookups. They rely on hash functions to map keys to specific indices within an array. However, hash tables can encounter collisions—situations where multiple keys map to the same index. This requires careful handling with techniques such as chaining and open addressing.
Let’s cover the essentials of hash tables, including the components of hash functions and collision resolution strategies.
Hash Functions
A hash function takes a key (e.g., a string or number) and returns an integer that represents the index in the hash table array where the value should be stored.
- Characteristics of a Good Hash Function:
- Deterministic: It should consistently map the same input to the same output.
- Efficient: Should compute quickly, ideally in constant time ( O(1) ).
- Uniform Distribution: Should distribute keys evenly to avoid clustering.
- Minimizes Collisions: Different keys should ideally map to different indices.
Example Hash Function for Strings
Here’s an example of a basic hash function for strings in Python. It uses the ASCII values of characters with modular arithmetic to fit within a fixed array size.
def simple_hash(key: str, array_size: int) -> int:
"""Simple hash function for a string key."""
hash_value = 0
for char in key:
hash_value += ord(char)
return hash_value % array_size
# Example usage
array_size = 10
index = simple_hash("example", array_size)
print(f"Index for 'example': {index}")
This function takes the sum of the ASCII values of characters in the key and takes the modulus by the array size to keep the result within bounds. While simple, this approach may cause clustering for similar strings.
Collision Resolution
Collisions occur when two or more keys hash to the same index. Several techniques help handle collisions, with the two main ones being chaining and open addressing.
Chaining
Chaining resolves collisions by storing multiple items in the same index as a linked list or another data structure.
- Pros: Easy to implement; handles a large number of collisions well.
- Cons: Requires additional space for links and can degrade to (O(n)) time complexity if many elements are at the same index.
Implementation Example of Hash Table with Chaining
Here’s a Python example of a hash table that uses chaining to resolve collisions:
class HashTableChaining:
def __init__(self, size: int):
self.size = size
self.table = [[] for _ in range(size)]
def hash_function(self, key: str) -> int:
"""Generate an index for a given key."""
return sum(ord(char) for char in key) % self.size
def insert(self, key: str, value: int):
"""Insert key-value pair into hash table."""
index = self.hash_function(key)
for pair in self.table[index]:
if pair[0] == key:
pair[1] = value # Update if key exists
return
self.table[index].append([key, value]) # Add new key-value pair
def get(self, key: str) -> int:
"""Retrieve value associated with key."""
index = self.hash_function(key)
for pair in self.table[index]:
if pair[0] == key:
return pair[1]
raise KeyError("Key not found")
# Usage
hash_table = HashTableChaining(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
print("Value for 'apple':", hash_table.get("apple")) # Output: 1
Open Addressing
Open addressing resolves collisions by finding another open slot in the array when a collision occurs. Common methods for finding open slots include linear probing, quadratic probing, and double hashing.
Linear Probing
Linear probing checks the next slot in the array sequentially until an open slot is found.
- Pros: Simple to implement.
- Cons: Can lead to clustering, where consecutive slots are filled, increasing lookup time.
Implementation Example Using Linear Probing
class HashTableLinearProbing:
def __init__(self, size: int):
self.size = size
self.table = [None] * size
def hash_function(self, key: str) -> int:
"""Generate an index for a given key."""
return sum(ord(char) for char in key) % self.size
def insert(self, key: str, value: int):
"""Insert key-value pair with linear probing."""
index = self.hash_function(key)
while self.table[index] is not None:
if self.table[index][0] == key:
self.table[index] = (key, value) # Update if key exists
return
index = (index + 1) % self.size
self.table[index] = (key, value)
def get(self, key: str) -> int:
"""Retrieve value associated with key using linear probing."""
index = self.hash_function(key)
start_index = index
while self.table[index] is not None:
if self.table[index][0] == key:
return self.table[index][1]
index = (index + 1) % self.size
if index == start_index:
break # Avoid infinite loop if the table is full
raise KeyError("Key not found")
# Usage
hash_table = HashTableLinearProbing(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
print("Value for 'banana':", hash_table.get("banana")) # Output: 2
Quadratic Probing
Quadratic probing moves by progressively larger steps to avoid clustering.
- Index Calculation: Instead of moving to the next slot (linear probing), quadratic probing moves by squares: ( i + 1^2, i + 2^2, ).
- Pros and Cons: Less clustering than linear probing, but more complex to implement and analyze.