A Trie, also known as a prefix tree, is a specialized tree-like data structure often used for storing strings in a way that enables efficient prefix-based operations. They’re widely used in applications like autocomplete, spell-checking, and dictionary implementations. Let’s explore tries and compressed tries, covering their theoretical aspects, code examples, common mistakes, gotchas, and interview tips.
Tries (Prefix Trees)
A Trie is a multi-way tree structure where: - Each node represents a character. - Paths through the tree form words or prefixes. - Every node has up to 26 (for English lowercase letters) or more children, depending on the character set.
Key Properties:
- Root Node: Represents an empty string and is the starting point for all strings in the trie.
- Edges: Represent individual characters.
- Nodes: Represent the state of the prefix formed from the root to that node.
- End Markers: Each node representing the end of a valid word is marked (typically a boolean flag).
Advantages:
- Efficient prefix searching, allowing quick retrieval of words with specific prefixes.
- Predictable O(L) complexity for insertions and lookups, where L is the length of the word.
Example: Basic Trie Implementation
Here’s a Python implementation of a basic trie with insertion and search functionality.
from typing import Dict
class TrieNode:
def __init__(self):
self.children: Dict[str, TrieNode] = {} # Dictionary to store children
self.is_end_of_word: bool = False # Marker to indicate end of a word
# Actual Implementation
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
"""Insert a word into the trie."""
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode() # Create new node if char not in children
node = node.children[char]
node.is_end_of_word = True # Mark the end of the word
def search(self, word: str) -> bool:
"""Return True if the word exists in the trie, False otherwise."""
node = self.root
for char in word:
if char not in node.children:
return False # If character not found, word doesn't exist
node = node.children[char]
return node.is_end_of_word # Check if it's a complete word
def starts_with(self, prefix: str) -> bool:
"""Return True if any word in the trie starts with the given prefix."""
node = self.root
for char in prefix:
if char not in node.children:
return False # If character not found, no word with this prefix
node = node.children[char]
return True
Explanation of Code
TrieNode Class: Each node has a children dictionary for storing child nodes and a is_end_of_word flag.
- Insert Function: Inserts each character of a word and marks the end of the word.
- Search Function: Traverses down the trie to check if the word exists.
- Starts With Function: Checks if any word in the trie starts with a given prefix.
Compressed Tries (Radix Trees)
A compressed trie, also known as a radix tree or Patricia trie, is a space-optimized version of the regular trie. In a compressed trie, nodes that have a single child are combined, reducing unnecessary nodes. This optimization is helpful when storing large datasets with common prefixes.
Key Differences from Basic Tries:
- Compression: Consecutive nodes with only one child are compressed into a single node. Instead of storing each character individually, compressed tries store the entire substring.
- Space Efficiency: Compressed tries use less memory because they avoid storing nodes for characters that don’t form branches.
Example: Compressed Trie Insertion and Search
Implementing a full compressed trie from scratch can be complex. Here’s a simplified Python example for inserting and searching in a compressed trie, focusing on the main concept.
class CompressedTrieNode:
def __init__(self, value: str = ""):
self.value = value # Value of the substring represented by this node
self.children: Dict[str, CompressedTrieNode] = {}
self.is_end_of_word = False
class CompressedTrie:
def __init__(self):
self.root = CompressedTrieNode()
def insert(self, word: str) -> None:
"""Insert a word into the compressed trie."""
node = self.root
i = 0
while i < len(word):
found = False
for child_key, child_node in node.children.items():
# Find the longest common prefix between word[i:] and child_key
common_length = self._common_prefix_length(word[i:], child_key)
if common_length > 0:
# Split node if needed
if common_length < len(child_key):
new_child = CompressedTrieNode(child_key[common_length:])
new_child.children = child_node.children
new_child.is_end_of_word = child_node.is_end_of_word
# Update the child node with the common part only
child_node.value = child_key[:common_length]
child_node.children = {new_child.value: new_child}
child_node.is_end_of_word = False
# Move to the child node and adjust i
i += common_length
node = child_node
found = True
break
# If no common prefix, create a new node
if not found:
new_node = CompressedTrieNode(word[i:])
node.children[word[i:]] = new_node
new_node.is_end_of_word = True
return
node.is_end_of_word = True
def search(self, word: str) -> bool:
"""Search for a word in the compressed trie."""
node = self.root
i = 0
while i < len(word):
matched = False
for child_key, child_node in node.children.items():
if word[i:].startswith(child_key):
node = child_node
i += len(child_key)
matched = True
break
if not matched:
return False
return node.is_end_of_word
def _common_prefix_length(self, str1: str, str2: str) -> int:
"""Utility function to find the common prefix length between two strings."""
min_length = min(len(str1), len(str2))
for i in range(min_length):
if str1[i] != str2[i]:
return i
return min_length
Explanation of Code
- Compression: Nodes only store substrings with shared prefixes compressed into a single node. This reduces the number of nodes for words with common prefixes.
- Common Prefix Check: When inserting a word, we check for the longest common prefix with existing children, updating nodes and creating new ones as necessary.