Tries

A Trie, also known as a prefix tree, is a specialized tree-like data structure often used for storing strings in a way that enables efficient prefix-based operations. They’re widely used in applications like autocomplete, spell-checking, and dictionary implementations. Let’s explore tries and compressed tries, covering their theoretical aspects, code examples, common mistakes, gotchas, and interview tips.

Tries (Prefix Trees)

A Trie is a multi-way tree structure where: - Each node represents a character. - Paths through the tree form words or prefixes. - Every node has up to 26 (for English lowercase letters) or more children, depending on the character set.

Key Properties:

  • Root Node: Represents an empty string and is the starting point for all strings in the trie.
  • Edges: Represent individual characters.
  • Nodes: Represent the state of the prefix formed from the root to that node.
  • End Markers: Each node representing the end of a valid word is marked (typically a boolean flag).

Advantages:

  • Efficient prefix searching, allowing quick retrieval of words with specific prefixes.
  • Predictable O(L) complexity for insertions and lookups, where L is the length of the word.

Example: Basic Trie Implementation

Here’s a Python implementation of a basic trie with insertion and search functionality.

from typing import Dict

class TrieNode:
    def __init__(self):
        self.children: Dict[str, TrieNode] = {}  # Dictionary to store children
        self.is_end_of_word: bool = False        # Marker to indicate end of a word

# Actual Implementation
class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word: str) -> None:
        """Insert a word into the trie."""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()  # Create new node if char not in children
            node = node.children[char]
        node.is_end_of_word = True  # Mark the end of the word
    
    def search(self, word: str) -> bool:
        """Return True if the word exists in the trie, False otherwise."""
        node = self.root
        for char in word:
            if char not in node.children:
                return False  # If character not found, word doesn't exist
            node = node.children[char]
        return node.is_end_of_word  # Check if it's a complete word
    
    def starts_with(self, prefix: str) -> bool:
        """Return True if any word in the trie starts with the given prefix."""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False  # If character not found, no word with this prefix
            node = node.children[char]
        return True

Explanation of Code

TrieNode Class: Each node has a children dictionary for storing child nodes and a is_end_of_word flag.

  1. Insert Function: Inserts each character of a word and marks the end of the word.
  2. Search Function: Traverses down the trie to check if the word exists.
  3. Starts With Function: Checks if any word in the trie starts with a given prefix.

Compressed Tries (Radix Trees)

A compressed trie, also known as a radix tree or Patricia trie, is a space-optimized version of the regular trie. In a compressed trie, nodes that have a single child are combined, reducing unnecessary nodes. This optimization is helpful when storing large datasets with common prefixes.

Key Differences from Basic Tries:

  • Compression: Consecutive nodes with only one child are compressed into a single node. Instead of storing each character individually, compressed tries store the entire substring.
  • Space Efficiency: Compressed tries use less memory because they avoid storing nodes for characters that don’t form branches.

Implementing a full compressed trie from scratch can be complex. Here’s a simplified Python example for inserting and searching in a compressed trie, focusing on the main concept.

class CompressedTrieNode:
    def __init__(self, value: str = ""):
        self.value = value  # Value of the substring represented by this node
        self.children: Dict[str, CompressedTrieNode] = {}
        self.is_end_of_word = False

class CompressedTrie:
    def __init__(self):
        self.root = CompressedTrieNode()
    
    def insert(self, word: str) -> None:
        """Insert a word into the compressed trie."""
        node = self.root
        i = 0
        while i < len(word):
            found = False
            for child_key, child_node in node.children.items():
                # Find the longest common prefix between word[i:] and child_key
                common_length = self._common_prefix_length(word[i:], child_key)
                
                if common_length > 0:
                    # Split node if needed
                    if common_length < len(child_key):
                        new_child = CompressedTrieNode(child_key[common_length:])
                        new_child.children = child_node.children
                        new_child.is_end_of_word = child_node.is_end_of_word
                        
                        # Update the child node with the common part only
                        child_node.value = child_key[:common_length]
                        child_node.children = {new_child.value: new_child}
                        child_node.is_end_of_word = False
                    
                    # Move to the child node and adjust i
                    i += common_length
                    node = child_node
                    found = True
                    break
            
            # If no common prefix, create a new node
            if not found:
                new_node = CompressedTrieNode(word[i:])
                node.children[word[i:]] = new_node
                new_node.is_end_of_word = True
                return
        node.is_end_of_word = True
    
    def search(self, word: str) -> bool:
        """Search for a word in the compressed trie."""
        node = self.root
        i = 0
        while i < len(word):
            matched = False
            for child_key, child_node in node.children.items():
                if word[i:].startswith(child_key):
                    node = child_node
                    i += len(child_key)
                    matched = True
                    break
            if not matched:
                return False
        return node.is_end_of_word
    
    def _common_prefix_length(self, str1: str, str2: str) -> int:
        """Utility function to find the common prefix length between two strings."""
        min_length = min(len(str1), len(str2))
        for i in range(min_length):
            if str1[i] != str2[i]:
                return i
        return min_length

Explanation of Code

  • Compression: Nodes only store substrings with shared prefixes compressed into a single node. This reduces the number of nodes for words with common prefixes.
  • Common Prefix Check: When inserting a word, we check for the longest common prefix with existing children, updating nodes and creating new ones as necessary.

Track your progress

Mark this subtopic as completed when you finish reading.