Radix Tree

 

Radix Tree (Trie) A Radix Tree, or Trie, is a tree-like data structure used for efficient string searching and storage. It is commonly used for tasks such as autocomplete, spell checking, and IP routing. The key idea behind a Trie is to store strings by breaking them down into individual characters and organizing them in a tree structure.

Each node in the Trie represents a character, and the edges leading from a node represent the possible next characters. By traversing the tree, we can match or search for a particular string efficiently.

Let's define the Node class for the Trie:


class TrieNode:
    def __init__(self):
        self.children = {}  # dictionary to store child nodes
        self.is_end_of_word = False  # flag to indicate end of a word


Insertion To insert a string into the Trie, we start from the root node and check if each character exists in the current node's children. If not, we create a new node and link it to the current node. We repeat this process until all characters of the string are inserted.

Let's implement the insert function:


def insert(root, word):
    current = root

    for char in word:
        if char not in current.children:
            current.children[char] = TrieNode()
        current = current.children[char]

    current.is_end_of_word = True


Search To search for a string in the Trie, we traverse the tree starting from the root and check if each character exists in the current node's children. If any character is missing or we reach the end of the tree before completing the string, it means the string is not present. Otherwise, if we successfully traverse all characters and the last node is marked as the end of a word, the string is found.

Let's implement the search function:


def search(root, word):
    current = root

    for char in word:
        if char not in current.children:
            return False
        current = current.children[char]

    return current.is_end_of_word


Prefix Search To search for all strings with a given prefix, we start from the root node and traverse the tree using the characters of the prefix. Once we reach the end of the prefix, we perform a depth-first search to find all strings under that prefix.

Let's implement the prefix_search function:


def prefix_search(root, prefix):
    current = root

    for char in prefix:
        if char not in current.children:
            return []

        current = current.children[char]

    results = []
    dfs(current, prefix, results)
    return results

def dfs(node, prefix, results):
    if node.is_end_of_word:
        results.append(prefix)

    for char in node.children:
        dfs(node.children[char], prefix + char, results)


Now that we have a good understanding of the Radix Tree, let's solve a problem using this concept.

Problem: Implement Trie (Prefix Tree) Design a Trie (Prefix Tree) data structure that supports the following operations: insert, search, and startsWith.

Here's the complete implementation of the Trie class:


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        insert(self.root, word)

    def search(self, word: str) -> bool:
        return search(self.root, word)

    def startsWith(self, prefix: str) -> bool:
        return bool(prefix_search(self.root, prefix))


Let's test the Trie implementation:


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        insert(self.root, word)

    def search(self, word: str) -> bool:
        return search(self.root, word)

    def startsWith(self, prefix: str) -> bool:
        return bool(prefix_search(self.root, prefix))


Time Complexity

  • Insertion: O(m), where m is the length of the word being inserted.
  • Search: O(m), where m is the length of the word being searched.
  • Prefix Search: O(k), where k is the total number of characters in all the words that match the prefix.

Space Complexity The space complexity of a Trie is O(n * m), where n is the number of words and m is the average length of the words. Each character in each word is stored in a node, so the space usage is proportional to the number of characters.

Let's solve another problem using the concept of a Trie.

Problem: Word Search II Given a 2D board of letters and a list of words, find all words in the board that can be constructed by sequentially connecting adjacent letters, where "adjacent" letters are horizontally or vertically neighboring. Each letter can only be used once in a word.

Here's the step-by-step approach to solve this problem using a Trie:

  1. Create a Trie and insert all the words from the given list into the Trie.
  2. Iterate through each cell in the board.
  3. For each cell, perform a depth-first search (DFS) to traverse the board and check if the current path forms a word in the Trie.
  4. If a word is found, add it to the result list.
  5. Return the result list containing all the words found in the board.

Let's implement the solution:


trie = Trie()

trie.insert("apple")
print(trie.search("apple"))  # Output: True
print(trie.search("app"))    # Output: False
print(trie.startsWith("app"))  # Output: True

trie.insert("app")
print(trie.search("app"))    # Output: True


Let's test the solution with an example:


class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        current = self.root
        for char in word:
            if char not in current.children:
                current.children[char] = TrieNode()
            current = current.children[char]
        current.is_end_of_word = True

def findWords(board, words):
    # Build the Trie
    trie = Trie()
    for word in words:
        trie.insert(word)

    # Initialize variables
    rows = len(board)
    cols = len(board[0])
    result = []

    # Helper function for DFS
    def dfs(i, j, node, path):
        # Check if current cell is out of bounds or the current character is not in the Trie
        if i < 0 or i >= rows or j < 0 or j >= cols or board[i][j] not in node.children:
            return

        # Update the node and path
        char = board[i][j]
        node = node.children[char]
        path += char

        # Check if the current path forms a word
        if node.is_end_of_word:
            result.append(path)
            # Mark the word as visited by setting is_end_of_word to False
            node.is_end_of_word = False

        # Mark the current cell as visited
        board[i][j] = "#"

        # Perform DFS on adjacent cells
        dfs(i + 1, j, node, path)
        dfs(i - 1, j, node, path)
        dfs(i, j + 1, node, path)
        dfs(i, j - 1, node, path)

        # Restore the current cell
        board[i][j] = char

    # Perform DFS for each cell in the board
    for i in range(rows):
        for j in range(cols):
            dfs(i, j, trie.root, "")

    return result

board = [
    ['o', 'a', 'a', 'n'],
    ['e', 't', 'a', 'e'],
    ['i', 'h', 'k', 'r'],
    ['i', 'f', 'l', 'v']
]
words = ["oath", "pea", "eat", "rain"]

print(findWords(board, words))


Time Complexity

  • Building the Trie: O(m), where m is the total number of characters in all the words.
  • DFS for each cell in the board: O(n * 4^k), where n is the number of cells in the board and k is the average length of the words.
  • Total time complexity: O(m + n * 4^k)

Space Complexity

  • The space complexity of the Trie: O(m), where m is the total number of characters in all the words.
  • The space complexity of the recursive DFS: O(k), where k is the average length of the words.
  • Total space complexity: O(m + k)
Next Post Previous Post