Trie


Trie Data Structure

The Trie, also known as a prefix tree, is a tree-like data structure used for efficient retrieval of keys with common prefixes. It is commonly used for tasks like searching for words in a dictionary, autocomplete functionality, and storing sets of strings.

In a Trie, each node represents a character, and the edges represent the connections between characters. The root node represents an empty string, and each subsequent node represents a prefix formed by appending a character to the previous node's prefix.

The Trie data structure supports efficient insertion, deletion, and search operations, making it a suitable choice for various string-related problems.

Implementation of Trie in Python

Let's implement the Trie data structure in Python:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        current = self.root
        for char in word:
            if char not in current.children:
                current.children[char] = TrieNode()
            current = current.children[char]
        current.is_end_of_word = True

    def search(self, word):
        current = self.root
        for char in word:
            if char not in current.children:
                return False
            current = current.children[char]
        return current.is_end_of_word

    def startsWith(self, prefix):
        current = self.root
        for char in prefix:
            if char not in current.children:
                return False
            current = current.children[char]
        return True

In the above code, we define two classes: TrieNode and Trie. The TrieNode class represents a single node in the Trie, and the Trie class is the main Trie data structure.

The TrieNode class has two properties: children, which is a dictionary to store the child nodes, and is_end_of_word, which indicates whether the node represents the end of a word.

The Trie class has three methods:

  • insert(word): Inserts a word into the Trie by traversing through each character and creating new nodes as needed.
  • search(word): Searches for a word in the Trie by traversing through each character and checking if the word exists in the Trie.
  • startsWith(prefix): Checks if any word in the Trie starts with the given prefix by traversing through each character and checking if the prefix exists in the Trie.

Now, let's solve a question from LeetCode using the Trie data structure.

Problem: Implement Trie (Prefix Tree)

Implement a Trie with the following methods:

  • insert(word): Inserts a word into the Trie.
  • search(word): Returns true if the word is in the Trie.
  • startsWith(prefix): Returns true if there is any word in the Trie that starts with the given prefix.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        current = self.root
        for char in word:
            if char not in current.children:
                current.children[char] = TrieNode()
            current = current.children[char]
        current.is_end_of_word = True

    def search(self, word):
        current = self.root
        for char in word:
            if char not in current.children:
                return False
            current = current.children[char]
        return current.is_end_of_word

    def startsWith(self, prefix):
        current = self.root
        for char in prefix:
            if char not in current.children:
                return False
            current = current.children[char]
        return True


# Create a Trie object
trie = Trie()

# Insert words into the Trie
trie.insert("apple")
trie.insert("banana")
trie.insert("orange")

# Search for words in the Trie
print(trie.search("apple"))  # Output: True
print(trie.search("orange"))  # Output: True
print(trie.search("grape"))  # Output: False

# Check if words start with a prefix
print(trie.startsWith("app"))  # Output: True
print(trie.startsWith("ora"))  # Output: True
print(trie.startsWith("ban"))  # Output: False

In the above code, we create a Trie object and insert three words: "apple", "banana", and "orange". Then, we perform search and startsWith operations to test the functionality of the Trie.

Time Complexity The time complexity of inserting a word into the Trie is O(k), where k is the length of the word. The time complexity of searching for a word or a prefix in the Trie is O(k), where k is the length of the word or prefix.

Space Complexity The space complexity of the Trie depends on the number of unique characters in the input words and the number of words inserted. Let n be the total number of characters and m be the number of words inserted.

  • The space complexity of the Trie: O(n)
  • Total space complexity: O(n)

Let's consider the following problem as an example:

Problem: Word Search Given an m x n board and a word, find if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once.

Example:

board = [    ['A', 'B', 'C', 'E'],
    ['S', 'F', 'C', 'S'],
    ['A', 'D', 'E', 'E']
]

word = "ABCCED"

Solution:

We can solve this problem using a depth-first search (DFS) algorithm and a Trie data structure. First, we'll implement the Trie data structure, and then we'll use it to solve the Word Search problem.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        current = self.root
        for char in word:
            if char not in current.children:
                current.children[char] = TrieNode()
            current = current.children[char]
        current.is_word = True


class Solution:
    def exist(self, board, word):
        # Build the Trie
        trie = Trie()
        trie.insert(word)
        
        rows, cols = len(board), len(board[0])
        visited = [[False] * cols for _ in range(rows)]

        # Helper function for DFS
        def dfs(row, col, node):
            # Check if the current node represents a complete word
            if node.is_word:
                return True

            # Check for out-of-bounds or visited cells
            if row < 0 or row >= rows or col < 0 or col >= cols or visited[row][col]:
                return False

            # Check if the current character matches the Trie node
            char = board[row][col]
            if char not in node.children:
                return False

            # Mark the cell as visited
            visited[row][col] = True

            # Recursively check the neighbors
            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
                if dfs(row + dx, col + dy, node.children[char]):
                    return True

            # Unmark the cell as visited
            visited[row][col] = False

            return False

        # Perform DFS for each cell on the board
        for row in range(rows):
            for col in range(cols):
                if dfs(row, col, trie.root):
                    return True

        return False

In the above code, we define the TrieNode and Trie classes, similar to the previous explanation. Then, we define the Solution class, which contains the exist method to solve the Word Search problem.

The exist method performs a depth-first search (DFS) on each cell of the board and checks if a valid word exists starting from that cell. The dfs helper function is recursively called to explore the neighboring cells and match the characters with the Trie nodes.

Time Complexity:

  • Building the Trie: O(L), where L is the total number of characters in the words.
  • Performing DFS on the board: O(M * N * 4^L), where M is the number of rows, N is the number of columns, and L is the length of the longest word.

Space Complexity:

  • Trie: O(L), where L is the total number of characters in the words.
  • Visited array: O(M * N), where M is the number of rows and N is the number of columns.

Overall, the time and space complexity of the solution depend on the size of the input board and the length of the words in the Trie.

 

Next Post Previous Post