Bloom Filter


Bloom Filter is a probabilistic data structure used to test whether an element is a member of a set. It provides a space-efficient way to perform membership queries, but it may have false positives (indicating that an element is in the set when it is not) due to hash collisions.

Here's an example of implementing a Bloom Filter in Python:

class BloomFilter:
    def __init__(self, size, num_hash_functions):
        self.size = size
        self.num_hash_functions = num_hash_functions
        self.bit_array = [False] * size

    def add(self, element):
        for i in range(self.num_hash_functions):
            hash_value = self._hash(element, i)
            self.bit_array[hash_value] = True

    def contains(self, element):
        for i in range(self.num_hash_functions):
            hash_value = self._hash(element, i)
            if not self.bit_array[hash_value]:
                return False
        return True

    def _hash(self, element, seed):
        hash_value = hash(element)
        return (hash_value + seed) % self.size

In this example, we create a BloomFilter class with the specified size and the number of hash functions to use. The bit_array is initialized with all False values. The _hash() function calculates the hash value using the built-in hash() function and applies the modulo operation to ensure it fits within the array size.

The add() method adds an element to the Bloom Filter by setting the corresponding bits in the bit_array using different hash functions.

The contains() method checks if an element is potentially in the Bloom Filter by checking the corresponding bits in the bit_array using the same hash functions. If all the bits are set, it returns True; otherwise, it returns False.

Now let's solve a problem using a Bloom Filter. The problem we'll solve is "Contains Duplicate II"

Problem Statement: Given an array of integers nums and an integer k, return True if there are any distinct indices i and j in the array such that nums[i] == nums[j] and abs(i - j) <= k.

Example:

nums = [1, 2, 3, 1]
k = 3

Output:

True

Explanation: In this example, the number 1 appears at indices 0 and 3, and their distance is less than or equal to 3.

Solution: We can use a Bloom Filter to efficiently solve this problem. We'll iterate over the array and for each number, check if it is already in the Bloom Filter. If it is, we've found a duplicate at a distance less than or equal to k. If not, we'll add the number to the Bloom Filter.

def containsNearbyDuplicate(nums, k):
    bloom_filter = BloomFilter(size=len(nums), num_hash_functions=2*k)
    for i, num in enumerate(nums):
        if bloom_filter.contains(num):
            return True
        bloom_filter.add(num)
        if i >= k:
            bloom_filter.remove(nums[i - k])
    return False

nums = [1, 2, 3, 1]
k = 3
print(containsNearbyDuplicate(nums, k))  # Output: True

In this solution, we create a Bloom Filter with a size equal to the length of the array nums and a number of hash functions equal to 2*k. We iterate over the array, checking if each number is already in the Bloom Filter. If it is, we return True. If not, we add the number to the Bloom Filter. Additionally, we remove elements from the Bloom Filter that are outside the window of size k to keep it updated.

The time complexity of this solution is O(n) since we iterate over the array once. The space complexity is also O(n) since, in the worst case, we may need to store all the elements in the Bloom Filter.

 

Next Post Previous Post