Find Median from Data Stream

 

1. Clarify the problem:

Before we begin, let's clarify the problem statement. The "Find Median from Data Stream" problem asks us to design a data structure that supports adding integers and finding the median of the elements seen so far. The median is the middle value of the sorted elements; when the count is even, it is the average of the two middle values. We want finding the median to take constant time.
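To make the definition concrete, here is a minimal brute-force sketch that sorts the values on every call. The function name median_of is an illustrative assumption, not part of the problem statement; it only serves as a reference for what the data structure should return.

```python
def median_of(values):
    """Return the median of a non-empty list by sorting it (O(N log N))."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2.0


print(median_of([1, 2, 3]))   # 2
print(median_of([-1, -2]))    # -1.5
```

Sorting on every query is too slow for a stream, which is what motivates the design in the following steps.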

2. Analyze the problem:

Let's analyze the problem to identify the input, output, and constraints.

Input:

  • A sequence of integers that will be added to the data structure.

Output:

  • The median of the elements seen so far.

Constraints:

  • The input integers can be in any order.
  • The number of elements added is at most 10^5.

3. Design an algorithm:

To solve this problem, we can use two heaps: a max-heap to store the smaller half of the elements and a min-heap to store the larger half of the elements. Here's the general outline of our algorithm:

  1. Initialize two heaps: max_heap and min_heap.
  2. When adding an element:
    • If max_heap is empty or the element is smaller than the root of max_heap:
      • Push the element into max_heap.
    • Otherwise:
      • Push the element into min_heap.
    • Balance the heaps:
      • If max_heap has more than one element more than min_heap:
        • Pop the root of max_heap and push it into min_heap.
      • Otherwise, if min_heap has more elements than max_heap:
        • Pop the root of min_heap and push it into max_heap.
  3. When finding the median:
    • If the size of the heaps is the same:
      • Return the average of the roots of max_heap and min_heap.
    • Otherwise:
      • Return the root of max_heap, which always holds the extra element.
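Python's heapq module only provides a min-heap, so a max-heap is commonly simulated by storing negated values. The sketch below shows that trick in isolation; the variable names are illustrative.

```python
import heapq

max_heap = []  # stores negated values so the largest original value sits at index 0
for value in [4, 9, 1]:
    heapq.heappush(max_heap, -value)

largest = -max_heap[0]             # peek: negate again to recover the value
popped = -heapq.heappop(max_heap)  # remove and return the largest value
next_largest = -max_heap[0]

print(largest, popped, next_largest)  # 9 9 4
```

Every value that enters or leaves the simulated max-heap is negated exactly once; min_heap needs no negation.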

4. Explain your approach:

Our approach uses two heaps to compute the median of the elements seen so far. We maintain a max-heap to store the smaller half of the elements and a min-heap to store the larger half. Because the heap sizes never differ by more than one, the median is always either the root of max_heap or the average of the two roots, so we can read it in constant time.

5. Write clean and readable code:

Let's implement the algorithm in Python:

python
import heapq


class MedianFinder:
    def __init__(self):
        self.max_heap = []  # smaller half, stored negated so heapq acts as a max-heap
        self.min_heap = []  # larger half

    def addNum(self, num):
        # Route the number to the half it belongs in.
        if not self.max_heap or num < -self.max_heap[0]:
            heapq.heappush(self.max_heap, -num)
        else:
            heapq.heappush(self.min_heap, num)

        # Balance the heaps so that max_heap holds either the same number
        # of elements as min_heap or exactly one more.
        if len(self.max_heap) - len(self.min_heap) > 1:
            heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        elif len(self.min_heap) > len(self.max_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))

    def findMedian(self):
        if len(self.max_heap) == len(self.min_heap):
            return (-self.max_heap[0] + self.min_heap[0]) / 2.0
        else:
            return -self.max_heap[0]

6. Test your code:

Let's test our code with different test cases to ensure its correctness. We'll consider the following cases:

  • Case 1:

    python
    medianFinder = MedianFinder()
    medianFinder.addNum(1)
    medianFinder.addNum(2)
    medianFinder.addNum(3)
    print(medianFinder.findMedian())

    The expected output is 2. With an odd number of elements, findMedian returns the stored integer itself rather than a float.

  • Case 2:

    python
    medianFinder = MedianFinder()
    medianFinder.addNum(-1)
    medianFinder.addNum(-2)
    print(medianFinder.findMedian())

    The expected output is -1.5.

  • Case 3:

    python
    medianFinder = MedianFinder()
    medianFinder.addNum(3)
    medianFinder.addNum(-2)
    medianFinder.addNum(-1)
    medianFinder.addNum(5)
    print(medianFinder.findMedian())

    The expected output is 1.0: the sorted elements are -2, -1, 3, 5, and the average of the two middle values is (-1 + 3) / 2.
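Hand-picked cases can miss ordering bugs, so a randomized cross-check against the standard library's statistics.median can help. The sketch below repeats the class so that it runs on its own; the helper name cross_check and its parameters are assumptions made for illustration.

```python
import heapq
import random
import statistics


class MedianFinder:
    """Two-heap median finder, repeated here so the snippet is self-contained."""

    def __init__(self):
        self.max_heap = []  # smaller half, stored negated
        self.min_heap = []  # larger half

    def addNum(self, num):
        if not self.max_heap or num < -self.max_heap[0]:
            heapq.heappush(self.max_heap, -num)
        else:
            heapq.heappush(self.min_heap, num)
        if len(self.max_heap) - len(self.min_heap) > 1:
            heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        elif len(self.min_heap) > len(self.max_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))

    def findMedian(self):
        if len(self.max_heap) == len(self.min_heap):
            return (-self.max_heap[0] + self.min_heap[0]) / 2.0
        return -self.max_heap[0]


def cross_check(trials=200, max_len=50, seed=0):
    """Compare MedianFinder with statistics.median after every insertion."""
    rng = random.Random(seed)
    for _ in range(trials):
        finder, seen = MedianFinder(), []
        for _ in range(rng.randint(1, max_len)):
            num = rng.randint(-100, 100)
            finder.addNum(num)
            seen.append(num)
            assert finder.findMedian() == statistics.median(seen)
    return True


print(cross_check())  # True
```

Checking after every insertion, rather than only at the end, exercises both the odd-count and even-count branches on each trial.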

7. Optimize if necessary:

The current solution is already efficient, and further optimization is not necessary.

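8. Handle error cases:

Our code does not have any explicit error handling, as the problem does not specify any error cases; we assume that the input will always be valid integers. One case to be aware of: calling findMedian before any number has been added raises an IndexError, because both heaps are empty.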
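One possible guard, for a median request made before any number has been added, is sketched below as a free function over the two heaps. The name median_from_heaps and the choice of ValueError are assumptions, not something the problem prescribes.

```python
def median_from_heaps(max_heap, min_heap):
    """Return the median given the two heaps used by MedianFinder.

    max_heap holds the smaller half as negated values; min_heap holds the
    larger half. Raises ValueError (an assumed choice) when both are empty.
    """
    if not max_heap and not min_heap:
        raise ValueError("no numbers have been added yet")
    if len(max_heap) == len(min_heap):
        return (-max_heap[0] + min_heap[0]) / 2.0
    return -max_heap[0]


try:
    median_from_heaps([], [])
except ValueError as exc:
    print("error:", exc)  # error: no numbers have been added yet

print(median_from_heaps([-2, -1], [3]))  # 2
```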

9. Discuss complexity analysis:

The time complexity of adding a number is O(log N) because each call performs a constant number of heap pushes and pops. Finding the median has a time complexity of O(1) since we only access the roots of the heaps.

The space complexity is O(N) because we store all the elements in the heaps. Since the number of elements added is limited to 10^5, memory use stays small.

The two-heap design trades insertion cost and memory for lookup speed: each insertion costs O(log N) and every element is stored, but the median can then be read in O(1).
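For comparison, a simpler alternative (not the approach used in this post) keeps a single sorted list with bisect.insort. Lookup is still O(1), but each insertion is O(N) because list elements must shift. The class name SortedListMedian is hypothetical.

```python
import bisect


class SortedListMedian:
    def __init__(self):
        self.values = []  # always kept in ascending order

    def addNum(self, num):
        # Binary search finds the slot in O(log N), but inserting into a
        # Python list shifts later elements, so the call is O(N) overall.
        bisect.insort(self.values, num)

    def findMedian(self):
        mid = len(self.values) // 2
        if len(self.values) % 2 == 1:
            return self.values[mid]
        return (self.values[mid - 1] + self.values[mid]) / 2.0


finder = SortedListMedian()
for num in [3, -2, -1, 5]:
    finder.addNum(num)
print(finder.findMedian())  # 1.0
```

For small inputs the sorted list is often fast enough in practice; the two heaps give the better worst-case bound for insertion.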
