The Two Heaps technique is a problem-solving approach that involves maintaining two heaps (or priority queues) to efficiently handle dynamic data, particularly when you need to calculate medians or other statistics from a stream of numbers. The two heaps are:
- Max Heap: This heap stores the smaller half of the numbers. The root (top element) of this heap is the maximum of the smaller half.
- Min Heap: This heap stores the larger half of the numbers. The root of this heap is the minimum of the larger half.
The goal is to balance the two heaps such that:
- The Max Heap always contains the smaller half of the data.
- The Min Heap contains the larger half. If the total number of elements is odd, the extra element is kept in the Max Heap.
This allows you to access the median in O(1) time, and inserting a new number can be done in O(log n) due to heap insertions and balancing operations.
Key Concept:
- Use a Max Heap for the smaller half and a Min Heap for the larger half.
- Maintain balance by ensuring that the difference in size between the two heaps is at most 1.
How to Identify This Pattern in a Given Coding Question
You can identify the need for the Two Heaps technique in problems where: - You are asked to find the median of a data stream or maintain the median dynamically as new elements are added. - The problem involves dynamic range queries or handling both minima and maxima efficiently as data is updated.
Example Questions:
Find the median from a stream of integers.
- Keep track of the median as a list of numbers is updated.
- Maintain the top k largest or smallest numbers dynamically.
Common phrases that indicate this pattern might be:
- “Find the median in a stream.”
- “Maintain the median dynamically.”
- “Track the median as numbers are inserted.”
What Are the Variances of It
- Median Finder: Using two heaps to dynamically maintain the median of a stream of numbers.
- Top-k Elements: Modify the technique to track the top k largest or smallest numbers.
- Range Queries: Use two heaps to maintain and balance data for querying the range (e.g., min or max) efficiently in real-time.
Tricks to Keep in Mind
- Balancing Heaps: After every insertion, ensure that both heaps are balanced by either transferring an element from one heap to the other or adjusting the sizes of the heaps so that their size difference is never more than 1.
- Handling Odd/Even Cases: If the total number of elements is odd, keep the extra element in the Max Heap (or alternatively the Min Heap). For even numbers, ensure both heaps have equal elements.
- Efficient Median Calculation: If the number of elements is odd, the median is the root of the Max Heap. If it’s even, the median is the average of the roots of both heaps.
Common Coding Questions
Example 1: Median Finder with Two Heaps
import heapq
class MedianFinder:
def __init__(self):
# Max heap to store the smaller half (as a negative min-heap)
self.max_heap = [] # Python only has min-heap, so we invert the numbers
# Min heap to store the larger half
self.min_heap = []
def add_num(self, num: int) -> None:
"""
Adds a number to the data structure.
"""
# Step 1: Add the number to the max heap (invert the number to simulate max heap)
heapq.heappush(self.max_heap, -num)
# Step 2: Balance the heaps: Ensure every element in max_heap is <= min_heap
if self.max_heap and self.min_heap and (-self.max_heap[0] > self.min_heap[0]):
heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
# Step 3: Balance the size: Ensure the size difference between heaps is at most 1
if len(self.max_heap) > len(self.min_heap) + 1:
heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
elif len(self.min_heap) > len(self.max_heap):
heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))
def find_median(self) -> float:
"""
Returns the median of all elements so far.
"""
if len(self.max_heap) > len(self.min_heap):
return -self.max_heap[0] # Max-heap root is the median if odd number of elements
else:
return (-self.max_heap[0] + self.min_heap[0]) / 2.0 # Average of two roots if even
# Example usage
mf = MedianFinder()
mf.add_num(3)
mf.add_num(1)
print(mf.find_median()) # Output: 2.0
mf.add_num(5)
mf.add_num(4)
print(mf.find_median()) # Output: 3.5
Example 2: Top-K Elements using Two Heaps
import heapq
class TopKElements:
def __init__(self, k: int):
self.k = k
self.min_heap = [] # Min-heap to keep track of top-k largest elements
def add_num(self, num: int) -> None:
"""
Adds a number to the heap and maintains the top k largest elements.
"""
heapq.heappush(self.min_heap, num)
# Ensure the heap contains at most k elements
if len(self.min_heap) > self.k:
heapq.heappop(self.min_heap)
def get_top_k(self) -> list:
"""
Returns the top k largest elements in any order.
"""
return self.min_heap
# Example usage
top_k = TopKElements(3)
top_k.add_num(3)
top_k.add_num(1)
top_k.add_num(5)
top_k.add_num(10)
top_k.add_num(2)
print(top_k.get_top_k()) # Output: [3, 5, 10] (top 3 elements)