LC 295 ยท Find Median from Data Stream โ Solution & Explanation | DSA Guide
Two approaches to LeetCode Find Median from Data Stream โ sorted list O(n) insert and two-heap O(log n) insert O(1) median. Java and Python solutions.
LeetCode ยท #295
Find Median from Data Stream Solution
Design a data structure that supports adding integers from a stream and finding the median of all elements so far in O(1).
Implement a MedianFinder class with addNum(num) to add an integer from the data stream, and findMedian() to return the median of all elements seen so far. If the count is even, return the average of the two middle values.
โข -10โต โค num โค 10โตโข At most 5 ร 10โด calls to addNum and findMedianโข findMedian is called only after at least one addNum
02
Section Two ยท Approach 1
Sorted List โ O(n) insert, O(1) median
Maintain a sorted list. On each addNum, insert the value at the correct position using binary search. The median is at the middle index. O(1) to find, but O(n) per insert due to shifting.
Why it works
A sorted list gives direct index access to the median. Binary search finds the position in O(log n), but shifting elements costs O(n).
Why we can do better
Problem: O(n) per insert due to array shifting. We need a data structure that maintains "the smaller half" and "the larger half" with O(log n) operations. Two heaps โ a max-heap for the lower half and a min-heap for the upper half โ give O(log n) insert and O(1) median.
Split the stream into two halves: a max-heap holding the smaller half (root = largest of the smalls) and a min-heap holding the larger half (root = smallest of the bigs). The median is always at the tops of the heaps.
๐ก Mental model: Imagine a class of students lined up by height. You split them into two groups: the shorter half and the taller half. The tallest short student and the shortest tall student stand at the split point. The median is always one of them (or their average). A max-heap and min-heap maintain these two groups, keeping the boundary students instantly accessible.
Algorithm
Invariants: (1) Every element in max-heap โค every element in min-heap. (2) Sizes differ by at most 1.
addNum(num):
Add num to the max-heap (lower half).
Move the root of max-heap to min-heap (ensures ordering).
If min-heap is larger, move its root back to max-heap (balance sizes).
findMedian():
If sizes are equal โ average of both roots.
If max-heap is larger โ its root is the median.
๐ฏ When to recognize this pattern:
Any problem requiring the median or middle element of a dynamic collection โ think two heaps.
The signal is "maintain a running median as elements arrive." This pattern also appears in LC 480 (Sliding Window Median) and problems involving the kth percentile in a stream.
Why this balancing works:
By always adding to max-heap first, then moving max-heap's root to min-heap, we guarantee ordering (max-heap root โค min-heap root).
Then rebalancing by size ensures both halves stay within one element of each other.
Three heap operations per add, each O(log n).
04
Section Four ยท Trace
Visual Walkthrough
Trace adding [5, 2, 8, 1]:
Two Heaps โ max-heap (lower) | min-heap (upper)
05
Section Five ยท Implementation
Code โ Java & Python
Java โ Two Heaps
import java.util.*;
classMedianFinder {
PriorityQueue<Integer> lo; // max-heap (lower half)PriorityQueue<Integer> hi; // min-heap (upper half)publicMedianFinder() {
lo = newPriorityQueue<>(Collections.reverseOrder());
hi = newPriorityQueue<>();
}
publicvoidaddNum(int num) {
lo.offer(num); // always add to lower half first
hi.offer(lo.poll()); // move max of lower to upper (ordering)if (hi.size() > lo.size()) // rebalance: lower half holds โฅ elements
lo.offer(hi.poll());
}
publicdoublefindMedian() {
if (lo.size() > hi.size())
return lo.peek(); // odd count โ lower half has extrareturn (lo.peek() + hi.peek()) / 2.0; // even count โ average roots
}
}
The sorted array gives O(1) median but O(n) inserts.
Two heaps give O(log n) inserts (3 heap operations) and O(1) median (peek both roots).
The trade-off is optimal for streaming data where you add frequently and query often.
07
Section Seven ยท Edge Cases
Edge Cases & Pitfalls
Case
Input
Why It Matters
Single element
add(42)
lo=[42], hi=[]. Median = 42.0.
Two elements
add(1), add(2)
lo=[1], hi=[2]. Median = 1.5.
All same
add(5), add(5), add(5)
All roots are 5. Median = 5.0.
Negative numbers
add(-1), add(-2)
lo=[-2], hi=[-1]. Median = -1.5. Negation in Python handles correctly.
Descending stream
add(5), add(4), add(3)
Each new element goes to lower half. Rebalancing keeps halves even.
Large stream (5ร10โด)
Many adds
O(log n) per add. Heap operations stay fast even at scale.
โ Common Mistake: Adding directly to the wrong heap based on a comparison like if num < lo.peek(). This conditional logic has edge cases (empty heaps, equal values). The simpler pattern โ always add to lo, move lo's root to hi, then rebalance โ is unconditional and handles all cases uniformly. Three operations, zero conditionals about value.