LeetCode ยท #295

Find Median from Data Stream Solution

Design a data structure that supports adding integers from a stream and finding the median of all elements so far in O(1).

01
Section One ยท Problem

Problem Statement

๐Ÿท๏ธ

Difficulty

Hard

๐Ÿ”—

LeetCode

Problem #295

๐Ÿ—๏ธ

Pattern

Two Heaps โ€” max-heap (lower) + min-heap (upper)

Implement a MedianFinder class with addNum(num) to add an integer from the data stream, and findMedian() to return the median of all elements seen so far. If the count is even, return the average of the two middle values.

Example
MedianFinder mf = new MedianFinder(); mf.addNum(1); // stream: [1] mf.addNum(2); // stream: [1, 2] mf.findMedian(); // โ†’ 1.5 (average of 1 and 2) mf.addNum(3); // stream: [1, 2, 3] mf.findMedian(); // โ†’ 2.0 (middle element)
Constraints
โ€ข -10โต โ‰ค num โ‰ค 10โต โ€ข At most 5 ร— 10โด calls to addNum and findMedian โ€ข findMedian is called only after at least one addNum
02
Section Two ยท Approach 1

Sorted List โ€” O(n) insert, O(1) median

Maintain a sorted list. On each addNum, insert the value at the correct position using binary search. The median is at the middle index. O(1) to find, but O(n) per insert due to shifting.

Why it works

A sorted list gives direct index access to the median. Binary search finds the position in O(log n), but shifting elements costs O(n).

Why we can do better
Problem: O(n) per insert due to array shifting. We need a data structure that maintains "the smaller half" and "the larger half" with O(log n) operations. Two heaps โ€” a max-heap for the lower half and a min-heap for the upper half โ€” give O(log n) insert and O(1) median.
Java โ€” Sorted list (bisect insert)
import java.util.*; class MedianFinder { List<Integer> list = new ArrayList<>(); public void addNum(int num) { int pos = Collections.binarySearch(list, num); if (pos < 0) pos = -(pos + 1); list.add(pos, num); // O(n) shift } public double findMedian() { int n = list.size(); return n % 2 == 1 ? list.get(n / 2) : (list.get(n / 2 - 1) + list.get(n / 2)) / 2.0; } }
Python โ€” Sorted list (bisect)
import bisect class MedianFinder: def __init__(self): self.data = [] def addNum(self, num: int) -> None: bisect.insort(self.data, num) # O(n) shift def findMedian(self) -> float: n = len(self.data) if n % 2: return self.data[n // 2] return (self.data[n // 2 - 1] + self.data[n // 2]) / 2
MetricValue
addNumO(n) โ€” shifting after binary search
findMedianO(1) โ€” direct index access
03
Section Three ยท Approach 2

Two Heaps โ€” O(log n) add, O(1) median

Split the stream into two halves: a max-heap holding the smaller half (root = largest of the smalls) and a min-heap holding the larger half (root = smallest of the bigs). The median is always at the tops of the heaps.

๐Ÿ’ก Mental model: Imagine a class of students lined up by height. You split them into two groups: the shorter half and the taller half. The tallest short student and the shortest tall student stand at the split point. The median is always one of them (or their average). A max-heap and min-heap maintain these two groups, keeping the boundary students instantly accessible.
Algorithm
  • Invariants: (1) Every element in max-heap โ‰ค every element in min-heap. (2) Sizes differ by at most 1.
  • addNum(num):
  • Add num to the max-heap (lower half).
  • Move the root of max-heap to min-heap (ensures ordering).
  • If min-heap is larger, move its root back to max-heap (balance sizes).
  • findMedian():
  • If sizes are equal โ†’ average of both roots.
  • If max-heap is larger โ†’ its root is the median.
๐ŸŽฏ When to recognize this pattern:
  • Any problem requiring the median or middle element of a dynamic collection โ€” think two heaps.
  • The signal is "maintain a running median as elements arrive." This pattern also appears in LC 480 (Sliding Window Median) and problems involving the kth percentile in a stream.
Why this balancing works:
  • By always adding to max-heap first, then moving max-heap's root to min-heap, we guarantee ordering (max-heap root โ‰ค min-heap root).
  • Then rebalancing by size ensures both halves stay within one element of each other.
  • Three heap operations per add, each O(log n).
04
Section Four ยท Trace

Visual Walkthrough

Trace adding [5, 2, 8, 1]:

Two Heaps โ€” max-heap (lower) | min-heap (upper)
STREAM: add 5, add 2, add 8, add 1 add(5) โ†’ max-heap: push 5 โ†’ [5]. Move root 5 to min-heap โ†’ min:[5], max:[]. min-heap bigger โ†’ move 5 back to max-heap. max: [5] min: [] median = 5.0 add(2) โ†’ max-heap: push 2 โ†’ [5,2]. Move root 5 to min-heap โ†’ min:[5], max:[2]. Sizes equal โ†’ no rebalance. max: [2] min: [5] median = (2+5)/2 = 3.5 add(8) โ†’ max-heap: push 8 โ†’ [8,2]. Move root 8 to min-heap โ†’ min:[5,8], max:[2]. min-heap bigger โ†’ move root 5 to max-heap โ†’ max:[5,2], min:[8]. max: [5,2] min: [8] median = 5.0 (max-heap root) add(1) โ†’ max-heap: push 1 โ†’ [5,2,1]. Move root 5 to min-heap โ†’ min:[5,8], max:[2,1]. Sizes equal โ†’ no rebalance. Stream sorted: [1,2,5,8]. max: [2,1] min: [5,8] median = (2+5)/2 = 3.5 FINAL STATE โ€” sorted view: 1 2 | 5 8 lower half (max-heap) | upper half (min-heap) median = (2 + 5) / 2 = 3.5 All medians correct โœ“
05
Section Five ยท Implementation

Code โ€” Java & Python

Java โ€” Two Heaps
import java.util.*; class MedianFinder { PriorityQueue<Integer> lo; // max-heap (lower half) PriorityQueue<Integer> hi; // min-heap (upper half) public MedianFinder() { lo = new PriorityQueue<>(Collections.reverseOrder()); hi = new PriorityQueue<>(); } public void addNum(int num) { lo.offer(num); // always add to lower half first hi.offer(lo.poll()); // move max of lower to upper (ordering) if (hi.size() > lo.size()) // rebalance: lower half holds โ‰ฅ elements lo.offer(hi.poll()); } public double findMedian() { if (lo.size() > hi.size()) return lo.peek(); // odd count โ€” lower half has extra return (lo.peek() + hi.peek()) / 2.0; // even count โ€” average roots } }
Python โ€” Two Heaps (negate for max-heap)
import heapq class MedianFinder: def __init__(self): self.lo = [] # max-heap (negate values) self.hi = [] # min-heap def addNum(self, num: int) -> None: heapq.heappush(self.lo, -num) heapq.heappush(self.hi, -heapq.heappop(self.lo)) if len(self.hi) > len(self.lo): heapq.heappush(self.lo, -heapq.heappop(self.hi)) def findMedian(self) -> float: if len(self.lo) > len(self.hi): return -self.lo[0] return (-self.lo[0] + self.hi[0]) / 2
06
Section Six ยท Analysis

Complexity Analysis

ApproachaddNumfindMedianSpace
Sorted listO(n)O(1)O(n)
Self-balancing BSTO(log n)O(log n)O(n)
Two Heaps โ† optimal O(log n) O(1) O(n)
Why two heaps beat a sorted array:
  • The sorted array gives O(1) median but O(n) inserts.
  • Two heaps give O(log n) inserts (3 heap operations) and O(1) median (peek both roots).
  • The trade-off is optimal for streaming data where you add frequently and query often.
07
Section Seven ยท Edge Cases

Edge Cases & Pitfalls

CaseInputWhy It Matters
Single elementadd(42)lo=[42], hi=[]. Median = 42.0.
Two elementsadd(1), add(2)lo=[1], hi=[2]. Median = 1.5.
All sameadd(5), add(5), add(5)All roots are 5. Median = 5.0.
Negative numbersadd(-1), add(-2)lo=[-2], hi=[-1]. Median = -1.5. Negation in Python handles correctly.
Descending streamadd(5), add(4), add(3)Each new element goes to lower half. Rebalancing keeps halves even.
Large stream (5ร—10โด)Many addsO(log n) per add. Heap operations stay fast even at scale.
โš  Common Mistake: Adding directly to the wrong heap based on a comparison like if num < lo.peek(). This conditional logic has edge cases (empty heaps, equal values). The simpler pattern โ€” always add to lo, move lo's root to hi, then rebalance โ€” is unconditional and handles all cases uniformly. Three operations, zero conditionals about value.

โ† Back to Heap problems