Hashing

Hash Map Fundamentals

A hash map (hash table) stores key-value pairs and uses a hash function to compute an index into an array of buckets, achieving O(1) average-case put, get, and delete. It is the single most frequently tested data structure in coding interviews — Two Sum, frequency counting, grouping, and prefix-sum problems all rely on the constant-time lookup that only a hash map provides.

01
Section One · Foundation

What is a Hash Map?

[Diagram: hashing “Bob” through h() into a five-bucket array [0]–[4] holding “Alice”:95, “Bob”:87, “Carol”:91]

A hash map maps keys to values using a hash function that converts each key into an integer index. That index determines which bucket in an internal array stores the key-value pair. Because computing the hash and jumping to the bucket are both O(1), put/get/delete are all O(1) on average. The catch: two different keys can hash to the same bucket — a collision — which must be resolved.
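The index computation itself is one line. A minimal sketch in Java (the capacity of 8 is an arbitrary choice for illustration; `Math.floorMod` is used instead of `%` because `hashCode()` can be negative):

```java
public class BucketIndex {
    // Map a key's hash code onto a bucket index. floorMod avoids a
    // negative result when hashCode() returns a negative number.
    static int bucketIndex(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        int capacity = 8;  // arbitrary table size for the sketch
        System.out.println("\"Bob\" -> bucket " + bucketIndex("Bob", capacity));
    }
}
```

The result is always in the range [0, capacity), which is what makes the jump to a bucket a constant-time array access.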

Two main strategies for handling collisions:

Chaining

Linked Lists

Each bucket holds a linked list (or tree) of entries that hashed to the same index. On collision, append to the list. Java’s HashMap uses this — a chain converts to a red-black tree once it reaches 8 entries (and the table has at least 64 buckets).

  • Simple to implement
  • Graceful degradation under load
  • Extra pointer overhead per entry
Open Addressing

Probing

All entries live inside the array itself. On collision, probe for the next open slot (linear, quadratic, or double hashing). Python’s dict uses this approach.

  • Cache-friendly — no pointer chasing
  • Needs a good probe sequence
  • Degrades badly above ~70% load
Analogy: A coat check at a theatre. You hand your coat (value) with a ticket number (key). The attendant uses a formula to decide which rack (bucket) to hang it on. When you return with your ticket, the formula instantly tells them which rack to check — no scanning every coat. If two tickets map to the same rack, both coats hang there and you check the name tag.
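The chaining strategy above can be sketched in a few lines of Java. This is an illustrative toy, not a production structure: fixed capacity, no resizing, and `ChainedMap` is a made-up name, not a real library class.

```java
import java.util.LinkedList;

// Toy chaining hash map: every bucket is a linked list of (key, value)
// pairs. Fixed capacity, no resizing — for illustration only.
public class ChainedMap<K, V> {
    private static final int CAPACITY = 16;
    private final LinkedList<Object[]>[] table;

    @SuppressWarnings("unchecked")
    public ChainedMap() {
        table = new LinkedList[CAPACITY];
    }

    private int index(K key) {
        return Math.floorMod(key.hashCode(), CAPACITY);
    }

    public void put(K key, V value) {
        int i = index(key);
        if (table[i] == null) table[i] = new LinkedList<>();
        for (Object[] entry : table[i]) {
            if (entry[0].equals(key)) { entry[1] = value; return; } // update existing
        }
        table[i].add(new Object[]{key, value});                     // append on collision
    }

    @SuppressWarnings("unchecked")
    public V get(K key) {
        int i = index(key);
        if (table[i] == null) return null;
        for (Object[] entry : table[i]) {
            if (entry[0].equals(key)) return (V) entry[1];
        }
        return null;
    }
}
```

Colliding keys simply share a bucket’s list, exactly like two coats sharing a rack.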
Key Characteristics

  • Hash function maps key → index — O(1) average lookup, no scanning
  • Collisions are inevitable — chaining or probing handles them
  • Load factor triggers resize — rehash all entries when the table grows: O(n) but rare
  • Keys must be immutable & hashable — mutable keys break the hash → lost entries
  • No ordering of entries — use TreeMap / sorted dict if order matters
02
Section Two · Operations

Core Operations

Put (Insert / Update)
Computes the bucket index from the key’s hash, then inserts the key-value pair or updates the value if the key already exists.
Pseudocode
function put(map, key, value):
    index = hash(key) % map.capacity
    bucket = map.table[index]
    for entry in bucket:
        if entry.key == key:
            entry.value = value          // update existing
            return
    bucket.append(key, value)            // new entry
    map.size += 1
    if map.size / map.capacity > LOAD_FACTOR:
        resize(map)                      // rehash all entries
The load factor controls the space-time trade-off. Java’s default is 0.75 — when the number of entries reaches 75% of the table’s capacity, the table doubles and all entries are rehashed. A lower load factor wastes memory but reduces collisions; a higher one saves memory but slows lookups. For interviews, citing 0.75 (Java’s default) is a safe answer.
Get (Lookup)
Computes the bucket index from the key’s hash, then searches that single bucket for a matching key and returns its value.
Pseudocode
function get(map, key):
    index = hash(key) % map.capacity
    bucket = map.table[index]
    for entry in bucket:
        if entry.key == key:
            return entry.value           // found — O(1) average
    return null                          // key not present
Use getOrDefault() to avoid null checks. In Java, map.getOrDefault(key, 0) returns 0 if the key is absent — perfect for frequency counting. In Python, dict.get(key, 0) or collections.defaultdict(int) does the same. This eliminates the entire “check then get” pattern.
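As a concrete sketch of that pattern in Java, counting character occurrences with getOrDefault:

```java
import java.util.HashMap;
import java.util.Map;

public class FreqCount {
    // Count character frequencies with getOrDefault — no null checks,
    // no "check then get" dance.
    static Map<Character, Integer> count(String s) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : s.toCharArray())
            freq.put(c, freq.getOrDefault(c, 0) + 1);
        return freq;
    }
}
```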
Remove (Delete)
Locates the entry by hashing the key, then removes it from the bucket and decrements the size.
Pseudocode
function remove(map, key):
    index = hash(key) % map.capacity
    bucket = map.table[index]
    for entry in bucket:
        if entry.key == key:
            bucket.remove(entry)         // O(1) for linked list
            map.size -= 1
            return entry.value
    return null                          // key not found
Open addressing requires tombstones. In chaining, removing a node from the linked list is straightforward. In open addressing, you cannot simply empty the slot — that would break probe chains for other keys. Instead, mark the slot as “deleted” (a tombstone) so probing continues past it. This is why chaining is simpler to implement correctly.
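A sketch of tombstone deletion with linear probing in Java. The fixed capacity, the absence of resizing, and the policy of never reusing tombstone slots on insert are all simplifications for illustration:

```java
// Linear-probing table sketch with tombstones for deletion.
public class ProbingMap {
    private static final Object TOMBSTONE = new Object();
    private final Object[] keys = new Object[8];
    private final Object[] vals = new Object[8];

    private int index(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    public void put(Object key, Object value) {
        int i = index(key);
        // Probe past occupied slots and tombstones (tombstones are
        // never reused here, to keep the sketch simple).
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + 1) % keys.length;
        keys[i] = key;
        vals[i] = value;
    }

    public Object get(Object key) {
        int i = index(key);
        while (keys[i] != null) {            // stop only at a truly empty slot
            if (keys[i] != TOMBSTONE && keys[i].equals(key)) return vals[i];
            i = (i + 1) % keys.length;       // keep probing past tombstones
        }
        return null;
    }

    public void remove(Object key) {
        int i = index(key);
        while (keys[i] != null) {
            if (keys[i] != TOMBSTONE && keys[i].equals(key)) {
                keys[i] = TOMBSTONE;         // mark, don't empty —
                vals[i] = null;              // emptying would break probe chains
                return;
            }
            i = (i + 1) % keys.length;
        }
    }
}
```

If remove simply emptied the slot, a later get for a key further along the same probe chain would stop early at the hole and wrongly report “not found.”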
Frequency Count Pattern
Uses a hash map to count occurrences of each element in O(n) — the single most common hash map usage in interviews.
Pseudocode — Count Frequencies
function countFrequencies(arr):
    freq = {}                            // empty hash map
    for element in arr:
        freq[element] = freq.getOrDefault(element, 0) + 1
    return freq                          // O(n) time, O(k) space

// Two Sum — the canonical hash map problem
function twoSum(nums, target):
    seen = {}                            // value → index
    for i, num in nums:
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i                    // O(n) total — one pass
“Can I reduce O(n²) to O(n)?” — the answer is almost always a hash map. Whenever you have a nested loop checking “does X exist?” or “have I seen Y before?”, replace the inner loop with a hash map lookup. This converts O(n²) brute force into O(n) single-pass — the core insight behind Two Sum, anagram grouping, and subarray sum problems.
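A sketch of that refactoring on the simplest version of the question, “have I seen this value before?”:

```java
import java.util.HashSet;
import java.util.Set;

public class SeenBefore {
    // O(n^2) brute force: a nested loop asking "does nums[j] equal nums[i]?"
    static boolean hasDuplicateSlow(int[] nums) {
        for (int i = 0; i < nums.length; i++)
            for (int j = i + 1; j < nums.length; j++)
                if (nums[i] == nums[j]) return true;
        return false;
    }

    // O(n): the inner loop becomes a single hash lookup.
    static boolean hasDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int num : nums)
            if (!seen.add(num)) return true;  // add() returns false if already present
        return false;
    }
}
```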
Common Mistake: Using mutable objects as keys. In Java, if you put a List as a key and then modify the list, its hash code changes but the map still has it in the old bucket — the entry is effectively lost. Always use immutable keys: String, Integer, or unmodifiable collections. In Python, use tuples (hashable) instead of lists (not hashable).
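The pitfall can be demonstrated directly (a sketch; the exact lookup behavior after mutation depends on where the new hash lands, but the entry typically becomes unreachable):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyPitfall {
    // Put a mutable List as a key, mutate it, then try to find it again.
    static boolean lookupAfterMutation() {
        Map<List<Integer>, String> map = new HashMap<>();
        List<Integer> key = new ArrayList<>(List.of(1));
        map.put(key, "value");
        key.add(2);                   // hashCode changes; entry stays in the old bucket
        return map.containsKey(key);  // typically false — the entry is lost
    }

    // Fix: use an immutable value as the key.
    static String immutableKeyLookup() {
        Map<List<Integer>, String> safe = new HashMap<>();
        safe.put(List.of(1, 2), "value");  // List.of(...) is immutable
        return safe.get(List.of(1, 2));
    }
}
```

Note the map still reports size 1 after the mutation: the entry exists, it just can no longer be reached by its own key.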
03
Section Three · Visuals

Diagrams & Memory Layout

Hash Map Structure — Chaining
Bucket array with linked-list chains for collision resolution
[Diagram: four buckets [0]–[3] — “apple”:5 → “date”:2 share a bucket (collision chain), “banana”:3 → “grape”:7 share another, “cherry”:1 sits alone, one bucket empty. hash(key) % capacity → bucket index — O(1) average lookup]
Put Operation — Hash, Locate, Insert
Inserting “fig”:4 — hash to bucket, append to chain
[Diagram: ① hash(“fig”) = 7 → ② 7 % 4 = 3 → bucket[3] → ③ scan chain: “cherry” ≠ “fig” → ④ append (“fig”, 4) to the chain. O(1) average — a bucket holds ~1 entry when load factor ≤ 0.75]
Rehashing — Table Resize
When load exceeds threshold, double capacity and rehash all entries
[Diagram: BEFORE — capacity 4, load 3/4 = 0.75, entries A:1, B:2, C:3 → rehash → AFTER — capacity 8, load 3/8 = 0.375. Rehash is O(n) but happens rarely — amortized O(1) per put]
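The resize step can be sketched for a chaining table (simplified; real implementations such as Java’s HashMap optimize which bits of the hash they re-examine). Every entry must be re-bucketed because its index depends on the capacity:

```java
import java.util.ArrayList;
import java.util.List;

public class Rehash {
    // Redistribute every entry into a table of double the capacity.
    // Entries move because index = floorMod(hash, capacity) changes
    // when capacity changes.
    static List<List<Object[]>> resize(List<List<Object[]>> oldTable) {
        int newCap = oldTable.size() * 2;
        List<List<Object[]>> newTable = new ArrayList<>();
        for (int i = 0; i < newCap; i++) newTable.add(new ArrayList<>());
        for (List<Object[]> bucket : oldTable)
            for (Object[] entry : bucket) {
                int idx = Math.floorMod(entry[0].hashCode(), newCap);
                newTable.get(idx).add(entry);  // O(n) total over all entries
            }
        return newTable;
    }

    // Demo mirroring the diagram: capacity 4 with three entries,
    // resized to capacity 8. Returns {newCapacity, totalEntries}.
    static int[] demo() {
        List<List<Object[]>> table = new ArrayList<>();
        for (int i = 0; i < 4; i++) table.add(new ArrayList<>());
        for (String k : new String[]{"A", "B", "C"}) {
            int idx = Math.floorMod(k.hashCode(), 4);
            table.get(idx).add(new Object[]{k, k});
        }
        List<List<Object[]>> bigger = resize(table);
        int total = 0;
        for (List<Object[]> b : bigger) total += b.size();
        return new int[]{bigger.size(), total};
    }
}
```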
04
Section Four · Code

Java Quick Reference

The most common hash map operations using Java’s HashMap — plus essential interview utilities like getOrDefault and merge.

Java — HashMap Core Operations
import java.util.*;

Map<String, Integer> map = new HashMap<>();

// O(1) avg — put
map.put("apple", 5);
map.put("banana", 3);
map.put("cherry", 7);

// O(1) avg — get
int val = map.get("banana");             // 3
int def = map.getOrDefault("fig", 0);    // 0 (not found)

// O(1) avg — containsKey
boolean has = map.containsKey("apple");  // true

// O(1) avg — remove
map.remove("cherry");

// Frequency counting — the #1 interview pattern
map.merge("apple", 1, Integer::sum);     // increment by 1

// Iterate entries
for (Map.Entry<String, Integer> e : map.entrySet())
    System.out.println(e.getKey() + " = " + e.getValue());
05
Section Five · Complexity

Time & Space Complexity

Operation     Best Case   Average Case   Worst Case                Space
Put           O(1)        O(1)           O(n) (all keys collide)   O(1)
Get           O(1)        O(1)           O(n) (all keys collide)   O(1)
Remove        O(1)        O(1)           O(n) (all keys collide)   O(1)
ContainsKey   O(1)        O(1)           O(n) (all keys collide)   O(1)
Why these complexities? The hash function distributes keys uniformly across buckets. With a good hash and a load factor ≤ 0.75, each bucket holds ~1 entry on average — so every operation is a constant-time index calculation plus a constant-time chain scan. The O(n) worst case only happens if all keys hash to the same bucket (adversarial input). Java 8 mitigates this by converting long chains to red-black trees, capping the worst case at O(log n).
Space Complexity Note

A hash map of n entries uses O(n) space for the entries plus O(capacity) for the bucket array. Because the load factor keeps capacity within a small constant multiple of n, total space is O(n). Each entry in a chaining implementation carries key + value + pointer overhead. Rehashing temporarily uses O(n) extra space during the copy. For interviews, state “O(n) space” and you’re correct.

06
Section Six · Decision Guide

When to Use Hash Map

Use This When

  • You need O(1) lookup by key — frequency counting, “have I seen this before?”, Two Sum complement check, and anagram grouping all hinge on constant-time contains/get.
  • Reducing O(n²) to O(n) — any time a brute-force solution uses nested loops to “find a match”, a hash map can replace the inner loop with a single lookup.
  • Prefix-sum with target — storing prefix sums as keys lets you check “does prefix[j] − target exist?” in O(1) — the pattern behind subarray sum problems.
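The prefix-sum bullet above, as a worked sketch (this is the standard technique behind Subarray Sum Equals K):

```java
import java.util.HashMap;
import java.util.Map;

public class SubarraySum {
    // Count subarrays summing to k: at each index, ask how many earlier
    // prefix sums equal (prefix - k) — a single O(1) map lookup.
    static int subarraySum(int[] nums, int k) {
        Map<Integer, Integer> seen = new HashMap<>();
        seen.put(0, 1);                              // the empty prefix
        int prefix = 0, count = 0;
        for (int num : nums) {
            prefix += num;
            count += seen.getOrDefault(prefix - k, 0);
            seen.merge(prefix, 1, Integer::sum);     // record this prefix
        }
        return count;
    }
}
```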

Avoid When

  • You need sorted order — hash maps have no ordering guarantee. Use a TreeMap (Java) or sortedcontainers.SortedDict (Python) for ordered iteration.
  • Keys are small integers — a plain array with key-as-index is faster and uses less memory (e.g., character frequency: int[26] beats a HashMap).
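The small-integer-key case as a sketch (assumes lowercase ASCII input only):

```java
public class CharFreq {
    // int[26] as a "map" from 'a'..'z' to counts — the key is the index,
    // so there is no hashing and no boxing at all.
    static int[] count(String s) {
        int[] freq = new int[26];
        for (char c : s.toCharArray())
            freq[c - 'a']++;            // assumes lowercase a-z only
        return freq;
    }

    // Classic use: two strings are anagrams iff their counts match.
    static boolean isAnagram(String a, String b) {
        return java.util.Arrays.equals(count(a), count(b));
    }
}
```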
Comparison vs Alternatives
Structure            Lookup     Insert     Ordered?       Best For
HashMap ← this one   O(1) avg   O(1) avg   No             Frequency count, Two Sum, grouping
TreeMap              O(log n)   O(log n)   Yes (sorted)   Range queries, ordered iteration
HashSet              O(1) avg   O(1) avg   No             Membership check (keys only, no values)
07
Section Seven · Interview Practice

LeetCode Problems

Hash map interview problems fall into four patterns: frequency counting (count occurrences, check anagrams), complement lookup (Two Sum and its variants), prefix-sum caching (subarray sum equals K), and grouping by key (group anagrams, bucket sort). If you can spot which pattern applies, the solution writes itself.

  • Easy Two Sum — Store each number’s index; for each element, check if target − num exists in the map — O(n) single pass.
  • Medium Group Anagrams — Sort each word (or count its characters) to form a key; group words with the same key in a map — O(n·k log k).
  • Medium Top K Frequent Elements — Count frequencies with a map, then use a min-heap of size K or bucket sort to extract the top K — O(n).
  • Medium Longest Consecutive Sequence — Put all numbers in a HashSet; for each number num such that num − 1 is not in the set (the start of a streak), count the streak forward — O(n).
  • Hard Subarray Sum Equals K — Cache prefix sums in a map; at each index check if prefix − K exists — classic prefix-sum + hash map in O(n).
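One pattern from the list worked out: grouping by a canonical key, as in Group Anagrams (a sketch using the sorted word as the key):

```java
import java.util.*;

public class GroupAnagrams {
    // Group words that are anagrams of each other: sorting a word's
    // characters gives a canonical key, and a map buckets words by it.
    // O(n * k log k) for n words of length k.
    static Collection<List<String>> group(String[] words) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String w : words) {
            char[] chars = w.toCharArray();
            Arrays.sort(chars);
            String key = new String(chars);   // e.g. "eat", "tea" -> "aet"
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(w);
        }
        return groups.values();
    }
}
```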
Interview Pattern: When a brute-force approach uses a nested loop asking “does this value exist?” or “have I seen this complement?”, replace the inner loop with a hash map lookup. This single refactoring turns O(n²) into O(n) — and it applies to a large share of LeetCode mediums.