Hashing

HashSet Membership & Deduplication

O(1) average add, contains, and remove. A HashSet answers one question instantly: "have I seen this before?" โ€” the backbone of deduplication, cycle detection, duplicate checking, and set operations.

Foundations ยท Array ยท Linked List ยท Stack ยท Queue ยท Deque ยท HashMap ยท HashSet ยท Binary Tree ยท BST ยท Heap ยท Trie ยท Graph
01
Section One ยท Foundation

What is a HashSet?

HashSet 3 7 1 9 5 contains(7)? โ†’ true O(1) contains(4)? โ†’ false O(1) no duplicates ยท no ordering ยท no indices ยท just membership

A HashSet stores unique elements with O(1) average time for add, contains, and remove. Internally, it's backed by a HashMap where elements are keys and values are a dummy constant โ€” so all the hashing, bucket, and collision mechanics from HashMap apply here. The difference is the interface: a HashMap maps keys to values; a HashSet just answers "is this element in the set?" A HashSet does NOT maintain insertion order (use LinkedHashSet for that), does NOT sort elements (use TreeSet for that), and does NOT allow duplicates โ€” adding an existing element is a no-op. In interviews, HashSet is your first reach for: (1) "does this element appear again?" โ€” Contains Duplicate, (2) "collect all unique values" โ€” deduplication, (3) "find the intersection/difference of two collections" โ€” set operations, (4) "have I visited this state before?" โ€” BFS/DFS visited tracking.

Analogy: A guest list at a party. When someone arrives, you check the list: if they're on it, they're already in โ€” no duplicate entry. If they're not, you add them. At any point you can instantly check "is John here?" without scanning the entire list. The list doesn't care about order or how many times John tried to enter โ€” he's either in the set or he's not.
02
Section Two ยท Mental Model

How It Thinks

HashSet is backed by a HashMap internally โ€” element โ†’ dummy value (PRESENT)
โ–ถ
All performance characteristics of HashMap apply: O(1) average, O(n) worst case (all hash to same bucket), initial capacity 16, load factor 0.75, resize at 75% full
Two objects are "the same" if a.equals(b) AND a.hashCode() == b.hashCode()
โ–ถ
If you override equals() on a custom class, you MUST also override hashCode() โ€” otherwise the set can't find the element even if it's logically equal. This is the #1 HashSet bug in interviews.
Adding a duplicate returns false โ€” the set is unchanged
โ–ถ
You can use the boolean return value to detect duplicates: if (!set.add(x)) // x was already present โ€” a one-line duplicate check
HashSet has no ordering guarantee โ€” iteration order is unpredictable
โ–ถ
If you need insertion order โ†’ LinkedHashSet. If you need sorted order โ†’ TreeSet (O(log n) instead of O(1)). For interviews, ordering rarely matters โ€” use HashSet for speed.
HashSet allows one null element
โ–ถ
null hashes to bucket 0. One null is stored; a second add(null) is a no-op. In interviews, this almost never matters โ€” but it's a common trivia question.
HashSet vs boolean[] / int[] for membership on a known range
โ–ถ
If elements are in a small, known range [0, N), a boolean[] is faster (no hashing, no boxing). Use HashSet when the range is unknown, large, or elements aren't integers.
03
Section Three ยท Operations

Core Operations

Operation Method Average Worst Returns
Add add(e) O(1) O(n) true if added, false if duplicate
Contains contains(o) O(1) O(n) true / false
Remove remove(o) O(1) O(n) true if removed, false if absent
Size size() O(1) O(1) int count
Is empty isEmpty() O(1) O(1) true / false
Clear clear() O(n) O(n) void โ€” removes all
Iterate for (E e : set) O(n) O(n) each element once (unordered)
Common Patterns
// โ”€โ”€ Duplicate detection in one line โ”€โ”€ Set<Integer> seen = new HashSet<>(); for (int x : arr) if (!seen.add(x)) return true; // duplicate found! // โ”€โ”€ Deduplicate a list โ”€โ”€ List<Integer> unique = new ArrayList<>(new HashSet<>(list)); // โ”€โ”€ O(1) lookup in two-pass pattern โ”€โ”€ Set<Integer> set = new HashSet<>(Arrays.asList(arr)); for (int x : arr2) if (set.contains(x)) // O(1) instead of O(n) linear search
The add() return value is your secret weapon:
  • set.add(x) returns false if x was already present โ€” this is a single-call duplicate check.
  • No need for a separate contains() call followed by add().
04
Section Four ยท Set vs Map

HashSet vs HashMap โ€” When to Use Which

HashSet

Membership Only

  • "Is x in the collection?"
  • "Collect unique elements"
  • "Have I visited this node?"
  • No associated value needed
  • Set operations: โˆช โˆฉ โˆ–
HashMap

Key โ†’ Value Mapping

  • "How many times does x appear?"
  • "What is the index of x?"
  • "Map x to its result"
  • Need to associate data with each key
  • Frequency counting, caching
Decision rule:
  • If you just need YES/NO membership โ†’ HashSet.
  • If you need to store information WITH each element (count, index, associated value) โ†’ HashMap.
  • Internally they're the same structure โ€” a set is a map with dummy values.
05
Section Five ยท Set Operations

Union, Intersection, Difference

Java's Set interface supports bulk operations that correspond to mathematical set operations. These are surprisingly useful in interviews for problems like "find common elements between two arrays" (LC 349) or "find elements in A but not in B."

Math Operation Java Method Result
A โˆช B Union a.addAll(b) All elements in either set
A โˆฉ B Intersection a.retainAll(b) Only elements in both sets
A โˆ– B Difference a.removeAll(b) Elements in A but not in B
A โІ B Subset check b.containsAll(a) true if all of A is in B
Java โ€” Set operations
Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3, 4)); Set<Integer> b = new HashSet<>(Arrays.asList(3, 4, 5, 6)); // Intersection โ€” elements in both Set<Integer> inter = new HashSet<>(a); inter.retainAll(b); // [3, 4] // Union โ€” elements in either Set<Integer> union = new HashSet<>(a); union.addAll(b); // [1, 2, 3, 4, 5, 6] // Difference โ€” in A but not B Set<Integer> diff = new HashSet<>(a); diff.removeAll(b); // [1, 2]
Common Mistake: retainAll() and removeAll() MODIFY the set in place โ€” they don't return a new set. Always copy first if you need to preserve the original: new HashSet<>(a).retainAll(b).
06
Section Six ยท Variants

HashSet vs LinkedHashSet vs TreeSet

Implementation Order add/contains/remove Use when
HashSet None (unpredictable) O(1) average Default โ€” fastest, order doesn't matter
LinkedHashSet Insertion order O(1) average Need iteration in insertion order (LRU cache)
TreeSet Sorted (natural or Comparator) O(log n) Need sorted iteration, floor/ceiling queries
TreeSet extras โ€” sorted set operations
TreeSet<Integer> ts = new TreeSet<>(Arrays.asList(1, 3, 5, 7, 9)); ts.floor(4); // 3 โ€” largest โ‰ค 4 ts.ceiling(4); // 5 โ€” smallest โ‰ฅ 4 ts.lower(5); // 3 โ€” largest < 5 ts.higher(5); // 7 โ€” smallest > 5 ts.subSet(3, 8); // [3, 5, 7] โ€” range view
Interview default:
  • Use HashSet unless the problem requires sorted access (floor/ceiling = TreeSet) or insertion-order iteration (LinkedHashSet).
  • TreeSet is useful for "find nearest element" problems โ€” it's a balanced BST (Red-Black tree) underneath.
07
Section Seven ยท When to Use

When to Reach for a HashSet

Signal Pattern Example
"contains duplicate" / "first repeating" Add elements, check !set.add(x) Contains Duplicate (LC 217)
"unique elements" / "remove duplicates" Collect into set, convert back Remove duplicates from list
"intersection of two arrays" retainAll() or check membership Intersection of Two Arrays (LC 349)
"visited" tracking in BFS/DFS visited.add(state) Word Ladder, Number of Islands
"longest consecutive sequence" Set for O(1) lookup of neighbours Longest Consecutive Sequence (LC 128)
"happy number" / cycle detection Detect repeated state Happy Number (LC 202)
"jewels and stones" / membership check Build set from one input, scan other Jewels and Stones (LC 771)
08
Section Eight ยท Implementation

Build It Once

HashSet problems are typically 5โ€“15 lines. The code is trivial โ€” the skill is recognising when a set applies. Here are the core patterns:

Java โ€” HashSet patterns
// 1. Contains Duplicate โ€” O(n) boolean containsDuplicate(int[] nums) { Set<Integer> seen = new HashSet<>(); for (int x : nums) if (!seen.add(x)) return true; return false; } // 2. Intersection of Two Arrays โ€” O(n + m) int[] intersection(int[] a, int[] b) { Set<Integer> setA = new HashSet<>(); for (int x : a) setA.add(x); Set<Integer> result = new HashSet<>(); for (int x : b) if (setA.contains(x)) result.add(x); return result.stream().mapToInt(Integer::intValue).toArray(); } // 3. Longest Consecutive Sequence โ€” O(n) int longestConsecutive(int[] nums) { Set<Integer> set = new HashSet<>(); for (int x : nums) set.add(x); int longest = 0; for (int x : set) { if (!set.contains(x - 1)) { // start of a sequence int len = 1; while (set.contains(x + len)) len++; longest = Math.max(longest, len); } } return longest; }
09
Section Nine ยท Java Reference

Use It In Java

IN JAVA โ€” java.util.HashSet<E>
Import import java.util.HashSet; / import java.util.Set;
Declare Set<Integer> set = new HashSet<>(); โ€” always use interface type
From array Set<Integer> set = new HashSet<>(Arrays.asList(arr)); โ€” or Set.of() for immutable
Capacity new HashSet<>(capacity) โ€” pre-size to avoid rehashing if you know the count
Method Returns Note
add(e) boolean โ€” false if already present Use return value for duplicate detection
contains(o) boolean O(1) average membership test
remove(o) boolean โ€” false if not present O(1) average
size() int Number of unique elements
isEmpty() boolean size() == 0
addAll(collection) boolean Union โ€” add all elements from another collection
retainAll(collection) boolean Intersection โ€” keep only shared elements (mutates!)
removeAll(collection) boolean Difference โ€” remove all shared elements (mutates!)
containsAll(collection) boolean Subset check โ€” true if all elements of c are in set
โš  Gotcha: If you store custom objects in a HashSet, you MUST override both equals() and hashCode(). If two objects are equals() but have different hash codes, the set treats them as different โ€” you'll get "phantom duplicates." Rule: if a.equals(b) then a.hashCode() == b.hashCode() must hold.
โš  Gotcha: Don't modify an element's fields that affect hashCode() while it's in the set. The set stored it at the bucket for the OLD hash โ€” after mutation, contains() looks in the WRONG bucket and returns false even though the element is physically present. Remove โ†’ modify โ†’ re-add.
โš  Gotcha: int[] doesn't override equals()/hashCode() meaningfully โ€” arrays use identity (reference) comparison. To use arrays as set elements, convert to List<Integer> or String first: set.add(Arrays.toString(arr)).
10
Section Ten ยท Practice

Problems To Solve

HashSet problems are fast to solve โ€” 5โ€“10 lines, O(n) time. The skill is recognising that a set applies: whenever you need O(1) membership checks or duplicate detection, a HashSet is the answer. The hardest HashSet problem is Longest Consecutive Sequence (LC 128) โ€” it requires the insight that you only start counting from sequence beginnings (numbers without a predecessor in the set).

Difficulty Pattern Problem Key Insight
Easy membership Contains Duplicate โ€” LC 217 Add each element. If add() returns false, a duplicate exists. One-liner: return nums.length != new HashSet<>(List.of(nums)).size().
Easy set-ops Intersection of Two Arrays โ€” LC 349 Build set from one array, check membership against the other. Use retainAll() or manual loop.
Easy cycle Happy Number โ€” LC 202 Detect cycle in the digit-square sequence: if a number appears twice, it's not happy. Set tracks seen numbers.
Medium set Longest Consecutive Sequence โ€” LC 128 Set for O(1) lookup. For each number, if num-1 is NOT in the set, count consecutive from num upward. O(n) because each number is visited at most twice.
Medium visited Word Ladder โ€” LC 127 BFS with a HashSet for the word dictionary (O(1) lookup) and a visited set to avoid revisiting. Transform one letter at a time.
Medium constraint-check Valid Sudoku โ€” LC 36 Use sets to track seen digits per row, column, and 3ร—3 box. Any repeated digit violates the board immediately.
Interview Pattern:
  • Whenever you find yourself doing a linear scan to check "have I seen this before?" or "is this in the other list?" โ€” replace it with a HashSet.
  • The upgrade from O(n) per check to O(1) per check is the difference between O(nยฒ) and O(n) total.

โ†’ See the full HashSet practice set