HashSet provides O(1) average membership testing โ the go-to structure for deduplication, contains-checks, and set operations like union, intersection, and difference.
Hashing
HashSet Membership & Deduplication
O(1) average add, contains, and remove. A HashSet answers one question instantly: "have I seen this before?" โ the backbone of deduplication, cycle detection, duplicate checking, and set operations.
A HashSet stores unique elements with O(1) average time for add, contains, and remove. Internally, it's backed by a HashMap where elements are keys and values are a dummy constant โ so all the hashing, bucket, and collision mechanics from HashMap apply here. The difference is the interface: a HashMap maps keys to values; a HashSet just answers "is this element in the set?" A HashSet does NOT maintain insertion order (use LinkedHashSet for that), does NOT sort elements (use TreeSet for that), and does NOT allow duplicates โ adding an existing element is a no-op. In interviews, HashSet is your first reach for: (1) "does this element appear again?" โ Contains Duplicate, (2) "collect all unique values" โ deduplication, (3) "find the intersection/difference of two collections" โ set operations, (4) "have I visited this state before?" โ BFS/DFS visited tracking.
Analogy: A guest list at a party. When someone arrives, you check the list: if they're on it, they're already in โ no duplicate entry. If they're not, you add them. At any point you can instantly check "is John here?" without scanning the entire list. The list doesn't care about order or how many times John tried to enter โ he's either in the set or he's not.
02
Section Two ยท Mental Model
How It Thinks
HashSet is backed by a HashMap internally โ element โ dummy value (PRESENT)
โถ
All performance characteristics of HashMap apply: O(1) average, O(n) worst case (all hash to same bucket), initial capacity 16, load factor 0.75, resize at 75% full
Two objects are "the same" if a.equals(b) AND a.hashCode() == b.hashCode()
โถ
If you override equals() on a custom class, you MUST also override hashCode() โ otherwise the set can't find the element even if it's logically equal. This is the #1 HashSet bug in interviews.
Adding a duplicate returns false โ the set is unchanged
โถ
You can use the boolean return value to detect duplicates: if (!set.add(x)) // x was already present โ a one-line duplicate check
HashSet has no ordering guarantee โ iteration order is unpredictable
โถ
If you need insertion order โ LinkedHashSet. If you need sorted order โ TreeSet (O(log n) instead of O(1)). For interviews, ordering rarely matters โ use HashSet for speed.
HashSet allows one null element
โถ
null hashes to bucket 0. One null is stored; a second add(null) is a no-op. In interviews, this almost never matters โ but it's a common trivia question.
HashSet vs boolean[] / int[] for membership on a known range
โถ
If elements are in a small, known range [0, N), a boolean[] is faster (no hashing, no boxing). Use HashSet when the range is unknown, large, or elements aren't integers.
03
Section Three ยท Operations
Core Operations
Operation
Method
Average
Worst
Returns
Add
add(e)
O(1)
O(n)
true if added, false if duplicate
Contains
contains(o)
O(1)
O(n)
true / false
Remove
remove(o)
O(1)
O(n)
true if removed, false if absent
Size
size()
O(1)
O(1)
int count
Is empty
isEmpty()
O(1)
O(1)
true / false
Clear
clear()
O(n)
O(n)
void โ removes all
Iterate
for (E e : set)
O(n)
O(n)
each element once (unordered)
Common Patterns
// โโ Duplicate detection in one line โโSet<Integer> seen = newHashSet<>();
for (int x : arr)
if (!seen.add(x)) return true; // duplicate found!// โโ Deduplicate a list โโList<Integer> unique = newArrayList<>(newHashSet<>(list));
// โโ O(1) lookup in two-pass pattern โโSet<Integer> set = newHashSet<>(Arrays.asList(arr));
for (int x : arr2)
if (set.contains(x)) // O(1) instead of O(n) linear search
The add() return value is your secret weapon:
set.add(x) returns false if x was already present โ this is a single-call duplicate check.
No need for a separate contains() call followed by add().
04
Section Four ยท Set vs Map
HashSet vs HashMap โ When to Use Which
HashSet
Membership Only
"Is x in the collection?"
"Collect unique elements"
"Have I visited this node?"
No associated value needed
Set operations: โช โฉ โ
HashMap
Key โ Value Mapping
"How many times does x appear?"
"What is the index of x?"
"Map x to its result"
Need to associate data with each key
Frequency counting, caching
Decision rule:
If you just need YES/NO membership โ HashSet.
If you need to store information WITH each element (count, index, associated value) โ HashMap.
Internally they're the same structure โ a set is a map with dummy values.
05
Section Five ยท Set Operations
Union, Intersection, Difference
Java's Set interface supports bulk operations that correspond to mathematical set operations. These are surprisingly useful in interviews for problems like "find common elements between two arrays" (LC 349) or "find elements in A but not in B."
Math
Operation
Java Method
Result
A โช B
Union
a.addAll(b)
All elements in either set
A โฉ B
Intersection
a.retainAll(b)
Only elements in both sets
A โ B
Difference
a.removeAll(b)
Elements in A but not in B
A โ B
Subset check
b.containsAll(a)
true if all of A is in B
Java โ Set operations
Set<Integer> a = newHashSet<>(Arrays.asList(1, 2, 3, 4));
Set<Integer> b = newHashSet<>(Arrays.asList(3, 4, 5, 6));
// Intersection โ elements in bothSet<Integer> inter = newHashSet<>(a);
inter.retainAll(b); // [3, 4]// Union โ elements in eitherSet<Integer> union = newHashSet<>(a);
union.addAll(b); // [1, 2, 3, 4, 5, 6]// Difference โ in A but not BSet<Integer> diff = newHashSet<>(a);
diff.removeAll(b); // [1, 2]
Common Mistake:retainAll() and removeAll() MODIFY the set in place โ they don't return a new set. Always copy first if you need to preserve the original: new HashSet<>(a).retainAll(b).
Use HashSet unless the problem requires sorted access (floor/ceiling = TreeSet) or insertion-order iteration (LinkedHashSet).
TreeSet is useful for "find nearest element" problems โ it's a balanced BST (Red-Black tree) underneath.
07
Section Seven ยท When to Use
When to Reach for a HashSet
Signal
Pattern
Example
"contains duplicate" / "first repeating"
Add elements, check !set.add(x)
Contains Duplicate (LC 217)
"unique elements" / "remove duplicates"
Collect into set, convert back
Remove duplicates from list
"intersection of two arrays"
retainAll() or check membership
Intersection of Two Arrays (LC 349)
"visited" tracking in BFS/DFS
visited.add(state)
Word Ladder, Number of Islands
"longest consecutive sequence"
Set for O(1) lookup of neighbours
Longest Consecutive Sequence (LC 128)
"happy number" / cycle detection
Detect repeated state
Happy Number (LC 202)
"jewels and stones" / membership check
Build set from one input, scan other
Jewels and Stones (LC 771)
08
Section Eight ยท Implementation
Build It Once
HashSet problems are typically 5โ15 lines. The code is trivial โ the skill is recognising when a set applies. Here are the core patterns:
Java โ HashSet patterns
// 1. Contains Duplicate โ O(n)booleancontainsDuplicate(int[] nums) {
Set<Integer> seen = newHashSet<>();
for (int x : nums)
if (!seen.add(x)) return true;
return false;
}
// 2. Intersection of Two Arrays โ O(n + m)int[] intersection(int[] a, int[] b) {
Set<Integer> setA = newHashSet<>();
for (int x : a) setA.add(x);
Set<Integer> result = newHashSet<>();
for (int x : b)
if (setA.contains(x)) result.add(x);
return result.stream().mapToInt(Integer::intValue).toArray();
}
// 3. Longest Consecutive Sequence โ O(n)intlongestConsecutive(int[] nums) {
Set<Integer> set = newHashSet<>();
for (int x : nums) set.add(x);
int longest = 0;
for (int x : set) {
if (!set.contains(x - 1)) { // start of a sequenceint len = 1;
while (set.contains(x + len)) len++;
longest = Math.max(longest, len);
}
}
return longest;
}
HashSet โ Full Implementation
Java โ HashSet patterns
import java.util.*;
public classHashSetPatterns {
// โโโ 1. Contains Duplicate โโโโโโโโโโโโโโโ O(n)staticbooleancontainsDuplicate(int[] nums) {
Set<Integer> seen = newHashSet<>();
for (int x : nums)
if (!seen.add(x)) return true;
return false;
}
// โโโ 2. Intersection of Two Arrays โโโโโโ O(n + m)staticint[] intersection(int[] a, int[] b) {
Set<Integer> setA = newHashSet<>();
for (int x : a) setA.add(x);
Set<Integer> res = newHashSet<>();
for (int x : b)
if (setA.contains(x)) res.add(x);
return res.stream().mapToInt(Integer::intValue).toArray();
}
// โโโ 3. Longest Consecutive Sequence โโโโ O(n)staticintlongestConsecutive(int[] nums) {
Set<Integer> set = newHashSet<>();
for (int x : nums) set.add(x);
int longest = 0;
for (int x : set) {
if (!set.contains(x - 1)) {
int len = 1;
while (set.contains(x + len)) len++;
longest = Math.max(longest, len);
}
}
return longest;
}
// โโโ 4. Happy Number โโโโโโโโโโโโโโโโโโโ O(log n) per stepstaticbooleanisHappy(int n) {
Set<Integer> seen = newHashSet<>();
while (n != 1 && seen.add(n)) {
int sum = 0;
while (n > 0) {
int d = n % 10;
sum += d * d;
n /= 10;
}
n = sum;
}
return n == 1;
}
// โโโ 5. Set operations โโโโโโโโโโโโโโโโโโโstatic voidsetOps() {
Set<Integer> a = newHashSet<>(Arrays.asList(1,2,3,4));
Set<Integer> b = newHashSet<>(Arrays.asList(3,4,5,6));
Set<Integer> union = newHashSet<>(a);
union.addAll(b); // [1,2,3,4,5,6]Set<Integer> inter = newHashSet<>(a);
inter.retainAll(b); // [3, 4]Set<Integer> diff = newHashSet<>(a);
diff.removeAll(b); // [1, 2]
}
// โโโ 6. Two Sum using HashSet (existence only) โโstaticbooleantwoSumExists(int[] nums, int target) {
Set<Integer> seen = newHashSet<>();
for (int x : nums) {
if (seen.contains(target - x)) return true;
seen.add(x);
}
return false;
}
// โโโ Main โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโpublic static voidmain(String[] args) {
System.out.println(containsDuplicate(newint[]{1,2,3,1})); // trueSystem.out.println(Arrays.toString(
intersection(newint[]{1,2,2,1}, newint[]{2,2}))); // [2]System.out.println(longestConsecutive(
newint[]{100,4,200,1,3,2})); // 4System.out.println(isHappy(19)); // true
}
}
Python โ set patterns
# Python's set is a built-in hash set# โโโ 1. Contains Duplicate โโโโโโโโโโโโโโโ O(n)defcontains_duplicate(nums):
return len(nums) != len(set(nums))
# โโโ 2. Intersection โโโโโโโโโโโโโโโโโโโโโ O(n + m)defintersection(a, b):
return list(set(a) & set(b)) # or set(a).intersection(set(b))# โโโ 3. Longest Consecutive โโโโโโโโโโโโโโ O(n)deflongest_consecutive(nums):
s = set(nums)
longest = 0
for x in s:
if x - 1 not in s: # start of sequence
length = 1
while x + length in s:
length += 1
longest = max(longest, length)
return longest
# โโโ 4. Happy Number โโโโโโโโโโโโโโโโโโโโโdefis_happy(n):
seen = set()
while n != 1 and n not in seen:
seen.add(n)
n = sum(int(d) ** 2 for d in str(n))
return n == 1
# โโโ 5. Set operations (Python built-in) โ
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
a | b # union: 6
a & b # intersection: 4
a - b # difference: 2
a ^ b # symmetric difference: 6
a <= b # subset check: False# โโโ 6. Two Sum existence โโโโโโโโโโโโโโโโ O(n)deftwo_sum_exists(nums, target):
seen = set()
for x in nums:
if target - x in seen:
return True
seen.add(x)
return False# Demo
print(contains_duplicate([1,2,3,1])) # True
print(intersection([1,2,2,1], [2,2])) # [2]
print(longest_consecutive([100,4,200,1,3,2])) # 4
print(is_happy(19)) # True
DeclareSet<Integer> set = new HashSet<>(); โ always use interface type
From arraySet<Integer> set = new HashSet<>(Arrays.asList(arr)); โ or Set.of() for immutable
Capacitynew HashSet<>(capacity) โ pre-size to avoid rehashing if you know the count
Method
Returns
Note
add(e)
boolean โ false if already present
Use return value for duplicate detection
contains(o)
boolean
O(1) average membership test
remove(o)
boolean โ false if not present
O(1) average
size()
int
Number of unique elements
isEmpty()
boolean
size() == 0
addAll(collection)
boolean
Union โ add all elements from another collection
retainAll(collection)
boolean
Intersection โ keep only shared elements (mutates!)
removeAll(collection)
boolean
Difference โ remove all shared elements (mutates!)
containsAll(collection)
boolean
Subset check โ true if all elements of c are in set
โ Gotcha: If you store custom objects in a HashSet, you MUST override both equals() and hashCode(). If two objects are equals() but have different hash codes, the set treats them as different โ you'll get "phantom duplicates." Rule: if a.equals(b) then a.hashCode() == b.hashCode() must hold.
โ Gotcha: Don't modify an element's fields that affect hashCode() while it's in the set. The set stored it at the bucket for the OLD hash โ after mutation, contains() looks in the WRONG bucket and returns false even though the element is physically present. Remove โ modify โ re-add.
โ Gotcha:int[] doesn't override equals()/hashCode() meaningfully โ arrays use identity (reference) comparison. To use arrays as set elements, convert to List<Integer> or String first: set.add(Arrays.toString(arr)).
10
Section Ten ยท Practice
Problems To Solve
HashSet problems are fast to solve โ 5โ10 lines, O(n) time. The skill is recognising that a set applies: whenever you need O(1) membership checks or duplicate detection, a HashSet is the answer. The hardest HashSet problem is Longest Consecutive Sequence (LC 128) โ it requires the insight that you only start counting from sequence beginnings (numbers without a predecessor in the set).
Set for O(1) lookup. For each number, if num-1 is NOT in the set, count consecutive from num upward. O(n) because each number is visited at most twice.