HashSet | DSA Guide | LearningTree

01

Section One · Foundation

What is a HashSet?

A HashSet stores unique elements with O(1) average time for add, contains, and remove. Internally, it's backed by a HashMap where elements are keys and values are a dummy constant — so all the hashing, bucket, and collision mechanics from HashMap apply here. The difference is the interface: a HashMap maps keys to values; a HashSet just answers "is this element in the set?" A HashSet does NOT maintain insertion order (use LinkedHashSet for that), does NOT sort elements (use TreeSet for that), and does NOT allow duplicates — adding an existing element is a no-op. In interviews, HashSet is your first reach for: (1) "does this element appear again?" — Contains Duplicate, (2) "collect all unique values" — deduplication, (3) "find the intersection/difference of two collections" — set operations, (4) "have I visited this state before?" — BFS/DFS visited tracking.

Analogy: A guest list at a party. When someone arrives, you check the list: if they're on it, they're already in — no duplicate entry. If they're not, you add them. At any point you can instantly check "is John here?" without scanning the entire list. The list doesn't care about order or how many times John tried to enter — he's either in the set or he's not.

02

Section Two · Mental Model

How It Thinks

HashSet is backed by a HashMap internally — element → dummy value (PRESENT)

▶

All performance characteristics of HashMap apply: O(1) average, O(n) worst case (all hash to same bucket), initial capacity 16, load factor 0.75, resize at 75% full

Two objects are "the same" if a.equals(b) AND a.hashCode() == b.hashCode()

▶

If you override equals() on a custom class, you MUST also override hashCode() — otherwise the set can't find the element even if it's logically equal. This is the #1 HashSet bug in interviews.

Adding a duplicate returns false — the set is unchanged

▶

You can use the boolean return value to detect duplicates: if (!set.add(x)) // x was already present — a one-line duplicate check

HashSet has no ordering guarantee — iteration order is unpredictable

▶

If you need insertion order → LinkedHashSet. If you need sorted order → TreeSet (O(log n) instead of O(1)). For interviews, ordering rarely matters — use HashSet for speed.

HashSet allows one null element

▶

null hashes to bucket 0. One null is stored; a second add(null) is a no-op. In interviews, this almost never matters — but it's a common trivia question.

HashSet vs boolean[] / int[] for membership on a known range

▶

If elements are in a small, known range [0, N), a boolean[] is faster (no hashing, no boxing). Use HashSet when the range is unknown, large, or elements aren't integers.

03

Section Three · Operations

Core Operations

Operation	Method	Average	Worst	Returns
Add	`add(e)`	O(1)	O(n)	`true` if added, `false` if duplicate
Contains	`contains(o)`	O(1)	O(n)	`true` / `false`
Remove	`remove(o)`	O(1)	O(n)	`true` if removed, `false` if absent
Size	`size()`	O(1)	O(1)	int count
Is empty	`isEmpty()`	O(1)	O(1)	`true` / `false`
Clear	`clear()`	O(n)	O(n)	void — removes all
Iterate	`for (E e : set)`	O(n)	O(n)	each element once (unordered)

Common Patterns

 // ── Duplicate detection in one line ── Set<Integer> seen = new HashSet<>();
for (int x : arr)
if (!seen.add(x)) return true; // duplicate found! // ── Deduplicate a list ── List<Integer> unique = new ArrayList<>(new HashSet<>(list));
// ── O(1) lookup in two-pass pattern ── Set<Integer> set = new HashSet<>(Arrays.asList(arr));
for (int x : arr2)
if (set.contains(x)) // O(1) instead of O(n) linear search 

The add() return value is your secret weapon:

set.add(x) returns false if x was already present — this is a single-call duplicate check.
No need for a separate contains() call followed by add().

04

Section Four · Set vs Map

HashSet vs HashMap — When to Use Which

HashSet

Membership Only

"Is x in the collection?"
"Collect unique elements"
"Have I visited this node?"
No associated value needed
Set operations: ∪ ∩ ∖

HashMap

Key → Value Mapping

"How many times does x appear?"
"What is the index of x?"
"Map x to its result"
Need to associate data with each key
Frequency counting, caching

Decision rule:

If you just need YES/NO membership → HashSet.
If you need to store information WITH each element (count, index, associated value) → HashMap.
Internally they're the same structure — a set is a map with dummy values.

05

Section Five · Set Operations

Union, Intersection, Difference

Java's Set interface supports bulk operations that correspond to mathematical set operations. These are surprisingly useful in interviews for problems like "find common elements between two arrays" (LC 349) or "find elements in A but not in B."

Math	Operation	Java Method	Result
A ∪ B	Union	`a.addAll(b)`	All elements in either set
A ∩ B	Intersection	`a.retainAll(b)`	Only elements in both sets
A ∖ B	Difference	`a.removeAll(b)`	Elements in A but not in B
A ⊆ B	Subset check	`b.containsAll(a)`	true if all of A is in B

Java — Set operations

 Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3, 4));
Set<Integer> b = new HashSet<>(Arrays.asList(3, 4, 5, 6));
// Intersection — elements in both Set<Integer> inter = new HashSet<>(a);
inter.retainAll(b); // [3, 4] // Union — elements in either Set<Integer> union = new HashSet<>(a);
union.addAll(b); // [1, 2, 3, 4, 5, 6] // Difference — in A but not B Set<Integer> diff = new HashSet<>(a);
diff.removeAll(b); // [1, 2] 

Common Mistake: retainAll() and removeAll() MODIFY the set in place — they don't return a new set. Always copy first if you need to preserve the original: new HashSet<>(a).retainAll(b).

06

Section Six · Variants

HashSet vs LinkedHashSet vs TreeSet

Implementation	Order	add/contains/remove	Use when
HashSet	None (unpredictable)	O(1) average	Default — fastest, order doesn't matter
LinkedHashSet	Insertion order	O(1) average	Need iteration in insertion order (LRU cache)
TreeSet	Sorted (natural or Comparator)	O(log n)	Need sorted iteration, floor/ceiling queries

TreeSet extras — sorted set operations

 TreeSet<Integer> ts = new TreeSet<>(Arrays.asList(1, 3, 5, 7, 9));
ts.floor(4); // 3 — largest ≤ 4
ts.ceiling(4); // 5 — smallest ≥ 4
ts.lower(5); // 3 — largest < 5
ts.higher(5); // 7 — smallest > 5
ts.subSet(3, 8); // [3, 5, 7] — range view 

Interview default:

Use HashSet unless the problem requires sorted access (floor/ceiling = TreeSet) or insertion-order iteration (LinkedHashSet).
TreeSet is useful for "find nearest element" problems — it's a balanced BST (Red-Black tree) underneath.

07

Section Seven · When to Use

When to Reach for a HashSet

Signal	Pattern	Example
"contains duplicate" / "first repeating"	Add elements, check `!set.add(x)`	Contains Duplicate (LC 217)
"unique elements" / "remove duplicates"	Collect into set, convert back	Remove duplicates from list
"intersection of two arrays"	`retainAll()` or check membership	Intersection of Two Arrays (LC 349)
"visited" tracking in BFS/DFS	`visited.add(state)`	Word Ladder, Number of Islands
"longest consecutive sequence"	Set for O(1) lookup of neighbours	Longest Consecutive Sequence (LC 128)
"happy number" / cycle detection	Detect repeated state	Happy Number (LC 202)
"jewels and stones" / membership check	Build set from one input, scan other	Jewels and Stones (LC 771)

08

Section Eight · Implementation

Build It Once

HashSet problems are typically 5–15 lines. The code is trivial — the skill is recognising when a set applies. Here are the core patterns:

Java — HashSet patterns

 // 1. Contains Duplicate — O(n) boolean containsDuplicate(int[] nums) {
Set<Integer> seen = new HashSet<>();
for (int x : nums)
if (!seen.add(x)) return true;
return false;
}
// 2. Intersection of Two Arrays — O(n + m) int[] intersection(int[] a, int[] b) {
Set<Integer> setA = new HashSet<>();
for (int x : a) setA.add(x);
Set<Integer> result = new HashSet<>();
for (int x : b)
if (setA.contains(x)) result.add(x);
return result.stream().mapToInt(Integer::intValue).toArray();
}
// 3. Longest Consecutive Sequence — O(n) int longestConsecutive(int[] nums) {
Set<Integer> set = new HashSet<>();
for (int x : nums) set.add(x);
int longest = 0;
for (int x : set) {
if (!set.contains(x - 1)) { // start of a sequence int len = 1;
while (set.contains(x + len)) len++;
            longest = Math.max(longest, len);
        }
    }
return longest;
}

09

Section Nine · Java Reference

Use It In Java

IN JAVA — java.util.HashSet<E>

Import import java.util.HashSet; / import java.util.Set;

Declare Set<Integer> set = new HashSet<>(); — always use interface type

From array Set<Integer> set = new HashSet<>(Arrays.asList(arr)); — or Set.of() for immutable

Capacity new HashSet<>(capacity) — pre-size to avoid rehashing if you know the count

Method	Returns	Note
`add(e)`	`boolean` — false if already present	Use return value for duplicate detection
`contains(o)`	`boolean`	O(1) average membership test
`remove(o)`	`boolean` — false if not present	O(1) average
`size()`	`int`	Number of unique elements
`isEmpty()`	`boolean`	size() == 0
`addAll(collection)`	`boolean`	Union — add all elements from another collection
`retainAll(collection)`	`boolean`	Intersection — keep only shared elements (mutates!)
`removeAll(collection)`	`boolean`	Difference — remove all shared elements (mutates!)
`containsAll(collection)`	`boolean`	Subset check — true if all elements of c are in set

⚠ Gotcha: If you store custom objects in a HashSet, you MUST override both equals() and hashCode(). If two objects are equals() but have different hash codes, the set treats them as different — you'll get "phantom duplicates." Rule: if a.equals(b) then a.hashCode() == b.hashCode() must hold.

⚠ Gotcha: Don't modify an element's fields that affect hashCode() while it's in the set. The set stored it at the bucket for the OLD hash — after mutation, contains() looks in the WRONG bucket and returns false even though the element is physically present. Remove → modify → re-add.

⚠ Gotcha: int[] doesn't override equals()/hashCode() meaningfully — arrays use identity (reference) comparison. To use arrays as set elements, convert to List<Integer> or String first: set.add(Arrays.toString(arr)).

10

Section Ten · Practice

Problems To Solve

HashSet problems are fast to solve — 5–10 lines, O(n) time. The skill is recognising that a set applies: whenever you need O(1) membership checks or duplicate detection, a HashSet is the answer. The hardest HashSet problem is Longest Consecutive Sequence (LC 128) — it requires the insight that you only start counting from sequence beginnings (numbers without a predecessor in the set).

Difficulty	Pattern	Problem	Key Insight
Easy	membership	Contains Duplicate — LC 217	Add each element. If `add()` returns false, a duplicate exists. One-liner: `return nums.length != new HashSet<>(List.of(nums)).size()`.
Easy	set-ops	Intersection of Two Arrays — LC 349	Build set from one array, check membership against the other. Use `retainAll()` or manual loop.
Easy	cycle	Happy Number — LC 202	Detect cycle in the digit-square sequence: if a number appears twice, it's not happy. Set tracks seen numbers.
Medium	set	Longest Consecutive Sequence — LC 128	Set for O(1) lookup. For each number, if `num-1` is NOT in the set, count consecutive from num upward. O(n) because each number is visited at most twice.
Medium	visited	Word Ladder — LC 127	BFS with a HashSet for the word dictionary (O(1) lookup) and a visited set to avoid revisiting. Transform one letter at a time.
Medium	constraint-check	Valid Sudoku — LC 36	Use sets to track seen digits per row, column, and 3×3 box. Any repeated digit violates the board immediately.

Interview Pattern:

Whenever you find yourself doing a linear scan to check "have I seen this before?" or "is this in the other list?" — replace it with a HashSet.
The upgrade from O(n) per check to O(1) per check is the difference between O(n²) and O(n) total.

→ See the full HashSet practice set