LeetCode · #271

Encode and Decode Strings Solution

Design an algorithm to encode a list of strings to a single string, and decode it back. The encoded string is transmitted over a network and decoded back to the original list.

Section One · Problem

Problem Statement

🏷️ Difficulty: Medium

🔗 LeetCode: Problem #271

🏗️ Pattern: Design (length-prefix encoding)

Design an algorithm to encode a list of strings into a single string, and decode it back to the original list. The encoded string is transmitted over a network, so the algorithm must handle strings containing any possible character — including delimiters.

Example

Input: ["hello", "world"]
Encode: "5#hello5#world"
Decode: ["hello", "world"]
// Length prefix tells us exactly how many characters to read for each string

Constraints

 • 1 ≤ strs.length ≤ 200
 • 0 ≤ strs[i].length ≤ 200
 • strs[i] can contain any of the 256 valid ASCII characters
 ↑ Key constraint: strings can contain ANY character, including delimiters

Section Two · Approach 1

Brute Force — Delimiter-Based

Join strings with a delimiter (e.g., comma or pipe). On decode, split by the delimiter. Simple — but fundamentally broken if any string contains the delimiter character.

Why it fails

Problem: If we use "," as delimiter and a string contains ",", the decoder splits incorrectly. Escaping the delimiter adds complexity and edge cases. No delimiter is safe when strings can contain any ASCII character. We need a fundamentally different approach.
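To make the failure concrete, here is a minimal sketch of the comma-based round trip. The helper names encode_naive and decode_naive are illustrative only and are not part of the LeetCode interface.

```python
def encode_naive(strs: list[str]) -> str:
    # Join with a comma; nothing marks commas that belong to the data.
    return ",".join(strs)


def decode_naive(s: str) -> list[str]:
    # Split on every comma, including those that were part of a string.
    return s.split(",")


original = ["a,b", "c"]
round_trip = decode_naive(encode_naive(original))
print(round_trip)  # ['a', 'b', 'c']: three strings instead of two
```

The encoded form "a,b,c" carries no information about which comma was data, so no decoder could recover the original list from it.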

Java — Delimiter (flawed for arbitrary input)

import java.util.Arrays;
import java.util.List;

class Codec {
    public String encode(List<String> strs) {
        return String.join(",", strs); // broken if strings contain ","
    }

    public List<String> decode(String s) {
        return Arrays.asList(s.split(",")); // splits inside strings too!
    }
}

Python — Delimiter (flawed)

class Codec:
    def encode(self, strs: list[str]) -> str:
        return ",".join(strs)  # broken if strings contain ","

    def decode(self, s: str) -> list[str]:
        return s.split(",")  # incorrect splits

Metric	Value
Time	O(n) — concatenation and split
Correctness	❌ Fails when strings contain the delimiter
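For comparison, the escaping patch mentioned under "Why it fails" could look roughly like the sketch below. It assumes a backslash as the escape character and writes a comma after every string as a terminator. Both choices, and the names ESC, SEP, encode_escaped and decode_escaped, are hypothetical.

```python
ESC = "\\"  # assumed escape character
SEP = ","   # assumed terminator, written after every string


def encode_escaped(strs: list[str]) -> str:
    out = []
    for s in strs:
        # Escape the escape character first, then the terminator.
        out.append(s.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) + SEP)
    return "".join(out)


def decode_escaped(data: str) -> list[str]:
    result, current, i = [], [], 0
    while i < len(data):
        ch = data[i]
        if ch == ESC:
            # The character after an escape is always literal.
            current.append(data[i + 1])
            i += 2
        elif ch == SEP:
            # An unescaped terminator closes the current string.
            result.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    return result


samples = ["a,b", "\\", "", "x\\,y", ","]
restored = decode_escaped(encode_escaped(samples))
```

It round-trips correctly, but the decoder has to inspect every character and the order of the two replace calls matters, which is the extra complexity the length-prefix approach avoids.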

Section Three · Approach 2

Length-Prefix — O(n)

Prefix each string with its length followed by a separator character (e.g., #). The decoder reads the length number, then reads exactly that many characters — no ambiguity possible regardless of string contents.

💡 Mental model: Imagine sending packages through a tube. Each package has a label on the front saying "this package is 5 meters long." The receiver reads the label, measures out exactly 5 meters, cuts — that's one package. It doesn't matter what's inside (even if it contains other labels) because the length tells you exactly where each package ends.

Algorithm — Length-prefix encoding

Encode: For each string, append length + "#" + string. E.g., "hello" → "5#hello".
Decode: Read characters until you hit # — that's the length as a number. Read the next length characters — that's the string. Repeat.

🎯 When to recognize this pattern:

Any time you need to serialize/deserialize a collection of variable-length items into a single stream — think length-prefix encoding.
The signal is "items can contain any characters including separators." This is how HTTP chunked transfer encoding, Protocol Buffers, and many binary protocols work.
The pattern appears in LC 271, serialization interviews, and system design discussions.
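The same idea carries over to byte streams. The sketch below is a simplified illustration that assumes a fixed 4-byte big-endian length header. It is not the actual wire format of HTTP chunked encoding or Protocol Buffers, and the names frame and unframe are made up for this example.

```python
import struct


def frame(messages: list[bytes]) -> bytes:
    # Each record: 4-byte big-endian length (assumed header), then the payload.
    return b"".join(struct.pack(">I", len(m)) + m for m in messages)


def unframe(stream: bytes) -> list[bytes]:
    messages, i = [], 0
    while i < len(stream):
        (length,) = struct.unpack_from(">I", stream, i)  # read the header
        start = i + 4
        messages.append(stream[start : start + length])  # read the payload
        i = start + length
    return messages


payloads = [b"hello", b"", b"\x00\x00\x00\x05#"]
recovered = unframe(frame(payloads))
```

With a fixed-width header no separator character is needed at all, because the reader always knows the header is exactly 4 bytes long.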

Why # as separator?

The separator between the length and the string content could be any non-digit character; a digit would be indistinguishable from the length itself.
We choose # because it makes the code readable.
The separator is NOT a delimiter between strings — it's only between the length number and the string body.
Since we read the length first, we know exactly how many characters to consume, making the # inside string content harmless.
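A quick way to check the points above is to make the separator a parameter. This is a sketch only: the sep argument is an addition for illustration, and the digit check reflects the one restriction the scheme needs, since a digit separator would blend into the length.

```python
def encode(strs: list[str], sep: str = "#") -> str:
    # The separator must be one non-digit character.
    if len(sep) != 1 or sep.isdigit():
        raise ValueError("separator must be a single non-digit character")
    return "".join(f"{len(s)}{sep}{s}" for s in strs)


def decode(data: str, sep: str = "#") -> list[str]:
    result, i = [], 0
    while i < len(data):
        j = data.index(sep, i)  # first separator after the length digits
        length = int(data[i:j])
        result.append(data[j + 1 : j + 1 + length])
        i = j + 1 + length
    return result


data = ["a#b", "1:2", "", "x,y"]
results = {sep: decode(encode(data, sep), sep) for sep in "#:,"}
```

Every separator in the loop round-trips the same list, even though each one also appears inside one of the strings.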

Section Four · Trace

Visual Walkthrough

Encode and decode ["hello", "wo#rld", ""]:

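Length-Prefix Encoding (encode, then decode): ["hello", "wo#rld", ""] encodes to "5#hello6#wo#rld0#", and decoding that string returns the original three strings.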

Notice:

The string "wo#rld" contains the separator character #.
The decoder doesn't care — it reads the length (6) first, then blindly consumes the next 6 characters.
The # inside the content is just another character.
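The walkthrough can be reproduced with a decoder that prints its state at each step. The function name decode_with_trace is illustrative. The logic is the same length-prefix loop used in the implementation section.

```python
def decode_with_trace(data: str) -> list[str]:
    result, i = [], 0
    while i < len(data):
        j = data.index("#", i)               # separator that ends the length prefix
        length = int(data[i:j])
        body = data[j + 1 : j + 1 + length]  # exactly `length` characters
        print(f"i={i:2d}  length={length}  body={body!r}")
        result.append(body)
        i = j + 1 + length
    return result


encoded = "5#hello6#wo#rld0#"
decoded = decode_with_trace(encoded)
# i= 0  length=5  body='hello'
# i= 7  length=6  body='wo#rld'
# i=15  length=0  body=''
```

In the second step the search starts at i=7 and finds the # at position 8. The # at position 11 is never searched for, because it lies inside the 6 characters that are consumed as content.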

Section Five · Implementation

Code — Java & Python

Java — Length-Prefix

import java.util.*;

class Codec {
    public String encode(List<String> strs) {
        StringBuilder sb = new StringBuilder();
        for (String s : strs)
            sb.append(s.length()).append('#').append(s);
        return sb.toString();
    }

    public List<String> decode(String s) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int j = s.indexOf('#', i); // find the separator
            int len = Integer.parseInt(s.substring(i, j));
            result.add(s.substring(j + 1, j + 1 + len));
            i = j + 1 + len; // advance past this string
        }
        return result;
    }
}

Python — Length-Prefix

class Codec:
    def encode(self, strs: list[str]) -> str:
        return "".join(f"{len(s)}#{s}" for s in strs)

    def decode(self, s: str) -> list[str]:
        result = []
        i = 0
        while i < len(s):
            j = s.index("#", i)  # find separator
            length = int(s[i:j])
            result.append(s[j + 1 : j + 1 + length])
            i = j + 1 + length
        return result

Section Six · Analysis

Complexity Analysis

Approach	Time	Space	Trade-off
Delimiter (broken)	O(n)	O(n)	Fast but incorrect for arbitrary input
Escaping + Delimiter	O(n)	O(n)	Correct but complex escape logic
Length-Prefix ← optimal	O(n)	O(n)	Simple, correct for all inputs, no ambiguity

Where n = total characters:

Both encode and decode process each character exactly once.
The length prefix adds a small overhead per string (the digits of the length plus one #, so at most 4 characters under the 200-character limit), which is negligible compared to the string content.
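The overhead can be computed directly. In this sketch, encoded_length is a hypothetical helper that counts the digits of each length, one separator, and the content.

```python
def encoded_length(strs: list[str]) -> int:
    # Per string: digits of the length, one '#', then the content itself.
    return sum(len(str(len(s))) + 1 + len(s) for s in strs)


strs = ["hello", "wo#rld", "", "a" * 200]
overhead = encoded_length(strs) - sum(len(s) for s in strs)
print(overhead)  # 2 + 2 + 2 + 4 = 10 extra characters
```

Only the 200-character string needs a three-digit prefix; the others each cost one digit plus the separator.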

Section Seven · Edge Cases

Edge Cases & Pitfalls

Case	Input	Why It Matters
Empty string in list	`["", "a"]`	Encodes as `"0#1#a"`. Length 0 means read 0 chars — produces "".
All empty strings	`["", "", ""]`	Encodes as `"0#0#0#"`. Three reads of length 0.
String contains #	`["a#b"]`	Encodes as `"3#a#b"`. Decoder reads len=3, consumes "a#b" — correct.
String contains digits	`["123"]`	Encodes as `"3#123"`. Decoder finds first # at pos 1, reads len=3.
Single string	`["hello"]`	Encodes as `"5#hello"`. Simplest case — one length prefix, one string.
Long strings	`["a"×200]`	Length prefix is "200" — 3 digits. Works fine; number of digits scales with log(len).
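The rows of the table can be checked mechanically. The sketch below restates the length-prefix encode and decode as plain functions, a simplification of the Codec class, and compares each input with the encoding listed for it.

```python
def encode(strs: list[str]) -> str:
    return "".join(f"{len(s)}#{s}" for s in strs)


def decode(data: str) -> list[str]:
    result, i = [], 0
    while i < len(data):
        j = data.index("#", i)
        length = int(data[i:j])
        result.append(data[j + 1 : j + 1 + length])
        i = j + 1 + length
    return result


# (input list, encoding listed in the table)
cases = [
    (["", "a"], "0#1#a"),
    (["", "", ""], "0#0#0#"),
    (["a#b"], "3#a#b"),
    (["123"], "3#123"),
    (["hello"], "5#hello"),
    (["a" * 200], "200#" + "a" * 200),
]
for strs, expected in cases:
    assert encode(strs) == expected
    assert decode(expected) == strs
```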

⚠ Common Mistake: Calling indexOf('#') from position 0 instead of position i. A search from 0 always returns the first # in the encoded string, so the decoder misparses every string after the first, whether or not earlier content contains #. Always search for # starting from the current parsing position i.
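The mistake is easy to reproduce. In this sketch, decode_buggy is a deliberately broken variant whose only change is dropping the start position from the search.

```python
def decode_buggy(data: str) -> list[str]:
    result, i = [], 0
    while i < len(data):
        j = data.index("#")  # BUG: always searches from position 0
        length = int(data[i:j])
        result.append(data[j + 1 : j + 1 + length])
        i = j + 1 + length
    return result


try:
    decode_buggy("5#hello5#world")
    failed = False
except ValueError:
    # On the second pass j is still 1 while i is 7, so data[i:j] is empty.
    failed = True
```

A single-string input such as "3#a#b" still decodes correctly, which is why the bug can slip past a test suite that only covers one-element lists.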
