LeetCode ยท #271

Encode and Decode Strings Solution

Design an algorithm to encode a list of strings to a single string, and decode it back. The encoded string is transmitted over a network and decoded back to the original list.

01
Section One ยท Problem

Problem Statement

๐Ÿท๏ธ

Difficulty

Medium

๐Ÿ”—

LeetCode

Problem #271

๐Ÿ—๏ธ

Pattern

Design โ€” length-prefix encoding

Design an algorithm to encode a list of strings into a single string, and decode it back to the original list. The encoded string is transmitted over a network, so the algorithm must handle strings containing any possible character โ€” including delimiters.

Example
Input: ["hello", "world"] Encode: "5#hello5#world" Decode: ["hello", "world"] // Length prefix tells us exactly how many characters to read for each string
Constraints
โ€ข 1 โ‰ค strs.length โ‰ค 200 โ€ข 0 โ‰ค strs[i].length โ‰ค 200 โ€ข strs[i] can contain ANY 256 valid ASCII characters โ†‘ Key constraint โ€” strings can contain ANY character including delimiters
02
Section Two ยท Approach 1

Brute Force โ€” Delimiter-Based

Join strings with a delimiter (e.g., comma or pipe). On decode, split by the delimiter. Simple โ€” but fundamentally broken if any string contains the delimiter character.

Why it fails
Problem: If we use "," as delimiter and a string contains ",", the decoder splits incorrectly. Escaping the delimiter adds complexity and edge cases. No delimiter is safe when strings can contain any ASCII character. We need a fundamentally different approach.
Java โ€” Delimiter (flawed for arbitrary input)
class Codec { public String encode(List<String> strs) { return String.join(",", strs); // broken if strings contain "," } public List<String> decode(String s) { return Arrays.asList(s.split(",")); // splits inside strings too! } }
Python โ€” Delimiter (flawed)
class Codec: def encode(self, strs: list[str]) -> str: return ",".join(strs) # broken if strings contain "," def decode(self, s: str) -> list[str]: return s.split(",") # incorrect splits
MetricValue
TimeO(n) โ€” concatenation and split
CorrectnessโŒ Fails when strings contain the delimiter
03
Section Three ยท Approach 2

Length-Prefix โ€” O(n)

Prefix each string with its length followed by a separator character (e.g., #). The decoder reads the length number, then reads exactly that many characters โ€” no ambiguity possible regardless of string contents.

๐Ÿ’ก Mental model: Imagine sending packages through a tube. Each package has a label on the front saying "this package is 5 meters long." The receiver reads the label, measures out exactly 5 meters, cuts โ€” that's one package. It doesn't matter what's inside (even if it contains other labels) because the length tells you exactly where each package ends.
Algorithm โ€” Length-prefix encoding
  • Encode: For each string, append length + "#" + string. E.g., "hello" โ†’ "5#hello".
  • Decode: Read characters until you hit # โ€” that's the length as a number. Read the next length characters โ€” that's the string. Repeat.
๐ŸŽฏ When to recognize this pattern:
  • Any time you need to serialize/deserialize a collection of variable-length items into a single stream โ€” think length-prefix encoding.
  • The signal is "items can contain any characters including separators." This is how HTTP chunked transfer encoding, Protocol Buffers, and many binary protocols work.
  • The pattern appears in LC 271, serialization interviews, and system design discussions.
Why # as separator?:
  • The separator between the length and the string content could be any character.
  • We choose # because it makes the code readable.
  • The separator is NOT a delimiter between strings โ€” it's only between the length number and the string body.
  • Since we read the length first, we know exactly how many characters to consume, making the # inside string content harmless.
04
Section Four ยท Trace

Visual Walkthrough

Encode and decode ["hello", "wo#rld", ""]:

Length-Prefix Encoding โ€” encode then decode
INPUT LIST ["hello", "wo#rld", ""] ENCODE โ€” prepend length# "hello" โ†’ len=5 โ†’ "5#hello" "wo#rld" โ†’ len=6 โ†’ "6#wo#rld" "" โ†’ len=0 โ†’ "0#" ENCODED STRING: 5#hello6#wo#rld0# DECODE โ€” read length, then read that many chars pos=0: read "5" โ†’ # at pos 1 โ†’ read 5 chars "hello" โ†’ pos=7 pos=7: read "6" โ†’ # at pos 8 โ†’ read 6 chars "wo#rld" โ†’ pos=15 pos=15: read "0" โ†’ # at pos 16 โ†’ read 0 chars "" โ†’ pos=17 โ†’ DONE โ†‘ The # inside "wo#rld" is safely consumed as content! Decoded: ["hello", "wo#rld", ""] โœ“
Notice:
  • The string "wo#rld" contains the separator character #.
  • The decoder doesn't care โ€” it reads the length (6) first, then blindly consumes the next 6 characters.
  • The # inside the content is just another character.
05
Section Five ยท Implementation

Code โ€” Java & Python

Java โ€” Length-Prefix
import java.util.*; class Codec { public String encode(List<String> strs) { StringBuilder sb = new StringBuilder(); for (String s : strs) sb.append(s.length()).append('#').append(s); return sb.toString(); } public List<String> decode(String s) { List<String> result = new ArrayList<>(); int i = 0; while (i < s.length()) { int j = s.indexOf('#', i); // find the separator int len = Integer.parseInt(s.substring(i, j)); result.add(s.substring(j + 1, j + 1 + len)); i = j + 1 + len; // advance past this string } return result; } }
Python โ€” Length-Prefix
class Codec: def encode(self, strs: list[str]) -> str: return "".join(f"{len(s)}#{s}" for s in strs) def decode(self, s: str) -> list[str]: result = [] i = 0 while i < len(s): j = s.index("#", i) # find separator length = int(s[i:j]) result.append(s[j + 1 : j + 1 + length]) i = j + 1 + length return result
06
Section Six ยท Analysis

Complexity Analysis

ApproachTimeSpaceTrade-off
Delimiter (broken)O(n)O(n)Fast but incorrect for arbitrary input
Escaping + DelimiterO(n)O(n)Correct but complex escape logic
Length-Prefix โ† optimal O(n) O(n) Simple, correct for all inputs, no ambiguity
Where n = total characters:
  • Both encode and decode process each character exactly once.
  • The length prefix adds a constant overhead per string (a few digits + one #), which is negligible compared to the string content.
07
Section Seven ยท Edge Cases

Edge Cases & Pitfalls

CaseInputWhy It Matters
Empty string in list["", "a"]Encodes as "0#1#a". Length 0 means read 0 chars โ€” produces "".
All empty strings["", "", ""]Encodes as "0#0#0#". Three reads of length 0.
String contains #["a#b"]Encodes as "3#a#b". Decoder reads len=3, consumes "a#b" โ€” correct.
String contains digits["123"]Encodes as "3#123". Decoder finds first # at pos 1, reads len=3.
Single string["hello"]Encodes as "5#hello". Simplest case โ€” one length prefix, one string.
Long strings["a"ร—200]Length prefix is "200" โ€” 3 digits. Works fine; number of digits scales with log(len).
โš  Common Mistake: Using indexOf('#') starting from position 0 instead of position i. If a previous string's content contains #, seeking from the wrong position finds the wrong separator. Always search for # starting from the current parsing position i.

โ† Back to Arrays & Hashing problems