
Message Queues & Streaming

Decoupling systems through asynchronous communication.

01
Chapter One

What Message Queues and Streams Are

The Cost of Synchronous Calls

Picture an order checkout. The user clicks “Place Order.” The web server has to charge the card, deduct inventory, send a confirmation email, notify the warehouse, update the analytics pipeline, and trigger fraud screening. If all of those happen synchronously inside the request, the user waits for the slowest one and any single failure aborts the whole order. Message queues exist to break that chain. The web server records the order, drops a message on a queue, and returns success in milliseconds. Every other system processes the message at its own pace, retries on its own failures, and never blocks the user.

Two adjacent concepts share this space and are routinely confused. A queue is a one-shot pipe: a producer writes a message, a consumer reads it once, the message is gone. A stream is a durable, replayable log: every consumer can read the same messages independently, at their own pace, and rewind if they need to. Both decouple producers from consumers. They optimise for very different things.
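The distinction is small enough to sketch in code. A minimal in-memory model, illustrative only — real brokers add durability, acknowledgements, and partitioning on top of these semantics:

```python
from collections import deque

class Queue:
    """Queue semantics: a message is consumed once, then it is gone."""
    def __init__(self):
        self._items = deque()

    def enqueue(self, msg):
        self._items.append(msg)

    def dequeue(self):
        # Destructive read: the message is removed for everyone.
        return self._items.popleft() if self._items else None

class Stream:
    """Stream semantics: an append-only log; each consumer keeps its own offset."""
    def __init__(self):
        self._log = []        # nothing is ever deleted on read
        self._offsets = {}    # consumer name -> next position to read

    def append(self, msg):
        self._log.append(msg)

    def read(self, consumer):
        pos = self._offsets.get(consumer, 0)
        if pos >= len(self._log):
            return None
        self._offsets[consumer] = pos + 1
        return self._log[pos]

    def rewind(self, consumer):
        # Replay: just reset the offset; the data is still there.
        self._offsets[consumer] = 0

q = Queue()
q.enqueue("order-1")
q.dequeue()           # -> "order-1"
q.dequeue()           # -> None: once read, the message is gone

s = Stream()
s.append("order-1")
s.read("analytics")   # -> "order-1"
s.read("audit")       # -> "order-1": same message, independent consumer
```

The asymmetry in `read` is the whole difference: the queue mutates shared state, the stream mutates only the caller's cursor.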

📬

Queue — Once and Done

Model: a producer enqueues a message; one consumer dequeues it; the message is removed.

Strength: work distribution, task processing, fan-out via routing.

Examples: RabbitMQ, AWS SQS, Redis lists, ActiveMQ.

Use: “send this email,” “process this image,” “refund this charge.”

📜

Stream — The Replayable Log

Model: messages appended to a partitioned, durable log; multiple consumer groups read independently.

Strength: event-driven architectures, replay, multi-consumer fan-out, audit history.

Examples: Apache Kafka, AWS Kinesis, Apache Pulsar, Redpanda.

Use: “every order placed,” “every page view,” “every state change.”

Synchronous vs Asynchronous — The Decoupling Pattern

[Diagram. Synchronous (tight coupling): Web calls Charge, Inventory, Email, Fraud, and Warehouse in turn — latency = sum of all calls, any failure = whole order fails, N² service-to-service dependencies, user waits 800 ms+. Asynchronous (decoupled): Web publishes order.placed to a queue/stream; Charge, Email, Inventory, and Fraud consume independently — the producer doesn't know its consumers, and the user waits ~50 ms.]

The analogy that works: a queue is a paper ticket dispenser at a deli counter — one ticket, one customer, one server. A stream is a security camera tape — the events are recorded, anyone can rewind and watch, and the recording outlives any single viewer. Both let the store keep running even if the cashier steps away for a minute.

The single biggest architectural lever in distributed systems: replace synchronous calls with asynchronous messages. Latency drops, failure becomes recoverable, services stop knowing about each other. Half of the distributed-systems patterns in this book exist because someone removed a synchronous call.
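A toy version of that handoff, using Python's stdlib `queue` as a stand-in for a real broker (`place_order` and `email_worker` are invented names for illustration):

```python
import queue
import threading

orders = queue.Queue()

def place_order(order_id):
    """Web handler: record the order, enqueue the event, return immediately."""
    orders.put({"event": "order.placed", "order_id": order_id})
    return {"status": "accepted", "order_id": order_id}

processed = []

def email_worker():
    # Consumer: drains at its own pace; a slow email never blocks checkout.
    while True:
        msg = orders.get()
        if msg is None:          # shutdown sentinel
            break
        processed.append(("email_sent", msg["order_id"]))
        orders.task_done()

t = threading.Thread(target=email_worker)
t.start()
resp = place_order(42)   # handler returns without waiting on the worker
orders.join()            # (demo only) wait for the worker to drain the queue
orders.put(None)
t.join()
```

In production the queue is a broker process and the worker is a separate service, but the shape is the same: the handler's latency no longer includes the consumer's work.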

📋 Chapter 1 — Summary
  • Message queues and streams decouple producers from consumers — cutting latency and isolating failures.
  • Queue: one consumer, message removed after read — for tasks and work distribution.
  • Stream: durable replayable log, multiple independent consumers — for events and audit trails.
  • Async decoupling is the single biggest architectural lever in distributed systems.
02
Chapter Two

How They Work Internally

Kafka — The Distributed Append-Only Log

Kafka is not a queue. It is a distributed, partitioned, append-only log with consumer-managed offsets. That single sentence explains nearly everything about its behaviour. A topic is a logical channel; each topic is split into partitions; each partition is an ordered log file replicated across brokers. Producers append to a partition; consumers track their own position (the offset) within each partition; messages are not deleted when consumed — they age out by retention policy (e.g. 7 days, or once a topic reaches 100 GB). This is why Kafka can support replay and multiple independent consumers: nothing is destroyed when read.
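A minimal in-memory sketch of that model — no replication, retention, or networking, just the partition/offset mechanics (the `Topic` class is invented for illustration):

```python
class Topic:
    """Sketch of a Kafka-style topic: N ordered partitions, non-destructive reads."""
    def __init__(self, partitions=3):
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Same key -> same partition -> ordered relative to itself.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def fetch(self, partition, offset):
        log = self.partitions[partition]
        return log[offset] if offset < len(log) else None

topic = Topic()
p, off = topic.produce("user-7", "order.placed")
topic.fetch(p, off)   # -> "order.placed"
topic.fetch(p, off)   # reading does not delete: same answer again
```

Note what is absent: there is no "consumed" flag anywhere in the topic. Position is the consumer's problem, which is exactly why replay and multiple readers cost nothing.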

Kafka Topic — Partitions, Offsets, Consumer Groups

[Diagram. Topic order-events with 3 partitions: P0 holds offsets 0–7 (appends go at the log end), P1 holds 0–4, P2 holds 0–3. Consumer group A (analytics) and group B (notifications) each track their own current offset per partition — same data, different positions. Replay = reset the offset to 0.]

Two more concepts complete the model. Consumer groups: a group is a set of consumers that share the work of reading a topic; each partition is read by exactly one consumer in the group. Add a consumer, partitions rebalance. Multiple groups read the same topic independently — that's how analytics, notifications, and audit can each consume the same events without coordination. Log compaction: instead of dropping old messages by time, retain only the latest message per key — perfect for materialising state (“current value of every account”) into a topic.
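Log compaction is easy to express as a sketch. The broker's real implementation is incremental and per-segment, but the observable result is "latest record per key":

```python
def compact(log):
    """Sketch of log compaction: retain only the latest value for each key."""
    latest = {}
    for key, value in log:   # later records overwrite earlier ones
        latest[key] = value
    return list(latest.items())

log = [("acct-1", 100), ("acct-2", 50), ("acct-1", 75)]
compact(log)   # -> [("acct-1", 75), ("acct-2", 50)]
```

A new consumer reading a compacted topic from offset 0 therefore rebuilds current state without replaying the full history.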

RabbitMQ — Routing-First Messaging

RabbitMQ implements AMQP and takes a fundamentally different shape. Producers don't write to queues directly — they publish to exchanges. The exchange uses routing keys and bindings to decide which queues receive each message. Direct exchanges (exact match), topic exchanges (wildcard match), fanout (every queue gets a copy), and headers exchanges give you flexible routing without producers knowing about consumers. Once a message lands in a queue, it's removed when acknowledged — classic queue semantics, no replay.
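The topic-exchange wildcard rules can be sketched with a regex translation. This is an approximation — it treats a trailing `#` as one-or-more words, whereas AMQP's `#` also matches zero words — and the queue names and bindings below are invented:

```python
import re

def binding_matches(pattern, routing_key):
    """AMQP topic-exchange matching (sketch): '*' = one word, '#' = word(s)."""
    parts = []
    for word in pattern.split("."):
        if word == "*":
            parts.append(r"[^.]+")   # exactly one dot-delimited word
        elif word == "#":
            parts.append(r".*")      # any remaining words
        else:
            parts.append(re.escape(word))
    return re.fullmatch(r"\.".join(parts), routing_key) is not None

bindings = {
    "email-queue":  "order.*",
    "audit-queue":  "order.#",
    "refund-queue": "order.refunded",
}

def route(routing_key):
    return sorted(q for q, pat in bindings.items()
                  if binding_matches(pat, routing_key))

route("order.placed")    # -> ["audit-queue", "email-queue"]
route("order.refunded")  # -> ["audit-queue", "email-queue", "refund-queue"]
```

The producer only ever supplies a routing key; which queues exist, and which bindings they declared, is entirely the consumers' business.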

The mental model split: Kafka thinks in events (here's what happened, anyone can read it forever); RabbitMQ thinks in tasks (here's work to do, route it to the right worker). Both are correct for their respective problems. Picking the wrong one creates ongoing pain.

📚

Kafka Mental Model

Primitive: partitioned append-only log.

State: consumer-managed offsets.

Retention: time-based (7 days) or size-based.

Routing: partition key (hash decides partition).

Strength: replay, multi-consumer, event sourcing, audit trails.

🔁

RabbitMQ Mental Model

Primitive: queues fed by exchanges.

State: broker tracks ack/nack.

Retention: until consumed and acked.

Routing: exchange + routing key + bindings.

Strength: task distribution, complex routing, RPC patterns, priority queues.

The single biggest source of confusion: people use Kafka where they want a queue, then complain that “Kafka loses messages” (it doesn't — their consumer commits offsets without finishing work). And people use RabbitMQ where they want a stream, then complain there's no replay. The tool is rarely the problem. The mental model usually is.

📋 Chapter 2 — Summary
  • Kafka: distributed append-only log; topics → partitions; consumers track offsets; replay is free.
  • Consumer groups divide partitions among consumers; multiple groups read the same topic independently.
  • Log compaction retains the latest message per key — materialised state in a topic.
  • RabbitMQ: exchanges + bindings + queues; rich routing; classic ack/remove semantics, no replay.
03
Chapter Three

When to Use — and When Not To

Async Is Not Free — Choose Carefully

Asynchronous messaging is not a free win. It introduces eventual consistency, additional infrastructure, harder debugging, and a whole class of new failure modes (poison pills, redelivery, ordering across partitions). The real question is not “should we use a queue?” but “is this work tolerant of being delayed by seconds, minutes, or occasionally hours, in exchange for resilience and decoupling?”

USE Async Messaging When…

The work can wait. Email confirmations, indexing, analytics ingestion, fan-out notifications.

Producers and consumers scale independently. 100 ingest API replicas feeding 5 batch processors.

You need durability under load spikes. Buffer the spike in the queue; consumers drain it at their own pace.

You need fan-out. One event, many independent consumers.

You need replay or audit history. Streams give you both for free.

DO NOT Use Async When…

The user needs an answer right now. Authentication, payment authorization, search results — user is waiting.

A strict total order is required. Bank ledger entries that must be globally ordered — queues and streams only guarantee order within a partition or a single queue, so total ordering needs careful design.

You introduced the queue to “decouple things” with no specific reason. Async adds operational cost; it must pay for itself.

Volume is tiny and producers / consumers always run together. A function call is simpler.

Queue vs Stream — Choosing Between Them
📬

Pick a Queue (RabbitMQ / SQS)

Each message has exactly one rightful processor. You need rich routing patterns (topic, fanout, headers). You don't need replay. You want priority queues, delayed delivery, or RPC-style request/reply.

📜

Pick a Stream (Kafka / Kinesis)

Multiple independent consumers need the same data. You want replay for backfills or new consumers. Volume is high and order matters within a key. You're building event-driven architecture.

☁️

Pick a Cloud Queue (SQS)

You want zero ops. Throughput is moderate (< 10k msg/sec per queue) and you can live without strong ordering. Pay-per-use beats a managed RabbitMQ for many use cases.

Async messaging is a powerful tool that extracts a real cost — debugging, ordering, exactly-once illusions, monitoring. Use it where the cost is justified by resilience and throughput. Don't use it because it sounds modern. Half of the worst microservice architectures I've seen are tangles of queues that should have been function calls.

📋 Chapter 3 — Summary
  • Use async when work can wait, producers/consumers scale independently, or you need fan-out, replay, or buffer for spikes.
  • Avoid async when the user is waiting on the result, or when total ordering is required.
  • Queues for tasks; streams for events; cloud queues when ops are the bottleneck.
  • Async is not free — the operational cost must be justified by the resilience or throughput it buys.
04
Chapter Four

Trade-offs & Comparisons

Delivery Semantics — Three Honest Choices

Every messaging system promises one of three delivery guarantees. The honest names are at-most-once, at-least-once, and effectively-once. Notice the absence of “exactly once” — in any system where the network can fail and processes can crash, true exactly-once delivery is impossible. What modern systems provide is at-least-once delivery plus mechanisms (idempotent consumers, transactional offsets) that make duplicates harmless — effectively-once from the application's perspective.
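The difference between the first two semantics is literally the position of the offset commit relative to processing. A simulated consumer loop with a crash injected between the two steps (a sketch, not any client's API):

```python
def run(messages, commit_first, crash_at=None):
    """Consume messages; optionally 'crash' before finishing index crash_at."""
    processed, offset = [], 0
    for i, msg in enumerate(messages):
        if commit_first:
            offset = i + 1           # at-most-once: commit, then process
            if i == crash_at:
                break                # crash here -> this message is lost
            processed.append(msg)
        else:
            if i == crash_at:
                break                # crash here -> replayed after restart
            processed.append(msg)    # at-least-once: process, then commit
            offset = i + 1
    return processed, offset

msgs = ["m0", "m1", "m2"]
run(msgs, commit_first=True,  crash_at=1)   # -> (["m0"], 2): m1 lost forever
run(msgs, commit_first=False, crash_at=1)   # -> (["m0"], 1): restart resumes at m1
```

A restarted consumer resumes from `offset`, so committing first skips the crashed message while committing last redelivers it — which is why at-least-once plus idempotency is the usual pairing.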

At-Most-Once

Semantics: message may be lost; never duplicated.

Use: high-volume telemetry, metrics, logs where loss of 0.01% is acceptable.

Implementation: commit offset before processing.

Risk: crash between offset-commit and processing → lost message.

♻️

At-Least-Once

Semantics: message always delivered; may be duplicated.

Use: default for almost everything — combined with idempotent consumers.

Implementation: commit offset after processing succeeds.

Risk: crash after processing, before offset-commit → replayed on restart.

🎯

Effectively-Once

Semantics: at-least-once + idempotent processing or transactional offsets.

Use: payment processing, financial events, anything where duplicates are dangerous.

Implementation: Kafka transactions (exactly-once semantics, EOS), or idempotent producer + consumer-side dedup.

Cost: latency, throughput, and complexity.

Backpressure — What Happens When Consumers Can't Keep Up

A queue that fills faster than consumers drain it is a queue with a problem. There are four ways to deal with it, and the wrong choice cascades into outages.

Buffer: let the queue grow — works for short bursts, fails when sustained.

Block the producer: the producer slows to match the consumer rate — honest, but pushes pressure upstream.

Drop messages: shed load deliberately — only acceptable for low-value data.

Auto-scale consumers: add more consumers under load — the modern default in cloud environments.

Consumer lag — the difference between the newest message and the last consumed message — is the metric that tells you which strategy is working. Page on lag, not on queue depth alone.
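Consumer lag is just arithmetic over offsets. A sketch, assuming per-partition log-end and committed offsets are available (real clients expose both through their admin/metrics APIs):

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Total lag for one consumer group: sum over partitions of (end - committed)."""
    return sum(
        log_end_offsets[p] - committed_offsets.get(p, 0)
        for p in log_end_offsets
    )

log_end   = {0: 1200, 1: 980, 2: 1010}   # newest offset per partition
committed = {0: 1100, 1: 980, 2: 910}    # group's committed offset per partition
consumer_lag(log_end, committed)          # -> 200
```

The absolute number matters less than the trend: flat or shrinking lag means consumers are keeping up; monotonically growing lag means one of the four strategies above has to kick in.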

Kafka vs RabbitMQ vs SQS — The Decision in Practice
📋

Apache Kafka

Throughput: millions of msg/sec per cluster.

Durability: replicated log on disk; configurable retention.

Ordering: per-partition strict order.

Cost: ZooKeeper / KRaft, brokers, ops effort. Worth it at scale, overkill below it.

Use: event sourcing, CDC, real-time analytics, microservice event bus.

🐇

RabbitMQ

Throughput: tens of thousands of msg/sec.

Durability: durable queues + persistent messages.

Ordering: per-queue, but parallel consumers can reorder.

Cost: moderate ops effort, simpler than Kafka.

Use: task queues, RPC, complex routing, priority work.

☁️

AWS SQS

Throughput: standard effectively unlimited; FIFO ~3,000 msg/sec with batching.

Durability: managed, multi-AZ, no ops.

Ordering: FIFO queues only; standard is best-effort.

Cost: pay-per-use, zero ops.

Use: background jobs, decoupling AWS Lambda, simple work distribution.

Most teams should start with the boring, managed cloud option (SQS / Cloud Pub-Sub / Service Bus). Move to Kafka when throughput, replay, or multi-consumer fan-out genuinely requires it. Move to RabbitMQ when you need rich routing patterns. Don't pick Kafka because it's on every architecture diagram — pick it because the alternative is breaking.

📋 Chapter 4 — Summary
  • Three delivery semantics: at-most-once (loss OK), at-least-once (duplicates handled), effectively-once (idempotent or transactional).
  • Backpressure options: buffer, block producer, drop, auto-scale — monitor consumer lag, not just queue depth.
  • Kafka for scale + replay, RabbitMQ for routing, SQS for “just make it go away.”
  • Pick the simpler option until you can show it's the bottleneck.
05
Chapter Five

Production Patterns & Common Mistakes

Two Patterns That Save Production Systems

If you take only two practices into production from this chapter, take these. Idempotent consumers: design every consumer so processing the same message twice produces the same result as processing it once. The pattern is usually a unique message ID stored in a database with a UNIQUE constraint, or a consumer-side dedup table with TTL. The cost is minimal; the upside is that at-least-once delivery becomes effectively-once at the application level. Dead letter queues (DLQ): when a message fails repeatedly (e.g. 5 retries), don't loop forever — route it to a separate queue for human inspection. The DLQ is your safety net for poison pills, schema mismatches, and bugs that only show up on certain payloads.

♻️

Pattern: Idempotent Consumer

Goal: processing the same message N times = processing it once.

Implementations: store processed message_id in DB with UNIQUE constraint; check-then-act inside a transaction; deterministic upserts that overwrite identical state.

Watch: idempotency must extend to side effects — charging a card twice is not idempotent without a request_id.
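A minimal sketch of the UNIQUE-constraint variant, using in-memory SQLite; the table names and the `handle` function are invented. The key point is that the dedup insert and the side effect commit in one transaction:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE payments (message_id TEXT, amount INTEGER)")

def handle(message_id, amount):
    """Process a payment message; a redelivered duplicate becomes a no-op."""
    try:
        with db:  # one transaction: dedup record + side effect succeed together
            db.execute("INSERT INTO processed VALUES (?)", (message_id,))
            db.execute("INSERT INTO payments VALUES (?, ?)", (message_id, amount))
        return "processed"
    except sqlite3.IntegrityError:
        return "duplicate"   # UNIQUE/PRIMARY KEY rejected the second insert

handle("msg-1", 500)   # -> "processed"
handle("msg-1", 500)   # -> "duplicate": at-least-once redelivery is now harmless
```

Because both inserts share a transaction, a crash mid-handle rolls back the dedup row too, so the redelivered message is processed cleanly rather than silently dropped.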

📬

Pattern: Dead Letter Queue

Goal: isolate poison messages so they don't block the queue forever.

Implementation: after N failed processing attempts (e.g. 5), move to a DLQ; alert; inspect; replay or discard manually.

Watch: a growing DLQ is a real incident, not a queue setting — alert on it.
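The retry-then-park loop can be sketched in a few lines. In-memory deques stand in for the broker here — real brokers (SQS redrive policies, RabbitMQ dead-letter exchanges) implement the redelivery count and DLQ routing natively:

```python
from collections import deque

MAX_ATTEMPTS = 5

def consume(main, dlq, process):
    """Drain main; after MAX_ATTEMPTS failures a message moves to the DLQ."""
    while main:
        msg = main.popleft()
        try:
            process(msg["body"])
        except Exception:
            msg["attempts"] += 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                dlq.append(msg)      # poison pill: park it, alert a human
            else:
                main.append(msg)     # redeliver for another attempt

main, dlq = deque(), deque()
main.append({"body": "good", "attempts": 0})
main.append({"body": "poison", "attempts": 0})

def process(body):
    if body == "poison":
        raise ValueError("unparseable payload")

consume(main, dlq, process)
[m["body"] for m in dlq]   # -> ["poison"]; "good" was processed normally
```

Without the DLQ branch, the `main.append(msg)` line loops forever and the poison message blocks everything behind it.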

The Five Mistakes That Bring Down Async Systems

Mistake 1 — No Dead Letter Queue

A single bad message crashes the consumer; redelivered; crashes again; queue fills; entire pipeline stops. Fix: DLQ after N retries; alert when DLQ is non-empty; stop messages that can never succeed from blocking ones that can.

📉

Mistake 2 — No Consumer Lag Monitoring

Queue silently grows for hours before anyone notices. Fix: alert on consumer lag (Kafka), queue depth + age (SQS/RabbitMQ). Set thresholds before they matter.

💥

Mistake 3 — Non-Idempotent Consumers

At-least-once redelivery causes duplicate side effects: double charges, double emails, double inventory deductions. Fix: dedup on message_id; idempotent state transitions; UNIQUE constraints in storage.

📝

Mistake 4 — Schema Drift

Producer adds a field; old consumers crash on parsing; pipeline halts. Fix: Schema Registry (Avro/Protobuf); enforce backward compatibility; deploy producers and consumers in versioned waves.

🔁

Mistake 5 — Sync-in-Async Anti-pattern

Service A enqueues a job, then polls a different queue waiting for the result. You reinvented synchronous calls with extra latency. Fix: if the user needs an answer now, call directly. Async is for work that can wait.

🔁

Bonus — Ordering Assumptions Across Partitions

Code assumes messages arrive in produce order — but Kafka only guarantees order within a partition. Fix: partition by the key whose order you care about (user_id, account_id); never assume cross-partition order.
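The fix is mechanical: hash the key whose order matters. A sketch where `hash()` stands in for the producer's partitioner:

```python
def partition_for(key, n_partitions=3):
    # Same key -> same partition, so one user's events stay in produce order.
    return hash(key) % n_partitions

events = [("alice", "cart"), ("bob", "cart"), ("alice", "pay"), ("bob", "pay")]
partitions = {}
for user, action in events:
    partitions.setdefault(partition_for(user), []).append((user, action))

# Per-key order survives: within any partition, each user's events are
# still cart-then-pay, even though global order across partitions is gone.
for log in partitions.values():
    for user in {u for u, _ in log}:
        assert [a for u, a in log if u == user] == ["cart", "pay"]
```

The converse also holds: partition by a random or round-robin key and even a single user's events can interleave arbitrarily across consumers.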

Almost every async-system outage I have debugged comes from one of these. The system is rarely the issue; the assumptions consumers make about it are. Treat at-least-once as a contract, not a bug, and design accordingly.

📋 Chapter 5 — Summary
  • Idempotent consumers turn at-least-once delivery into effectively-once at the application level.
  • Dead letter queues isolate poison messages and prevent pipeline-wide stalls.
  • Monitor consumer lag, not just queue depth — lag tells you whether you're keeping up.
  • The five outage mistakes: no DLQ, no lag monitoring, non-idempotent consumers, schema drift, sync-in-async.