System Design · Communication & APIs

Communication & APIs

How components talk to each other and to the outside world.

01
Chapter One

Synchronous vs Asynchronous Communication

The Fundamental Communication Choice

A service that calls another service synchronously is only as reliable as that dependency. If the payment service times out, the checkout service times out. If the email service is slow, the registration endpoint is slow. Synchronous coupling is invisible until a dependency fails, and then it is catastrophically visible. Asynchronous communication breaks that coupling at the cost of complexity. The question every system designer must answer is deceptively simple: does the caller need the result right now, or just eventually?

Synchronous
  • Caller blocks waiting for response
  • Both parties must be available simultaneously
  • Failure in dependency propagates to caller immediately
  • Total latency = sum of the latencies of all sequential dependency calls
  • Simple to reason about: linear call trace
  • Thread/connection held for full duration of call
  • Use when: result needed to continue processing
Asynchronous
  • Caller fires message and continues immediately
  • Temporal decoupling: parties need not be available together
  • Failure in dependency: message stays queued, not lost (assuming a durable queue)
  • Caller latency decoupled from dependency latency
  • Complex to debug: requires distributed traces
  • Requires idempotent message handlers (most queues deliver at least once)
  • Use when: result not needed immediately by caller
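A minimal Python sketch of the difference, using only the standard library. A queue.Queue plus a worker thread stand in for a real broker (an assumption for illustration: unlike Kafka or SQS, this queue lives in memory and loses messages on restart), and send_welcome_email is a hypothetical 200 ms dependency.

```python
import queue
import threading
import time

def send_welcome_email(user_id: str) -> None:
    """Hypothetical slow dependency, simulated with a 200 ms sleep."""
    time.sleep(0.2)

def register_sync(user_id: str) -> float:
    # Synchronous: the caller waits, so the dependency's latency (and any
    # exception it raises) becomes the caller's latency and failure.
    start = time.perf_counter()
    send_welcome_email(user_id)
    return time.perf_counter() - start

email_jobs: "queue.Queue[str]" = queue.Queue()  # in-memory stand-in for a broker
sent: list = []

def email_worker() -> None:
    # Consumer: drains the queue at its own pace, independent of callers.
    while True:
        user_id = email_jobs.get()
        send_welcome_email(user_id)
        sent.append(user_id)
        email_jobs.task_done()

def register_async(user_id: str) -> float:
    # Asynchronous: the caller only enqueues, then returns immediately.
    start = time.perf_counter()
    email_jobs.put(user_id)
    return time.perf_counter() - start

threading.Thread(target=email_worker, daemon=True).start()

if __name__ == "__main__":
    print(f"sync:  {register_sync('u1'):.3f}s")
    print(f"async: {register_async('u2'):.3f}s")
    email_jobs.join()  # wait for the background send before exiting
```

register_sync always pays the full 200 ms; register_async returns as soon as the message is enqueued, and the email is sent later on the worker thread.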
Communication Timeline: Synchronous vs Asynchronous
[Figure: Synchronous: the Client sends a request and is BLOCKED while the Service processes it, continuing only after the response; dependency slow → caller slow, dependency fails → caller fails. Asynchronous: the Client enqueues a message and continues immediately; the Queue holds it until the Service dequeues it; the two are decoupled in time, and a service failure means the message waits.]
Three Communication Patterns

Every distributed system interaction fits one of three patterns. Understanding which pattern you are using, and which you should be using, is the first step to designing reliable inter-service communication. Most teams default to request-response for everything, then retrofit async when latency problems appear. Designing for the right pattern from the start is cheaper.

↔️

Request-Response

  • Pattern: Synchronous, one-to-one
  • Client sends request, waits, receives response
  • Direct feedback on success or failure
  • Use when: result needed to continue
  • Examples: REST API, gRPC, SQL query
📤

Fire-and-Forget

  • Pattern: Async, one-way, no response
  • Client sends message and does not wait
  • No guarantee of delivery without acknowledgment
  • Use when: result not needed
  • Examples: logging, metrics, UDP
📡

Publish-Subscribe

  • Pattern: Async, one-to-many
  • Publisher emits event, N subscribers react
  • Publisher does not know who receives
  • Use when: fan-out, event-driven systems
  • Examples: Kafka, SNS, Redis Pub/Sub
Publish-Subscribe Fan-Out vs Direct Coupling
[Figure: Direct Coupling: the Order Service makes 3 synchronous calls (Email Svc, Inventory Svc, Analytics Svc) and knows all 3 consumers. Pub/Sub Decoupled: the Order Service publishes once to the order.placed topic, and Email, Inventory, and Analytics subscribe; 1 publish, N subscribers, and the publisher knows nobody.]
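Pub/sub fan-out can be sketched as a tiny in-memory broker. This is a hypothetical, synchronous stand-in for Kafka, SNS, or Redis Pub/Sub (real brokers deliver asynchronously and, in Kafka's and SNS's case, durably); it shows only the coupling difference: the publisher calls publish once and never names its consumers.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]

class Broker:
    """In-memory stand-in for a pub/sub system (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict[str, Any]) -> int:
        # The publisher hands over one event; fan-out to N handlers is the
        # broker's job. Returns how many subscribers received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)

broker = Broker()
log: List[str] = []
broker.subscribe("order.placed", lambda e: log.append(f"email:{e['order_id']}"))
broker.subscribe("order.placed", lambda e: log.append(f"inventory:{e['order_id']}"))
broker.subscribe("order.placed", lambda e: log.append(f"analytics:{e['order_id']}"))
broker.publish("order.placed", {"order_id": "123"})
```

Adding a fourth consumer means one more subscribe call; the Order Service code does not change.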
When to Choose Async
⚠️

Synchronous Is Wrong When…

  • Work takes longer than acceptable response time (>200ms)
  • Dependency is unreliable or rate-limited by a third party
  • Result is not needed by the caller to continue
  • Fan-out to multiple downstream services is required
  • You need resilience against dependency failures
🚫

Asynchronous Is Wrong When…

  • Caller needs the result to continue (e.g. payment confirmation)
  • Error must be returned immediately to the user
  • Strict ordering guarantees across services are required
  • System is simple enough that async adds complexity without value
  • Team lacks experience debugging distributed async flows

The rule is simple: if the caller needs the result to continue, use synchronous. If it does not, use asynchronous. Every synchronous call to an unreliable service is a latency and availability risk you have accepted. Make that choice deliberately, not by default.

The infrastructure for async communication (Kafka, RabbitMQ, SQS), including delivery guarantees, consumer groups, and dead-letter queues, is covered in depth in Building Blocks: Message Queues & Streaming.

📋 Chapter 1: Summary
  • Synchronous: caller blocks, both must be available simultaneously, failure propagates immediately.
  • Asynchronous: caller continues, temporally decoupled, failure isolated (the message is queued, not lost).
  • Three patterns: Request-Response (sync, one-to-one), Fire-and-Forget (async, one-way), Publish-Subscribe (async, one-to-many).
  • Use sync when: result needed to continue processing (payment confirmation, auth check).
  • Use async when: result not needed immediately, fan-out required, dependency is unreliable.
  • Every synchronous call accepts the dependency's full latency and failure rate; that is not free.
02
Chapter Two

REST β€” Deep Dive

The Protocol That Runs the Web

REST is so ubiquitous that most engineers use it without knowing what it actually is. It is not a protocol. It is not a standard. REST is an architectural style: a set of constraints defined by Roy Fielding in his 2000 dissertation. Most APIs called "REST" satisfy only some of its six constraints. Understanding what REST actually requires, and where most implementations deviate, is the difference between an API that ages well and one that requires constant versioning pain.

🖥️

Client-Server

UI concerns separated from data storage concerns. Frontend and backend evolve independently. The separation is the stability.

📦

Stateless

Every request contains all information needed to process it. Server holds no session state between requests. Enables horizontal scaling.

💾

Cacheable

Responses must label themselves as cacheable or not; in HTTP this is done with Cache-Control (and related) headers. Enables CDN and browser caching without client logic.
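A framework-free sketch of how a server can satisfy this constraint in HTTP: it labels the response with Cache-Control and attaches an ETag, so a browser or CDN can revalidate and receive 304 Not Modified instead of the full body. The max-age value and the hash-based ETag are illustrative choices, and the exact-string comparison simplifies the real If-None-Match header, which may list several ETags.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

def get_resource(resource: dict, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on a cacheable resource."""
    body = json.dumps(resource, sort_keys=True).encode()
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        # Explicit cacheability: browsers and shared caches (CDNs) may reuse
        # this response for 60 seconds without contacting the server.
        "Cache-Control": "public, max-age=60",
        "ETag": etag,
    }
    if if_none_match == etag:
        # The client's cached copy is still valid: send no body.
        return 304, headers, b""
    return 200, headers, body
```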

🔗

Uniform Interface

Resources identified by URIs. Manipulation through representations. Self-descriptive messages. HATEOAS. The most violated constraint in practice.

🧅

Layered System

Client cannot tell if it is talking to origin server or intermediary (CDN, load balancer, cache). Transparency enables infrastructure flexibility.

📜

Code on Demand

Optional: server can extend client functionality by delivering executable scripts. JavaScript delivery is the only common use of this constraint.

HTTP Methods: Correct Usage

Idempotency is the property that matters most for reliability. If an operation is idempotent, clients can safely retry it after a network failure without producing duplicates. GET, PUT, and DELETE are inherently idempotent. POST is not, and that distinction drives how you design retry logic, payment APIs, and everything else that must not create duplicates.

HTTP Methods: Safe, Idempotent, and Cacheable Properties
Method    Safe  Idempotent  Cacheable  Use for
GET       ✓     ✓           ✓          Retrieve resource
POST      ✗     ✗           ✗          Create / action
PUT       ✗     ✓           ✗          Replace full resource
PATCH     ✗     ✗           ✗          Partial update
DELETE    ✗     ✓           ✗          Remove resource
HEAD      ✓     ✓           ✓          Check existence/headers
OPTIONS   ✓     ✓           ✗          CORS preflight
A POST retried without an idempotency key can duplicate the action. Use an Idempotency-Key header for safe POST retries.
HTTP Status Codes: Use Specific Ones
✅

Status Code Families

  • 2xx Success: 200 OK, 201 Created, 202 Accepted (async), 204 No Content
  • 3xx Redirect: 301 Permanent, 302 Temporary, 304 Not Modified (cache valid)
  • 4xx Client Error: 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable Content, 429 Too Many Requests
  • 5xx Server Error: 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout
⚠️

Common Status Code Mistakes

  • 200 with error body: breaks HTTP monitoring, CDN caching, and client error handling
  • 403 instead of 401: 401 = not authenticated, 403 = authenticated but not authorized
  • 400 for everything: use 409 Conflict for duplicates, 422 for validation, 429 for rate limits
  • 200 for async operations: use 202 Accepted when the result is not yet ready
  • 500 for client errors: 5xx signals a server bug, 4xx signals a client error
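One way to avoid these mistakes is to map domain errors to specific codes in a single place. A minimal sketch using Python's http.HTTPStatus; the exception classes are hypothetical, not part of any framework.

```python
from http import HTTPStatus

# Hypothetical domain exceptions raised by business logic.
class NotAuthenticated(Exception): ...
class NotAuthorized(Exception): ...
class NotFound(Exception): ...
class AlreadyExists(Exception): ...
class ValidationFailed(Exception): ...
class RateLimited(Exception): ...

ERROR_STATUS = {
    NotAuthenticated: HTTPStatus.UNAUTHORIZED,          # 401: identity unknown
    NotAuthorized: HTTPStatus.FORBIDDEN,                # 403: known, but not allowed
    NotFound: HTTPStatus.NOT_FOUND,                     # 404
    AlreadyExists: HTTPStatus.CONFLICT,                 # 409: duplicate
    ValidationFailed: HTTPStatus.UNPROCESSABLE_ENTITY,  # 422: well-formed but invalid
    RateLimited: HTTPStatus.TOO_MANY_REQUESTS,          # 429
}

def status_for(error: Exception) -> HTTPStatus:
    for exc_type, status in ERROR_STATUS.items():
        if isinstance(error, exc_type):
            return status
    # Anything unmapped is treated as a genuine server fault.
    return HTTPStatus.INTERNAL_SERVER_ERROR
```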
Richardson Maturity Model
Richardson Maturity Model: Where Most APIs Actually Live
  • Level 0: single endpoint, all operations via POST, action in the body. e.g. POST /api {"action":"getUser","id":1}
  • Level 1 (Resources): multiple endpoints, one per resource, still only POST. e.g. POST /users, POST /orders
  • Level 2 (HTTP Verbs): correct HTTP methods on resource endpoints. e.g. GET /users, POST /users, DELETE /users/123. Most APIs stop here.
  • Level 3 (HATEOAS): responses include hypermedia links to the available next actions. Truly discoverable API; the client needs no URL knowledge beyond the entry point. Rarely implemented.
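To make Level 3 concrete, here is a sketch of a hypermedia response for a hypothetical order resource. The _links layout follows the HAL convention, which is an assumption here; other hypermedia formats shape links differently. The point is that the server advertises only the transitions valid in the order's current state, so the client never hard-codes them.

```python
from typing import Any, Dict

def order_representation(order: Dict[str, Any]) -> Dict[str, Any]:
    """Level 3 response: resource data plus links to the actions valid now."""
    base = f"/orders/{order['id']}"
    links: Dict[str, Dict[str, str]] = {"self": {"href": base}}
    # The server, not the client, decides which transitions are available.
    if order["status"] == "pending":
        links["pay"] = {"href": f"{base}/payment", "method": "POST"}
        links["cancel"] = {"href": base, "method": "DELETE"}
    elif order["status"] == "paid":
        links["track"] = {"href": f"{base}/shipment", "method": "GET"}
    return {**order, "_links": links}
```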
API Versioning Strategies
🔢

URL Versioning

  • /v1/users, /v2/users
  • Obvious, easy to route and test
  • Easy to document separately per version
  • Purists say: version is not a resource property
  • Used by: most public APIs in practice
📋

Header Versioning

  • API-Version: 2024-01-15
  • Clean URLs, stable resource identity
  • Harder to test: cannot just paste a URL in the browser
  • Less visible to consumers not reading docs
  • Used by: Stripe, GitHub
🔀

Content Negotiation

  • Accept: application/vnd.api+json;v=2
  • Most RESTful: standard HTTP mechanism
  • Complex to implement and document
  • Consumers rarely understand it
  • Used by: almost nobody
Idempotency Note

GET, PUT, and DELETE are idempotent, so they are safe to retry after a network failure. POST is not. For safe POST retries, accept an Idempotency-Key header: the client generates a UUID once per intended operation, the server stores the key together with the result, and later requests carrying the same key get the stored result without being reprocessed. Stripe accepts idempotency keys on all POST requests, which is what lets clients retry payment calls without risking a double charge.
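A client-side sketch of the retry half of this pattern. The send callable is a hypothetical transport (it could wrap any HTTP client); what matters is that the key is generated once per intended operation and reused on every retry. A production client would also wait with backoff between attempts.

```python
import uuid
from typing import Any, Callable, Dict

Send = Callable[[Dict[str, Any], Dict[str, str]], Dict[str, Any]]

def post_with_retries(send: Send, body: Dict[str, Any], attempts: int = 3) -> Dict[str, Any]:
    """POST with retries that are safe because every attempt carries the same key."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Generate the key ONCE per intended operation, not once per attempt:
    # a fresh key on each retry would defeat deduplication on the server.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(attempts):
        try:
            return send(body, headers)
        except ConnectionError:
            # Network failure: the server may or may not have processed it.
            if attempt == attempts - 1:
                raise
    raise AssertionError("unreachable")
```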

Most APIs called REST are actually Level 2 of the Richardson Maturity Model: they use HTTP methods correctly on resource endpoints. That is sufficient for production. Do not let REST purity distract from building an API that is consistent, predictable, and does not break clients when it evolves. Additive changes (new fields, new endpoints, new optional parameters) are non-breaking, provided clients ignore fields they do not recognize. Everything else requires a version.

REST at the infrastructure layer (API gateways, rate limiting, authentication, and routing) is covered in Building Blocks: API Gateway & Proxies.

📋 Chapter 2: Summary
  • REST is an architectural style, not a protocol: six constraints, and most implementations satisfy only a subset.
  • Stateless is the constraint that enables horizontal scaling: no session state on servers.
  • HTTP methods: GET/PUT/DELETE are idempotent and safe to retry. POST is not; use an Idempotency-Key header.
  • Status codes: use specific ones. Never 200 with an error body, never 403 when 401 is correct.
  • Richardson Level 2 (HTTP verbs + resources) is sufficient for most production APIs.
  • Versioning: URL versioning is most common in practice; header versioning (Stripe, GitHub) is cleanest architecturally.
  • Never break existing clients: additive changes only. Breaking changes require a new version with advance notice.
03
Chapter Three

gRPC & Protocol Buffers

When REST Isn't Fast Enough

REST on HTTP/1.1 with JSON is convenient: every developer knows it, every tool supports it, every debugger can read it. But convenience has a cost. JSON is verbose, HTTP/1.1 is request-response only, and text parsing is slower than binary. When you are making millions of service-to-service calls per second inside a datacenter, those costs compound. gRPC was built by Google for exactly this problem: high-throughput, low-latency, strongly-typed communication between internal services. It is not a replacement for REST. It is the right tool for a specific context.

Protocol Buffers: The Foundation

What Protobuf Is

  • Binary serialization format: not human-readable
  • Schema defined in .proto files; code generated from the schema
  • Strongly typed: no runtime type surprises
  • Supported languages: Go, Java, Python, C++, Node.js, Ruby, and more
  • Size: typically 3–10× smaller than equivalent JSON
  • Speed: commonly reported as 5–10× faster to parse than JSON (depends on payload and library)
🔄

Schema Evolution Rules

  • Fields identified by field numbers, not names
  • Adding new fields: backward compatible; old code ignores unknown fields
  • Removing fields: mark deprecated first, then delete and list the number (and name) as reserved; never reuse the number
  • Changing field type: breaking change; avoid entirely
  • Renaming a field: safe on the binary wire, which uses numbers, not names (it still breaks the JSON encoding and code that uses the old name)
  • Field numbers are the permanent contract: choose carefully
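Why field numbers, not names, are the contract is easiest to see in the wire format itself. The protobuf library does all of this for you; the hand-rolled sketch below handles only varint fields (wire type 0) and exists purely to show that a reader decodes by number, skips numbers it does not know, and is indifferent to what a field is called.

```python
from typing import Dict, Tuple

def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative ints: 7 bits per byte, high bit = 'more'."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_message(fields: Dict[int, int]) -> bytes:
    # Each field is a tag, (field_number << 3) | wire_type, followed by the value.
    return b"".join(encode_varint(num << 3 | 0) + encode_varint(val)
                    for num, val in fields.items())

def decode_message(data: bytes, schema: Dict[int, str]) -> Dict[str, int]:
    """Decode with the reader's own schema {field_number: field_name}."""
    result, pos = {}, 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        if number in schema:  # unknown numbers are skipped, not errors
            result[schema[number]] = value
    return result
```

Encoding field 1 = 150 produces the bytes 08 96 01, the standard example from the protobuf encoding documentation. A reader that only knows fields 1 and 2 decodes a message from a newer writer that added field 3, and a reader that renamed field 1 still receives its value.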
HTTP/1.1 vs HTTP/2: Connection Multiplexing
[Figure: HTTP/1.1: one request in flight per connection, so three concurrent requests need three connections (req1/resp1 on conn 1, req2/resp2 on conn 2, req3/resp3 on conn 3), with head-of-line blocking on each. HTTP/2: streams 1–3 carry req1–req3 concurrently over 1 connection and resp1–resp3 return interleaved, with no HTTP-level head-of-line blocking.]
Four gRPC Service Types
↔️

Unary RPC

  • Client sends one request, server returns one response
  • Same pattern as a REST call: simplest to reason about
  • Use: most request-response interactions
  • GetUser(UserRequest) returns User
📥

Server Streaming RPC

  • Client sends one request, server returns stream
  • Client reads until stream ends
  • Use: large dataset download, live price feed, log tailing
  • WatchOrders() returns (stream OrderUpdate)
📤

Client Streaming RPC

  • Client sends stream of requests, server returns one response
  • Server reads all messages then responds once
  • Use: bulk upload, batch sensor data ingestion
  • UploadData(stream Chunk) returns Summary
🔁

Bidirectional Streaming

  • Both sides send streams simultaneously
  • Each side reads independently: fully concurrent
  • Use: real-time chat, collaborative editing, game state sync
  • Chat(stream Message) returns (stream Message): both directions stream
REST (HTTP/1.1 + JSON)
  • Human-readable JSON: easy to debug
  • Any client, without special tooling or codegen
  • HTTP/1.1: one request in flight per connection at a time
  • Larger payload: verbose text encoding
  • No built-in streaming (SSE is one-way only)
  • Native browser support: paste a URL in the browser
  • Best for: public APIs, external clients, browsers
gRPC (HTTP/2 + Protobuf)
  • Binary Protobuf: not human-readable, harder to debug
  • Requires generated client stubs from the .proto schema
  • HTTP/2: multiplexed streams, compressed headers, low overhead
  • Compact payload: typically 3–10× smaller than JSON
  • 4 call types built into the protocol (unary plus 3 streaming modes)
  • No native browser support (grpc-web plus a proxy required)
  • Best for: internal service-to-service, high QPS, streaming

gRPC is not a replacement for REST; it is the right tool for internal service-to-service communication. The moment you need external clients or browser support, REST is still the correct choice. gRPC's lack of native browser support is not a simple gap waiting to be patched: browser APIs do not expose the low-level HTTP/2 framing and trailers that gRPC relies on, which is why grpc-web needs a translating proxy. Use gRPC where you control both client and server.

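gRPC in a service mesh, where Istio or Linkerd handle mTLS, retries, and load balancing for gRPC traffic, is covered in Architecture Styles: Service Mesh. Note: standard L4 load balancers balance connections, not individual requests. Because gRPC multiplexes many calls over one long-lived HTTP/2 connection, an L4 balancer can send all of a client's calls to a single backend; you need an L7, gRPC-aware load balancer (or client-side load balancing).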

📋 Chapter 3: Summary
  • Protobuf: binary, strongly typed, typically 3–10× smaller than JSON and commonly reported as 5–10× faster to parse.
  • Schema evolution: add fields freely, never reuse field numbers, renaming is safe on the binary wire only, changing type is breaking.
  • HTTP/2 multiplexing: multiple streams on one connection, so no HTTP-level head-of-line blocking between requests (TCP-level blocking remains).
  • Four call types: Unary (1:1), Server Streaming (1:N), Client Streaming (N:1), Bidirectional (N:N).
  • Use gRPC for: internal services, high QPS, streaming, strongly-typed contracts across teams.
  • Use REST for: public APIs, browser clients, external consumers, when debuggability matters.
  • gRPC limitation: no native browser support, requires L7 load balancer aware of HTTP/2 streams.
04
Chapter Four

GraphQL

Client-Specified Queries and the N+1 Problem

REST was designed with servers deciding what data to return. GraphQL flips that relationship: clients specify exactly what they need and the server returns exactly that, no more and no less. This solves real problems: mobile clients that cannot afford to fetch large payloads they only partially use, frontend teams blocked waiting for backend teams to add fields to an endpoint, multiple round-trips to assemble a single view from separate resources. GraphQL is not always the answer. Its flexibility comes with costs that most teams underestimate until they are already in production.

REST: Over- and Under-fetching
  • GET /users/123 returns all 47 fields
  • Client needs 3 fields; 44 are wasted bandwidth
  • Related posts require a separate request
  • Post comments require yet another request
  • 3 round-trips to assemble one view
  • Backend team must add a new endpoint for each new data need
GraphQL: Precise Fetching
  • Client queries exactly: name, avatar, posts{title}
  • Response contains only those fields, nothing extra
  • Relationships traversed in one request
  • One round-trip for the same view
  • Mobile client saves bandwidth and battery
  • Frontend can iterate without waiting on backend
N+1 Query Problem vs DataLoader Batching
[Figure: N+1 Problem (naive fetch): query 1 loads 10 posts (SELECT * FROM posts), then one SELECT user WHERE id=… runs per post: 11 queries for 10 posts, 101 for 100, 1001 for 1000, and the database is overwhelmed at scale. DataLoader (batched fetch): after the posts query, DataLoader collects the IDs [1,2,3,...10] and issues SELECT * FROM users WHERE id IN (1..10): 2 queries regardless of N. DataLoader = batch + cache.]
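A minimal Python analogue of the DataLoader idea, using asyncio. In the spirit of the JavaScript dataloader library, it collects every key requested during one turn of the event loop, issues a single batched query, and caches one future per key. BatchLoader, fetch_users, and the simulated SQL are hypothetical; a real loader would also cap batch size.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable, List, Tuple

BatchFn = Callable[[List[Hashable]], Awaitable[List[Any]]]

class BatchLoader:
    """DataLoader-style loader: batch + cache, scoped to one request."""

    def __init__(self, batch_fn: BatchFn) -> None:
        self._batch_fn = batch_fn  # keys -> values, in the same order
        self._cache: Dict[Hashable, asyncio.Future] = {}
        self._queue: List[Tuple[Hashable, asyncio.Future]] = []
        self._task = None

    def load(self, key: Hashable) -> asyncio.Future:
        if key not in self._cache:
            loop = asyncio.get_running_loop()
            future = loop.create_future()
            self._cache[key] = future
            self._queue.append((key, future))
            if len(self._queue) == 1:
                # Defer the fetch so every resolver already scheduled in this
                # turn of the loop can add its key to the same batch.
                loop.call_soon(self._start_dispatch)
        return self._cache[key]

    def _start_dispatch(self) -> None:
        self._task = asyncio.get_running_loop().create_task(self._dispatch())

    async def _dispatch(self) -> None:
        batch, self._queue = self._queue, []
        try:
            values = await self._batch_fn([key for key, _ in batch])
        except Exception as exc:
            for _, future in batch:
                future.set_exception(exc)
            return
        for (_, future), value in zip(batch, values):
            future.set_result(value)

queries: List[str] = []

async def fetch_users(ids: List[Hashable]) -> List[dict]:
    # Stand-in for one round-trip: SELECT * FROM users WHERE id IN (...)
    queries.append(f"SELECT * FROM users WHERE id IN ({', '.join(map(str, ids))})")
    return [{"id": i, "name": f"user{i}"} for i in ids]

async def resolve_author(loader: BatchLoader, post: dict) -> dict:
    # Looks like a per-post fetch, but only enqueues a key.
    return await loader.load(post["author_id"])

async def authors_for(posts: List[dict]) -> List[dict]:
    loader = BatchLoader(fetch_users)  # one loader per incoming request
    return await asyncio.gather(*(resolve_author(loader, p) for p in posts))
```

For 10 posts written by 3 distinct authors, asyncio.run(authors_for(posts)) issues exactly one users query, and the cache deduplicates repeated authors.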
GraphQL Operations & Production Concerns
🔍

Query β€” Read

Client specifies exact fields needed. Traverses relationships in one request. Returns precisely what was asked for, nothing more.

✏️

Mutation β€” Write

Create, update, delete operations. Returns affected data, enabling optimistic UI updates. Equivalent to POST/PUT/DELETE in REST.

📡

Subscription β€” Real-time

Client subscribes to data changes. Server pushes updates, typically over a WebSocket, when data changes. Use for live notifications and collaborative features.

💣

Query Complexity Attack

Deeply nested queries multiply database load exponentially. A single query can bring down an unprotected server. Always implement depth limits and complexity scoring before going public.

When NOT to Use GraphQL
  • Simple CRUD: REST is cleaner and easier to reason about
  • Internal services: gRPC is faster and more efficient
  • Small team: the endpoint-versioning pain GraphQL avoids is not yet worth its learning curve
  • Performance-critical paths: query cost is hard to bound without persisted queries
  • No GraphQL expertise: N+1, caching, and complexity are non-obvious failure modes

GraphQL's flexibility is also its attack surface. Without query depth limits and complexity scoring, a single malicious or accidental query can multiply your database load exponentially. This is not theoretical: it has taken down production GraphQL APIs. Always implement query complexity analysis before going public. GraphQL also defeats standard HTTP caching, because queries are usually sent as POST requests to a single endpoint; you need client-side caching (e.g. Apollo Client's normalized cache) or persisted queries sent over GET so a CDN can cache them.
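A sketch of a pre-execution guard, assuming the query has already been parsed into nested dicts of {field: sub-selection or None}; a real server would walk the AST produced by its GraphQL library instead. Depth and raw field count are the simplest proxies; production complexity scoring usually also weights list fields by their requested page size.

```python
from typing import Any, Dict

# Parsed selection set, simplified: {field_name: sub_selection_or_None}.
Selection = Dict[str, Any]

def query_depth(selection: Selection) -> int:
    """Depth of the deepest field path; a flat selection has depth 1."""
    if not selection:
        return 0
    return 1 + max(query_depth(sub) if sub else 0 for sub in selection.values())

def count_fields(selection: Selection) -> int:
    return sum(1 + (count_fields(sub) if sub else 0) for sub in selection.values())

def check_query(selection: Selection, max_depth: int = 5, max_fields: int = 200) -> None:
    """Reject a query before executing any resolver."""
    depth = query_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    fields = count_fields(selection)
    if fields > max_fields:
        raise ValueError(f"query selects {fields} fields, limit is {max_fields}")
```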

📋 Chapter 4: Summary
  • REST problems GraphQL solves: over-fetching (too much data), under-fetching (multiple round-trips).
  • Client specifies exact fields needed: one request traverses the entire data graph.
  • Core operations: Query (read), Mutation (write), Subscription (real-time via WebSocket).
  • N+1 problem: 10 posts naively = 11 queries. DataLoader batches to 2 queries regardless of N.
  • Production concerns: query complexity attacks, HTTP caching broken, monitoring harder (one endpoint).
  • Use GraphQL when: complex data graph, multiple client types with different data needs, rapid frontend iteration.
  • Do not use when: simple CRUD, internal services, no GraphQL expertise, performance-critical paths.
05
Chapter Five

WebSockets & Server-Sent Events

Persistent Connections for Real-Time Data

HTTP was designed for documents: request something, receive it, connection closes. That model works for 99% of web interactions. But live chat, collaborative editing, real-time dashboards, and multiplayer games do not fit the request-response model. You need data to flow from server to client unprompted, or between clients in real time. Long polling was the first hack. Server-Sent Events was the clean HTTP solution for the server-to-client case. WebSockets were the right answer when both sides need to talk simultaneously.

Evolution of Real-Time on the Web: Long Polling vs SSE vs WebSocket
[Figure: Long Polling: the client sends request 1, the server holds it until an event fires and then responds, and the client sends request 2; a new request per event, wasteful, fallback only. Server-Sent Events: the client opens one connection and the server pushes event 1, event 2, event 3 over it; server to client only, auto-reconnect built in. WebSocket: an HTTP upgrade answered with 101 Switching Protocols, after which messages flow from client and from server; full duplex, both sides send, no auto-reconnect.]
⏳

Long Polling

  • Direction: Client pulls (simulated push)
  • Connection: New HTTP request per event (the client re-requests after every response)
  • Server holds the request open until an event fires
  • Wasteful: ties up threads and connections
  • Use only as: fallback when WebSocket blocked
  • Examples: legacy notification systems
📺

Server-Sent Events

  • Direction: Server to Client only
  • Connection: One persistent HTTP connection
  • Native browser support via EventSource API
  • Automatic reconnection built in
  • Works over HTTP/2: multiplexed efficiently
  • Use for: live feeds, notifications, dashboards, progress
  • Examples: GitHub Actions logs, stock tickers
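On the server side, SSE is just a long-lived HTTP response with Content-Type: text/event-stream and a simple line-based format. A sketch of serializing one event; the field names (id, event, retry, data) come from the SSE format, while the function and its parameters are illustrative.

```python
from typing import Optional

def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None, retry_ms: Optional[int] = None) -> str:
    """Serialize one Server-Sent Event in text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")     # browser resends it as Last-Event-ID on reconnect
    if event is not None:
        lines.append(f"event: {event}")     # routed to addEventListener(event, ...)
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")  # reconnection delay hint, in milliseconds
    for line in data.split("\n"):
        lines.append(f"data: {line}")       # multi-line payloads use one data: line each
    return "\n".join(lines) + "\n\n"        # a blank line terminates the event
```

In the browser, an EventSource pointed at the endpoint receives each event; because the id is echoed back on reconnect, the server can resume from where the client left off.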
💬

WebSockets

  • Direction: Full duplex; both sides send
  • Connection: Upgraded persistent ws:// (or TLS wss://) connection
  • Lower overhead after initial handshake
  • No automatic reconnection: must implement manually
  • Stateful: pins client to one server (scaling challenge)
  • Use for: chat, gaming, collaborative editing
  • Examples: Slack, Figma, online games
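Because WebSocket clients must reconnect on their own, a common approach is capped exponential backoff with jitter. A sketch, assuming a hypothetical connect callable that raises OSError (ConnectionError is a subclass) on failure; sleep and rand are injectable so the schedule can be tested. After reconnecting, a real client also has to resubscribe and fetch anything it missed.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def connect_with_backoff(connect: Callable[[], T], max_attempts: int = 8,
                         base: float = 0.5, cap: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep,
                         rand: Callable[[], float] = random.random) -> T:
    """Retry connect() with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: a random delay up to the exponential ceiling, so
            # clients dropped together do not all reconnect in lockstep.
            sleep(rand() * min(cap, base * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```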
WebSocket Horizontal Scaling
WebSocket Scaling via Redis Pub/Sub
[Figure: a Load Balancer in front of WS Server 1 (User A connected), WS Server 2 (User B connected), and WS Server 3 (User C connected), all attached to a shared Redis Pub/Sub message channel. When User A (Server 1) sends a message to User B (Server 2): 1. Server 1 publishes to Redis; 2. Server 2, as a subscriber, receives it and delivers it to User B. Redis bridges servers holding different users.]

WebSocket servers are stateful: a connected client is pinned to a server. This breaks naive horizontal scaling. The standard solution is a shared pub/sub layer (Redis pub/sub) so all WebSocket servers can receive messages for any connected client. When a user on Server 1 sends a message to a user on Server 2, Server 1 publishes to Redis, and Server 2 receives the message through its subscription and delivers it to its connected user. Socket.IO's Redis adapter works this way, and large chat platforms apply the same idea with their own messaging backbones to spread millions of concurrent connections across many servers.
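The Redis bridge can be sketched with an in-memory stand-in for Redis PUBLISH/SUBSCRIBE; FakeRedisPubSub, the single "chat" channel, and the server class are hypothetical, and a real deployment would use a Redis client's publish and subscribe calls instead. With one channel, every server sees every message; larger systems typically partition channels per user or per room.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class FakeRedisPubSub:
    """In-memory stand-in for Redis pub/sub (illustration, not redis-py)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: dict) -> None:
        for callback in self._subscribers[channel]:
            callback(message)

class WebSocketServer:
    def __init__(self, name: str, bus: FakeRedisPubSub) -> None:
        self.name = name
        self.bus = bus
        self.connections: Dict[str, List[str]] = {}  # user -> messages pushed to their socket
        bus.subscribe("chat", self._on_bus_message)

    def connect(self, user: str) -> None:
        self.connections[user] = []

    def send(self, sender: str, recipient: str, text: str) -> None:
        # The sender's server does not know where the recipient is connected,
        # so it publishes to the shared channel instead of delivering directly.
        self.bus.publish("chat", {"from": sender, "to": recipient, "text": text})

    def _on_bus_message(self, message: dict) -> None:
        # Every server hears the message; only the one holding the socket delivers.
        inbox = self.connections.get(message["to"])
        if inbox is not None:
            inbox.append(f"{message['from']}: {message['text']}")
```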

A complete guided design of a real-time chat system (WebSockets at scale with message persistence, delivery guarantees, and presence tracking) is covered as a full case study: Case Study: Chat System.

📋 Chapter 5: Summary
  • Long polling: simulated push with repeated HTTP requests; wasteful, use only as a fallback.
  • SSE: server pushes over HTTP, one-way, automatic reconnect, native browser EventSource API.
  • WebSocket: full-duplex persistent connection, both sides send, no automatic reconnection.
  • Use SSE for: live feeds, notifications, progress updates, dashboards (server-to-client only).
  • Use WebSocket for: chat, gaming, collaborative editing, anything requiring real-time client-to-server push.
  • WebSocket scaling challenge: stateful connections pin clients to servers; solve with Redis pub/sub fan-out.
06
Chapter Six

Event-Driven Architecture

Systems That React Instead of Ask

In a synchronous system, service A asks service B for something and waits. Service A knows about service B. When service B slows down, service A slows down. Event-driven architecture inverts this. Service A emits an event ("an order was placed") and does not know or care who reacts. Services B, C, and D each react independently. They can fail, be slow, or not exist yet; service A is unaffected. This decoupling is genuinely powerful. It is also a source of operational complexity that teams consistently underestimate until they are debugging a broken saga at 2am.

Three Types of Events
📢

Event Notification

  • Minimal payload: just the signal
  • Example: {"type":"order.placed","id":"123"}
  • Consumer fetches full data separately if needed
  • Pros: small, decoupled, keeps sensitive data out of the stream
  • Cons: extra round-trip to fetch data
  • Use when: consumers vary in what data they need
📦

Event-Carried State Transfer

  • Full data payload included in the event
  • Consumer is self-contained: no extra requests
  • Pros: fast, no round-trip, consumer autonomous
  • Cons: large payload, sensitive data in stream
  • Use when: most consumers need full data
📜

Event Sourcing

  • Event log IS the source of truth
  • Current state derived by replaying events
  • Full audit trail, time-travel queries, replay for new services
  • Cons: complex, schema evolution is hard
  • Use only when: history itself is valuable business data
  • Financial ledgers, audit systems, legal records
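A sketch of "current state derived by replaying events" for a hypothetical account ledger: apply is a pure transition, and replaying a prefix of the log gives the state as of that point, which is where time-travel queries come from. Event kinds and amounts are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event types: "deposited" or "withdrawn"
    amount: int  # cents

def apply(balance: int, event: Event) -> int:
    """Pure state transition: old state + event -> new state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")

def replay(events: Iterable[Event], initial: int = 0) -> int:
    """State is never stored as the truth; it is derived from the log."""
    balance = initial
    for event in events:
        balance = apply(balance, event)
    return balance
```

Given a log of deposit 10,000, withdraw 2,500, deposit 500, replaying the full log yields 8,000 while replaying the first two events yields 7,500, the balance at that earlier moment.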
Choreography vs Orchestration
[Figure: Choreography (decentralized): Order Svc, Payment Svc, Inventory Svc, and Shipping Svc react to each other's events (order.placed, payment.done); loose coupling and high autonomy, but the end-to-end flow is hard to trace and there is no central visibility into state. Orchestration (centralized): an Orchestrator (Temporal / Step Functions) directs Payment, Inventory, Shipping, and Notification; full visibility into process state, but the orchestrator is a coupling point.]
Choreography
  • Services react autonomously to domain events
  • No central coordinator: nothing to become a SPOF
  • Easy to add new behavior: subscribe a new service
  • Hard to see the overall process state at a glance
  • Debugging requires distributed tracing across services
  • Use for: autonomous teams, microservices, loose coupling
Orchestration
  • Central process manager (Temporal, Step Functions) coordinates steps
  • Orchestrator tracks overall workflow state explicitly
  • Easy to handle exceptions, retries, compensation centrally
  • Orchestrator becomes a coupling point
  • Harder to evolve steps independently
  • Use for: complex workflows, regulated processes, SLAs
CQRS: Command Query Responsibility Segregation

Most systems use the same data model for reading and writing. This works until read and write patterns diverge enough that optimizing one hurts the other. CQRS separates them: a write model optimized for consistency and validation, and a read model optimized for query performance. The cost is eventual consistency: the read model is updated asynchronously from the write model.

Write Model (Commands)
  • Optimized for consistency and validation
  • Normalized: referential integrity enforced
  • Source of truth: authoritative state
  • Example: relational database with full ACID guarantees
Read Model (Queries)
  • Optimized for query performance; can be denormalized
  • Tailored to specific UI query patterns
  • Can be a separate store (Elasticsearch, Redis, read replica)
  • Updated asynchronously: eventually consistent with the write model
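A compact sketch of the split, with hypothetical names: the write model validates commands and records OrderPlaced events, and a projector folds those events into a denormalized per-customer view. The in-memory list stands in for the stream or outbox that would carry events between the two sides; the gap before run_projector executes is the eventual-consistency window.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total_cents: int

class OrderWriteModel:
    """Command side: validates and records facts; the source of truth."""

    def __init__(self) -> None:
        self.orders: Dict[str, OrderPlaced] = {}
        self.pending_events: List[OrderPlaced] = []  # stands in for a stream/outbox

    def place_order(self, order_id: str, customer_id: str, total_cents: int) -> None:
        if order_id in self.orders:
            raise ValueError("duplicate order")
        if total_cents <= 0:
            raise ValueError("total must be positive")
        event = OrderPlaced(order_id, customer_id, total_cents)
        self.orders[order_id] = event
        self.pending_events.append(event)

class CustomerSpendView:
    """Query side: denormalized per-customer totals, shaped for one screen."""

    def __init__(self) -> None:
        self.total_by_customer: Dict[str, int] = defaultdict(int)

    def apply(self, event: OrderPlaced) -> None:
        self.total_by_customer[event.customer_id] += event.total_cents

def run_projector(write: OrderWriteModel, view: CustomerSpendView) -> None:
    # In production this runs asynchronously; until it does, reads are stale.
    while write.pending_events:
        view.apply(write.pending_events.pop(0))
```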
When to Use CQRS

Use when read and write patterns are fundamentally different: heavy reads with complex joins alongside high-throughput writes with complex validation. Do NOT use for simple CRUD; the operational overhead and eventual-consistency complexity are not worth it. CQRS is often combined with Event Sourcing, but they are independent patterns; you can use either without the other.

Saga Pattern: Distributed Transactions

Distributed transactions across multiple services rarely use traditional two-phase commit (2PC): it requires every participant's datastore to support the protocol, holds locks while participants wait on the coordinator, and blocks if the coordinator fails mid-commit. The Saga pattern breaks a distributed transaction into a sequence of local transactions, each publishing an event to trigger the next step. If any step fails, compensating transactions run in reverse order to undo the steps that already completed.

🎡

Choreography Saga

  • Each service reacts to events and publishes the next
  • No central coordinator: services are autonomous
  • Loose coupling: easy to add new steps
  • Hard to see overall transaction state at a glance
  • Example: Order → Payment → Inventory → Shipping, each reacting to the previous step's event
🎬

Orchestration Saga

  • Central saga orchestrator directs each step explicitly
  • Orchestrator tracks overall transaction state
  • Centralized compensation logic on failure
  • Orchestrator is a coupling point: SPOF risk
  • Tools: Temporal, AWS Step Functions, Conductor
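The compensation logic an orchestrator centralizes can be sketched in a few lines. This is an in-memory illustration with hypothetical steps; orchestrators such as Temporal or Step Functions also persist progress so a saga survives a crash mid-way, and they deal with failing compensations, which this sketch does not.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str
    action: Callable[[], None]      # local transaction in one service
    compensate: Callable[[], None]  # semantic undo of that transaction

def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True
```

If the inventory step fails after the payment step succeeded, only the payment is compensated (refunded); shipping never runs, so it has nothing to undo.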
The Outbox Pattern: Reliable Event Publishing
Dual-Write Problem vs Outbox Pattern Solution
[Figure: Dual-Write Problem: the Service writes to the Database (1. OK), then publishes to the Queue (2. FAILS ✗). The DB has the data, the queue has nothing, and downstream is never notified: silent data inconsistency. Outbox Pattern Solution: one DB transaction writes the main table and the outbox table, so both commit or neither commits; an Outbox Poller reads the outbox table and publishes to the Message Queue. At-least-once delivery: the poller retries on failure, so consumers must be idempotent.]
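A runnable sketch of the outbox pattern using Python's built-in sqlite3; the table layout, topic name, and publish callable are assumptions for illustration. The order row and the outbox row are written in one local transaction, and a separate relay publishes pending rows and then marks them as sent.

```python
import json
import sqlite3
from typing import Any, Callable, Dict

def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
        CREATE TABLE outbox (
            seq       INTEGER PRIMARY KEY AUTOINCREMENT,
            topic     TEXT NOT NULL,
            payload   TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
    """)

def place_order(conn: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    # One local transaction: the business row and the event row commit
    # together or roll back together. No broker is contacted here.
    with conn:
        conn.execute("INSERT INTO orders (id, total_cents) VALUES (?, ?)",
                     (order_id, total_cents))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("order.placed",
                      json.dumps({"order_id": order_id, "total_cents": total_cents})))

def relay_outbox(conn: sqlite3.Connection,
                 publish: Callable[[str, Dict[str, Any]], None],
                 batch_size: int = 100) -> int:
    """Poller: publish pending rows in order, then mark each as published."""
    rows = conn.execute(
        "SELECT seq, topic, payload FROM outbox WHERE published = 0 "
        "ORDER BY seq LIMIT ?", (batch_size,)).fetchall()
    for seq, topic, payload in rows:
        publish(topic, json.loads(payload))  # if this raises, the row stays pending
        # A crash between publish and this update re-publishes the row on the
        # next poll: at-least-once delivery, so consumers must be idempotent.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```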

Event sourcing is overused. It is a genuinely powerful pattern for systems where the history of state changes is itself valuable business data: financial ledgers, audit systems, legal records, compliance trails. For most systems it adds substantial complexity without adding value. If you cannot articulate why the event history is valuable beyond "we might want it later," use a regular database with an audit log table instead. The outbox pattern, by contrast, is underused: every service that publishes events should be using it.

Event Schema Evolution

Events are a public contract: consumers depend on their structure. Schema changes break consumers silently and are difficult to coordinate across teams. Use a schema registry (Confluent Schema Registry for Kafka) to enforce compatibility rules. Backward compatibility means consumers on the new schema can still read events written with the old schema, which is why new fields must be optional with defaults. Forward compatibility means consumers still on the old schema can read events written with the new schema, ignoring fields they do not know. Never remove fields; mark them deprecated. Test schema compatibility in CI before any event schema change reaches production.
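A sketch of the kind of CI gate described above, applied to a simplified schema format (field name mapped to a type and an optional default). Both the format and the rules encoded here are assumptions; in practice a registry such as Confluent Schema Registry runs its own compatibility check against Avro, Protobuf, or JSON Schema definitions.

```python
from typing import Any, Dict, List

# Simplified schema: field name -> {"type": str, optional "default": value}.
Schema = Dict[str, Dict[str, Any]]

def compatibility_problems(old: Schema, new: Schema) -> List[str]:
    """List violations of the evolution rules for a proposed schema change."""
    problems = []
    for name, spec in new.items():
        if name not in old and "default" not in spec:
            # New-schema readers must be able to fill this in for old events.
            problems.append(f"new field '{name}' has no default")
        elif name in old and spec["type"] != old[name]["type"]:
            problems.append(f"field '{name}' changed type "
                            f"{old[name]['type']} -> {spec['type']}")
    for name in old:
        if name not in new:
            problems.append(f"field '{name}' was removed; deprecate it instead")
    return problems
```

A CI job would load the currently registered schema and the proposed one and fail the build when the returned list is non-empty.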

The infrastructure that makes event-driven architecture work (Kafka partitioning, consumer groups, dead-letter queues, delivery guarantees) is covered in depth in Building Blocks: Message Queues & Streaming.

📋 Chapter 6: Summary
  • Three event types: Notification (signal only), State Transfer (full payload), Event Sourcing (log is truth).
  • Choreography: autonomous services react to events; loose coupling, but the overall flow is hard to trace.
  • Orchestration: central coordinator (Temporal, Step Functions); visible state, but coupling at the coordinator.
  • CQRS: separate read and write models for independent scaling; write for consistency, read for performance.
  • Outbox pattern: write to DB + outbox table in one transaction; poller publishes to queue. Atomic, at-least-once delivery.
  • Saga pattern: distributed transactions via local steps with compensating actions on failure.
  • Event sourcing: only when event history itself is valuable business data, not as a default architecture.
07
Chapter Seven

API Design Best Practices

APIs as Long-Lived Public Contracts

An API is a promise. Unlike internal code, which you can refactor whenever you want, an API has consumers you may not control. Change it carelessly and you break systems you did not write, maintained by teams you may not even know exist. The practices in this chapter exist because APIs live longer than the engineers who designed them. The decisions you make at version one (idempotency, pagination, error format, deprecation) will either protect you or haunt you for years.

Idempotency: Safe Retries
🔑

Idempotency Key Pattern

  • Client generates a unique UUID per intended operation
  • Sends it as header: Idempotency-Key: uuid
  • Server checks if key seen before:
    • First time: process request, store result with key
    • Duplicate: return stored result; do not reprocess
  • Key expiry: typically 24 hours
  • Stripe's API accepts idempotency keys on all POST requests, including payments
🔄

Why It Matters

  • Network failures cause clients to retry
  • Without idempotency: retry = duplicate action
  • Payment retried = double charge
  • Email retried = duplicate email sent
  • GET, PUT, DELETE: idempotent by HTTP semantics, so safe to retry
  • POST: not idempotent by default; add Idempotency-Key
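The server side of the idempotency key pattern can be sketched as follows. This is an in-memory, single-process illustration under stated assumptions: `IdempotencyStore` and `charge_card` are hypothetical names, the 24-hour TTL mirrors the typical expiry above, and concurrent duplicates arriving at the same instant are not handled (a real service would need a lock or a unique-constraint insert in a shared database).

```python
import time
import uuid
from typing import Any, Callable

class IdempotencyStore:
    """In-memory sketch; a real service would keep results in a shared database or cache."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._results: dict[str, tuple[float, Any]] = {}  # key -> (stored_at, result)

    def execute(self, key: str, operation: Callable[[], Any]) -> Any:
        now = self.clock()
        cached = self._results.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # duplicate within the TTL: replay the stored result
        result = operation()  # first time (or expired key): perform the side effect
        self._results[key] = (now, result)
        return result

# Usage: the client generates one key per intended charge and reuses it on every retry.
charges: list[int] = []

def charge_card() -> dict:
    charges.append(500)  # hypothetical stand-in for the real, non-idempotent payment call
    return {"charge_id": f"ch_{len(charges)}", "amount": 500}

store = IdempotencyStore()
key = str(uuid.uuid4())
first = store.execute(key, charge_card)
retry = store.execute(key, charge_card)  # network retry with the same key: no second charge
```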
Pagination: Offset vs Cursor
Offset Pagination vs Cursor Pagination
Offset Pagination: Problems
  • Request: GET /posts?offset=40&limit=20
  • Rows 1–40: scanned and discarded; rows 41–60: returned to client; rows 61 onward: not reached
  • Database scans rows 1–60 and discards 1–40
  • Expensive at high offsets: OFFSET 10000 means 10,000 rows discarded
  • Unstable: a new insertion shifts subsequent page results, so users see items twice or skip items during pagination
Cursor Pagination: Stable
  • Request: GET /posts?cursor=eyJpZCI6NDB9&limit=20
  • Rows before the cursor: skipped directly (index seek); the next 20 rows from the cursor position are returned; rows after: not touched
  • Direct seek: no rows discarded regardless of depth
  • Stable: insertions and deletions do not shift results
  • Limitation: no jump to an arbitrary page number
  • Cursor encodes the last-seen ID in base64 or another opaque token
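The difference between the two strategies shows up in a small sqlite3 experiment. The `posts` table, the newest-first ordering, and the JSON-in-base64url cursor format are assumptions for illustration; with that format, `encode_cursor(40)` produces the `eyJpZCI6NDB9` token from the example request. Cursor (keyset) pagination here relies on `id` being the indexed, unique sort key.

```python
import base64
import json
import sqlite3
from typing import Optional

def encode_cursor(last_id: int) -> str:
    # Opaque token: base64url-encoded JSON holding the last-seen ID.
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]

def page_by_offset(conn: sqlite3.Connection, offset: int, limit: int) -> list[int]:
    # The database must step over `offset` rows before returning anything.
    rows = conn.execute(
        "SELECT id FROM posts ORDER BY id DESC LIMIT ? OFFSET ?", (limit, offset)
    )
    return [row[0] for row in rows]

def page_by_cursor(conn: sqlite3.Connection, cursor: Optional[str],
                   limit: int) -> tuple[list[int], Optional[str]]:
    if cursor is None:
        rows = conn.execute("SELECT id FROM posts ORDER BY id DESC LIMIT ?", (limit,))
    else:
        # Keyset seek: the primary-key index jumps straight past the last-seen ID.
        rows = conn.execute(
            "SELECT id FROM posts WHERE id < ? ORDER BY id DESC LIMIT ?",
            (decode_cursor(cursor), limit),
        )
    ids = [row[0] for row in rows]
    next_cursor = encode_cursor(ids[-1]) if len(ids) == limit else None
    return ids, next_cursor

# Usage: ten posts, newest first, three per page; a new post arrives between requests.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO posts (id) VALUES (?)", [(i,) for i in range(1, 11)])
first_page, cursor = page_by_cursor(conn, None, 3)   # [10, 9, 8]
conn.execute("INSERT INTO posts (id) VALUES (11)")
offset_page_2 = page_by_offset(conn, 3, 3)            # [8, 7, 6]: post 8 shown twice
cursor_page_2, _ = page_by_cursor(conn, cursor, 3)    # [7, 6, 5]: no repeat
```

The new post pushes every offset page down by one row, so offset page two repeats an item; the cursor page continues exactly where the previous one stopped.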
Consistent Error Format

Every API error must return the same structure. Inconsistent errors force clients to write defensive parsing logic for every endpoint. One bad error format early in an API's life creates years of backward compatibility debt. Define the contract once and enforce it across every endpoint from day one.

📋

Standard Error Structure

  • HTTP status code: correct 4xx or 5xx, never 200
  • code: machine-readable constant (PAYMENT_DECLINED)
  • message: human-readable description for display
  • details: field-level issues for validation errors
  • request_id: unique ID for debugging and support tickets
  • documentation_url: link to error explanation for complex errors
⚠️

Common Error Format Mistakes

  • 200 OK with "success": false in body: breaks HTTP caching and monitoring
  • 403 Forbidden when 401 Unauthorized is correct (401: not authenticated; 403: authenticated but not authorized)
  • 400 Bad Request for everything instead of specific 409, 422, 429
  • 200 for accepted async operations: use 202 Accepted
  • Different error shapes per endpoint: forces defensive parsing on clients
  • Exposing stack traces or internal paths in error messages
Standard Error Response
{
  "error": {
    "code": "PAYMENT_DECLINED",
    "message": "The card was declined by the issuer",
    "details": [{ "field": "card_number", "issue": "Invalid" }],
    "request_id": "req_abc123",
    "documentation_url": "https://api.example.com/docs/errors/PAYMENT_DECLINED"
  }
}
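One way to enforce a single error shape is to route every error through one helper. The sketch below is hypothetical: `error_response` and `DOCS_BASE` are illustrative names, the documentation URL scheme is copied from the example above, and choosing the specific 4xx or 5xx status remains the caller's job.

```python
import uuid
from typing import Any, Optional

# Hypothetical docs location, following the example error response.
DOCS_BASE = "https://api.example.com/docs/errors/"

def error_response(status: int, code: str, message: str,
                   details: Optional[list[dict[str, str]]] = None,
                   request_id: Optional[str] = None) -> tuple[int, dict[str, Any]]:
    """Build (HTTP status, body) in the one error shape shared by every endpoint."""
    if not 400 <= status <= 599:
        raise ValueError("errors must use a 4xx or 5xx status, never 200")
    body = {
        "error": {
            "code": code,                      # machine-readable constant
            "message": message,                # human-readable description
            "details": details or [],          # field-level issues, if any
            # Normally propagated from the incoming request; generated here as a fallback.
            "request_id": request_id or f"req_{uuid.uuid4().hex[:12]}",
            "documentation_url": DOCS_BASE + code,
        }
    }
    return status, body
```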
Rate Limiting & Authentication
🪣

Token Bucket

  • Bucket holds N tokens, refills at fixed rate
  • Each request consumes one token
  • Burst allowed: tokens accumulated while idle can be spent at once
  • Best for: most public APIs (burst-friendly)
  • Example: 100 tokens, refill 10/sec
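A single-process token bucket using the example numbers from the card (100 tokens, refilling at 10 per second) might look like the sketch below. The injectable `clock` parameter is an assumption added so the behavior can be tested deterministically; a multi-instance API would keep bucket state in a shared store instead of in process memory.

```python
import time
from typing import Callable

class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity          # start full, so an idle client can burst at once
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Credit tokens earned since the previous call, capped at the bucket size.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost         # each request consumes one token by default
            return True
        return False

bucket = TokenBucket(capacity=100, refill_rate=10)  # the card's example: 100 tokens, 10/sec
```

A client that has been idle long enough to refill the bucket can send 100 requests at once, after which it is held to the steady refill rate of 10 per second.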
🪟

Sliding Window

  • Rolling time window tracks request count
  • Smoother than fixed window: no boundary spikes
  • More memory: timestamp stored per request
  • Best for: smooth rate distribution
  • No 2× spike possible at window boundary
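The card describes the sliding window log variant, which keeps one timestamp per accepted request. A minimal sketch under the same assumptions as a single-process limiter (hypothetical class name, injectable `clock` for testing):

```python
import time
from collections import deque
from typing import Callable

class SlidingWindowLog:
    """Allows at most `limit` requests in any rolling `window_seconds` period."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.timestamps: deque[float] = deque()  # one entry per accepted request

    def allow(self) -> bool:
        now = self.clock()
        # Evict timestamps that have slid out of the window ending now.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The memory cost noted on the card is visible here: the deque can hold up to `limit` timestamps per client, but a burst just before a minute boundary still counts against requests just after it.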
🕐

Fixed Window

  • Count requests per period (minute or hour)
  • Simplest to implement and explain
  • Risk: 2× rate possible at window boundary
  • Best for: loose rate limiting, internal APIs
  • Return: X-RateLimit-Remaining, Retry-After
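The boundary risk of a fixed window is easy to reproduce. In this hypothetical sketch, windows are aligned to multiples of `window_seconds` on the clock, and the counter resets whenever the window index changes:

```python
import time
from typing import Callable

class FixedWindowCounter:
    """One counter per aligned window (for example, per clock minute)."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.current_window = -1
        self.count = 0

    def allow(self) -> bool:
        window_index = int(self.clock() // self.window)
        if window_index != self.current_window:
            # New window: the counter resets no matter how recent the last burst was.
            self.current_window = window_index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 100 per minute, 100 requests at second 59 and another 100 at second 60.5 are all accepted: 200 requests in about a second and a half, twice the intended rate.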
Standard Rate Limit Response Headers

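Include these on every response so clients can implement respectful retry logic without guessing. The X-RateLimit-* names are a widely used convention rather than a formal standard; Retry-After is defined by HTTP itself: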

X-RateLimit-Limit: 1000        # requests allowed per window
X-RateLimit-Remaining: 743     # requests remaining in current window
X-RateLimit-Reset: 1703721600  # Unix timestamp when window resets
Retry-After: 30                # seconds to wait (on 429 response only)
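A small helper can assemble these headers consistently. This is a sketch under assumptions: the function name is hypothetical, the caller supplies the limiter's current state, and Retry-After is derived from the reset time, which fits a fixed window; a token bucket could instead compute it from its refill rate.

```python
import math

def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       now_epoch: float, limited: bool) -> dict[str, str]:
    """Headers for one response; Retry-After is added only when the request was rejected (429)."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),  # Unix timestamp when the window resets
    }
    if limited:
        # Retry-After is in whole seconds; round up so the client never retries too early.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now_epoch)))
    return headers
```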
🗝️

API Keys

  • Per-client secret token issued at registration
  • Simple to issue, revoke, and rotate
  • No expiry by default: rotate regularly
  • Use for: server-to-server, non-user-specific
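Issuing and checking API keys can be sketched with the standard library's `secrets`, `hashlib`, and `hmac` modules. The `ak_` prefix and the function names are hypothetical. Storing only a SHA-256 digest means a leaked database does not expose working keys; a fast unsalted hash is a common choice here because the keys are long random values rather than human-chosen passwords.

```python
import hashlib
import hmac
import secrets

def issue_api_key(prefix: str = "ak_") -> tuple[str, str]:
    """Return (plaintext key shown to the client once, SHA-256 digest to store server-side)."""
    key = prefix + secrets.token_urlsafe(32)  # hypothetical prefix; 32 random bytes of entropy
    return key, hashlib.sha256(key.encode()).hexdigest()

def verify_api_key(presented: str, stored_digest: str) -> bool:
    # Compare digests in constant time so timing does not reveal how much matched.
    presented_digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(presented_digest, stored_digest)
```

Rotation then means issuing a new key, accepting both digests during an overlap period, and deleting the old digest.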
🎫

JWT

  • Self-contained signed token with claims
  • Stateless β€” no server lookup required
  • Short-lived: 15 min to 1 hour typically
  • Use for: user sessions, microservice auth
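To show why a JWT needs no server lookup, here is a standard-library-only sketch of HS256 signing and verification in the compact `header.payload.signature` form. It is illustrative, not production code: it skips `aud`, `iss`, and `nbf` checks and key rotation, and real services should use a maintained JWT library. The 900-second default matches the short lifetimes described above. Note that the payload is only base64url-encoded, not encrypted.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def sign_jwt(claims: dict, secret: bytes, ttl_seconds: int = 900) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": int(time.time()) + ttl_seconds}  # 15-minute default
    signing_input = (_b64url(json.dumps(header).encode()) + "."
                     + _b64url(json.dumps(payload).encode()))
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)

def verify_jwt(token: str, secret: bytes) -> dict:
    """Return the claims if signature and expiry check out; raise ValueError otherwise."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never let the token pick its algorithm
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= time.time():
        raise ValueError("token expired")
    return claims
```

Any service holding the shared secret can run `verify_jwt` locally, which is what makes the scheme stateless; the trade-off is that a token stays valid until `exp` unless extra revocation machinery is added.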
🔐

OAuth 2.0

  • Delegated authorization standard
  • Third-party access with explicit scopes
  • More complex but more powerful
  • Use for: external integrations, social login
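Scope enforcement at a resource server can be sketched as a decorator. The sketch assumes the access token has already been validated and that its claims include a space-delimited `scope` string (OAuth 2.0 represents scopes as space-separated values); `require_scopes`, `InsufficientScope`, `list_orders`, and the `orders:read` scope are hypothetical names.

```python
import functools
from typing import Any, Callable

class InsufficientScope(Exception):
    """The caller is authenticated but the token lacks a required scope."""

def parse_scope(scope: str) -> frozenset[str]:
    # Scope values are a space-delimited list of strings.
    return frozenset(scope.split())

def require_scopes(*required: str) -> Callable:
    def decorator(handler: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(handler)
        def wrapper(token_claims: dict, *args: Any, **kwargs: Any) -> Any:
            missing = set(required) - parse_scope(token_claims.get("scope", ""))
            if missing:
                raise InsufficientScope(f"missing scopes: {sorted(missing)}")
            return handler(token_claims, *args, **kwargs)
        return wrapper
    return decorator

@require_scopes("orders:read")
def list_orders(token_claims: dict) -> list[str]:
    return ["order_1", "order_2"]  # hypothetical handler body
```

A missing scope is an authorization failure for an authenticated caller, so it maps to 403 rather than 401.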
Deprecation Standard

Never remove an endpoint without warning. Add the Sunset header (RFC 8594) to deprecated endpoint responses, for example Sunset: Wed, 31 Dec 2025 23:59:59 GMT. Provide a minimum of 6 months' notice. Monitor who is still calling the deprecated endpoint, and contact the remaining consumers before removal. Remove the endpoint only after usage reaches zero or the deadline has passed. Additive changes (new fields, new optional parameters) require no versioning at all.
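Generating the Sunset header from a `datetime` avoids hand-written dates whose weekday does not match the calendar. This sketch uses `email.utils.format_datetime`, which produces the HTTP-date format RFC 8594 expects; `sunset_headers` and the 183-day approximation of six months are assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

MIN_NOTICE = timedelta(days=183)  # rough stand-in for the six-month minimum

def sunset_headers(sunset_at: datetime, announced_at: datetime) -> dict[str, str]:
    """Headers to attach to every response from a deprecated endpoint."""
    if sunset_at - announced_at < MIN_NOTICE:
        raise ValueError("sunset date gives consumers less than six months of notice")
    # HTTP-date is always expressed in GMT; the weekday is derived from the date itself.
    return {"Sunset": format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True)}
```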

An API is a promise. The moment you have an external consumer, every breaking change is a support burden, an outage risk, and a trust violation. Design your API as if it will live for 10 years, because the good ones do. Pagination strategy, error format, and idempotency keys are not implementation details you can retrofit. They are the shape of the contract you are signing with every consumer on day one.

Authentication and authorization in depth (JWT internals, OAuth 2.0 flows, token storage security, and zero-trust models) is covered in Security & Observability: Authentication & Authorization.

📋 Chapter 7 · Summary
  • Idempotency key: client-generated UUID; server stores the result and returns it on duplicates. Critical for safe POST retries.
  • Offset pagination: simple but expensive at depth and unstable when data changes during pagination.
  • Cursor pagination: stable, efficient at any depth, no arbitrary page jumps. Use for feeds and large datasets.
  • Rate limiting: Token Bucket for burst-friendly, Sliding Window for smooth, Fixed Window for simple.
  • Authentication: API Keys (server-to-server), JWT (user sessions, stateless), OAuth 2.0 (delegated, scoped).
  • Deprecation: Sunset header (RFC 8594), 6-month minimum notice, monitor usage, never remove until consumers migrate.
  • Error format: always consistent, with HTTP status code, machine-readable code, human message, request ID, and documentation URL.
Communication & APIs at a Glance
01 · Sync vs Async

Coupling Is the Cost of Convenience

  • Sync: caller blocks, both must be available simultaneously
  • Async: caller continues, temporally decoupled, failure isolated
  • Three patterns: Request-Response, Fire-and-Forget, Pub-Sub
  • Every sync call accepts the dependency's full failure rate
02 · REST

Architectural Style, Not a Protocol

  • Six constraints: Stateless is the one that enables scaling
  • GET/PUT/DELETE idempotent; POST is not, so add Idempotency-Key
  • Most APIs are Richardson Level 2, which is sufficient for production
  • Versioning: URL most common, header most architecturally clean
03 · gRPC

Internal Services, Not Public APIs

  • Protobuf: binary, 3–10× smaller than JSON, strongly typed
  • HTTP/2 multiplexing: multiple streams on one connection
  • Four modes: Unary, Server streaming, Client streaming, Bidirectional
  • No native browser support; requires an L7 gRPC-aware load balancer
04 · GraphQL

Client-Specified Queries, With Real Costs

  • Solves over-fetching and under-fetching from fixed REST endpoints
  • N+1 problem: DataLoader batches the N+1 queries into 2, regardless of N
  • Query complexity attacks: always implement depth limits before launch
  • HTTP caching is hard: queries typically POST to a single endpoint
05 · WebSockets & SSE

Real-Time Without Polling

  • SSE: server-to-client only, auto-reconnect, native browser support
  • WebSocket: full duplex, no auto-reconnect, stateful connections
  • WebSocket scaling: Redis pub/sub relays messages between servers holding different users' connections
  • Long polling: fallback only; a new connection per event is wasteful
06 · Event-Driven

React Instead of Ask

  • Three event types: Notification, State Transfer, Event Sourcing
  • Outbox pattern: DB write and outbox row committed in one transaction, then published
  • Choreography: loose coupling, hard to trace. Orchestration: visible, coupled
  • Event sourcing: only when history is valuable business data
07 · API Best Practices

APIs Live Longer Than Their Designers

  • Idempotency-Key: store result, return stored on duplicate POST
  • Cursor over offset: stable, efficient at any depth
  • Token Bucket for burst traffic, Sliding Window for smoothness
  • Sunset header (RFC 8594): 6 months minimum before removal