Communication & APIs

How components talk to each other and to the outside world.

Chapter One

Synchronous vs Asynchronous Communication

The Fundamental Communication Choice

A service that calls another service synchronously is only as reliable as that dependency. If the payment service times out, the checkout service times out. If the email service is slow, the registration endpoint is slow. Synchronous coupling is invisible until a dependency fails — and then it is catastrophically visible. Asynchronous communication breaks that coupling at the cost of complexity. The question every system designer must answer is deceptively simple: does the caller need the result right now, or just eventually?

Synchronous

Caller blocks waiting for response
Both parties must be available simultaneously
Failure in dependency propagates to caller immediately
Total latency = sum of all dependency latencies (for sequential calls)
Simple to reason about: linear call trace
Thread/connection held for full duration of call
Use when: result needed to continue processing

Asynchronous

Caller fires message and continues immediately
Temporal decoupling: parties need not be available together
Failure in dependency: message stays queued (given a durable queue), not lost
Caller latency decoupled from dependency latency
Complex to debug: requires distributed traces
Requires idempotent message handlers
Use when: result not needed immediately by caller

Communication Timeline — Synchronous vs Asynchronous

Three Communication Patterns

Most distributed system interactions fit one of three patterns. Understanding which pattern you are using, and which you should be using, is the first step to designing reliable inter-service communication. Most teams default to request-response for everything, then retrofit async when latency problems appear. Designing for the right pattern from the start is cheaper.

↔️

Request-Response

Pattern: Synchronous, one-to-one
Client sends request, waits, receives response
Direct feedback on success or failure
Use when: result needed to continue
Examples: REST API, gRPC, SQL query

📤

Fire-and-Forget

Pattern: Async, one-way, no response
Client sends message and does not wait
No guarantee of delivery without acknowledgment
Use when: result not needed
Examples: logging, metrics, UDP

📡

Publish-Subscribe

Pattern: Async, one-to-many
Publisher emits event, N subscribers react
Publisher does not know who receives
Use when: fan-out, event-driven systems
Examples: Kafka, SNS, Redis Pub/Sub
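
To make the publish-subscribe card concrete, here is a minimal in-process sketch. The `EventBus` class and the `order.placed` topic name are hypothetical, and a real broker such as Kafka or SNS delivers asynchronously over the network. The sketch only illustrates that the publisher never references its subscribers.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Toy in-memory pub/sub: a stand-in for a real broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
emails: list[str] = []
analytics: list[str] = []

# Two independent consumers; the publisher below knows neither of them.
bus.subscribe("order.placed", lambda e: emails.append(e["id"]))
bus.subscribe("order.placed", lambda e: analytics.append(e["id"]))

delivered = bus.publish("order.placed", {"id": "123"})
```

Adding a third consumer is one more `subscribe` call; the publishing code does not change, which is the fan-out property the card describes.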

Publish-Subscribe Fan-Out vs Direct Coupling

When to Choose Async

⚠️

Synchronous Is Wrong When…

Work takes longer than acceptable response time (>200ms)
Dependency is unreliable or rate-limited by a third party
Result is not needed by the caller to continue
Fan-out to multiple downstream services is required
You need resilience against dependency failures

🚫

Asynchronous Is Wrong When…

Caller needs the result to continue — payment confirmation
Error must be returned immediately to the user
Strict ordering guarantees across services are required
System is simple enough that async adds complexity without value
Team lacks experience debugging distributed async flows

The rule is simple: if the caller needs the result to continue, use synchronous. If it does not, use asynchronous. Every synchronous call to an unreliable service is a latency and availability risk you have accepted. Make that choice deliberately — not by default.

The infrastructure for async communication — Kafka, RabbitMQ, SQS — including delivery guarantees, consumer groups, and dead-letter queues — is covered in depth in Building Blocks: Message Queues & Streaming.

📋 Chapter 1 — Summary

Synchronous: caller blocks, both must be available simultaneously, failure propagates immediately.
Asynchronous: caller continues, temporally decoupled, failure isolated — message queued not lost.
Three patterns: Request-Response (sync, one-to-one), Fire-and-Forget (async, one-way), Publish-Subscribe (async, one-to-many).
Use sync when: result needed to continue processing (payment confirmation, auth check).
Use async when: result not needed immediately, fan-out required, dependency is unreliable.
Every synchronous call accepts the dependency's full latency and failure rate — that is not free.

Chapter Two

REST — Deep Dive

The Protocol That Runs the Web

REST is so ubiquitous that most engineers use it without knowing what it actually is. It is not a protocol. It is not a standard. REST is an architectural style: a set of constraints defined by Roy Fielding in his 2000 dissertation. Most APIs called "REST" satisfy only a subset of its six constraints, and the uniform interface (HATEOAS in particular) is the one most often skipped. Understanding what REST actually requires, and where most implementations deviate, is the difference between an API that ages well and one that requires constant versioning pain.

🖥️

Client-Server

UI concerns separated from data storage concerns. Frontend and backend evolve independently. The separation is the stability.

📦

Stateless

Every request contains all information needed to process it. Server holds no session state between requests. Enables horizontal scaling.

💾

Cacheable

Responses must declare themselves cacheable or not via Cache-Control headers. Enables CDN and browser caching without client logic.

🔗

Uniform Interface

Resources identified by URIs. Manipulation through representations. Self-descriptive messages. HATEOAS. The most violated constraint in practice.

🧅

Layered System

Client cannot tell if it is talking to origin server or intermediary (CDN, load balancer, cache). Transparency enables infrastructure flexibility.

📜

Code on Demand

Optional: server can extend client functionality by delivering executable scripts. JavaScript delivery is the only common use of this constraint.

HTTP Methods — Correct Usage

Idempotency is the property that matters most for reliability. If an operation is idempotent, clients can safely retry it after a network failure without producing duplicates. GET, PUT, and DELETE are inherently idempotent. POST is not — and that distinction drives how you design retry logic, payment APIs, and everything else that must not create duplicates.
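
A small sketch of why the distinction matters for retries. The `UserStore` class is a hypothetical in-memory stand-in for a server; it assumes PUT targets a client-chosen identifier and POST lets the server assign one.

```python
import itertools


class UserStore:
    """Hypothetical in-memory resource store used to contrast PUT and POST."""

    def __init__(self) -> None:
        self.users: dict[int, dict] = {}
        self._ids = itertools.count(1)

    def put(self, user_id: int, body: dict) -> None:
        # PUT replaces the resource at a known URI; repeating it changes nothing.
        self.users[user_id] = body

    def post(self, body: dict) -> int:
        # POST asks the server to create a new resource; each call creates another.
        user_id = next(self._ids)
        self.users[user_id] = body
        return user_id


store = UserStore()

for _ in range(2):  # simulate a client retrying after a timeout
    store.put(42, {"name": "Ada"})
put_count = len(store.users)  # still one resource

for _ in range(2):  # the same retry with POST
    store.post({"name": "Ada"})
post_count = len(store.users) - put_count  # two new resources
```

The retried PUT leaves one user behind, while the retried POST leaves two, which is the duplicate that an idempotency key is meant to prevent.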

HTTP Methods — Safe, Idempotent, and Cacheable Properties

HTTP Status Codes — Use Specific Ones

✅

Status Code Families

2xx Success: 200 OK, 201 Created, 202 Accepted (async), 204 No Content
3xx Redirect: 301 Permanent, 302 Temporary, 304 Not Modified (cache valid)
4xx Client Error: 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable, 429 Too Many Requests
5xx Server Error: 500 Internal, 502 Bad Gateway, 503 Unavailable, 504 Timeout

⚠️

Common Status Code Mistakes

200 with error body — breaks HTTP monitoring, CDN caching, and client error handling
403 instead of 401 — 401 = not authenticated, 403 = authenticated but not authorized
400 for everything — use 409 Conflict for duplicate, 422 for validation, 429 for rate limit
200 for async operations — use 202 Accepted when result is not yet ready
500 for client errors — 5xx signals server bug, 4xx signals client error
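
One way to avoid these mistakes is to centralize the mapping from application errors to status codes. The sketch below uses Python's standard `http.HTTPStatus`; the error names on the left are hypothetical and would come from your own application.

```python
from http import HTTPStatus


def status_for(problem: str) -> HTTPStatus:
    """Map a hypothetical application error name to a specific HTTP status."""
    mapping = {
        "missing_credentials": HTTPStatus.UNAUTHORIZED,        # 401: not authenticated
        "insufficient_role": HTTPStatus.FORBIDDEN,             # 403: authenticated, not allowed
        "duplicate_resource": HTTPStatus.CONFLICT,             # 409
        "validation_failed": HTTPStatus.UNPROCESSABLE_ENTITY,  # 422
        "rate_limited": HTTPStatus.TOO_MANY_REQUESTS,          # 429
        "job_queued": HTTPStatus.ACCEPTED,                     # 202: async work accepted
    }
    # Anything unrecognized is treated as a server-side fault, not a client error.
    return mapping.get(problem, HTTPStatus.INTERNAL_SERVER_ERROR)
```

Keeping the table in one place makes it harder for an individual endpoint to fall back to a blanket 400 or 200.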

Richardson Maturity Model

Richardson Maturity Model — Where Most APIs Actually Live

API Versioning Strategies

🔢

URL Versioning

/v1/users, /v2/users
Obvious, easy to route and test
Easy to document separately per version
Purists say: version is not a resource property
Used by: most public APIs in practice

📋

Header Versioning

API-Version: 2024-01-15
Clean URLs, stable resource identity
Harder to test — cannot just paste URL in browser
Less visible to consumers not reading docs
Used by: Stripe, GitHub

🔀

Content Negotiation

Accept: application/vnd.api+json;v=2
Most RESTful — standard HTTP mechanism
Complex to implement and document
Consumers rarely understand it
Used by: almost nobody

Idempotency Note

GET, PUT, and DELETE are idempotent, so they are safe to retry on network failure. POST is not. For safe POST retries, require an Idempotency-Key header: the client generates a UUID, the server stores the key with the result, and later requests with the same key return the stored result without reprocessing. Stripe supports this on its POST endpoints, which lets clients retry a payment request without risking a double charge.

Most APIs called REST are actually Level 2 of the Richardson Maturity Model — they use HTTP methods correctly on resource endpoints. That is sufficient for production. Do not let REST purity distract from building an API that is consistent, predictable, and does not break clients when it evolves. Additive changes — new fields, new endpoints, new optional parameters — are non-breaking. Everything else requires a version.

REST at the infrastructure layer — API gateways, rate limiting, authentication, and routing — is covered in Building Blocks: API Gateway & Proxies.

📋 Chapter 2 — Summary

REST is an architectural style, not a protocol — six constraints, most implementations satisfy only a subset.
Stateless is the constraint that enables horizontal scaling — no session state on servers.
HTTP methods: GET/PUT/DELETE are idempotent and safe to retry. POST is not — use Idempotency-Key header.
Status codes: use specific ones — never 200 with an error body, never 403 when 401 is correct.
Richardson Level 2 (HTTP verbs + resources) is sufficient for most production APIs.
Versioning: URL versioning is most common in practice; header versioning (Stripe, GitHub) is cleanest architecturally.
Never break existing clients — additive changes only. Breaking changes require a new version with advance notice.

Chapter Three

gRPC & Protocol Buffers

When REST Isn't Fast Enough

REST on HTTP/1.1 with JSON is convenient — every developer knows it, every tool supports it, every debugger can read it. But convenience has a cost: JSON is verbose, HTTP/1.1 is request-response only, and text parsing is slower than binary. When you are making millions of service-to-service calls per second inside a datacenter, those costs compound. gRPC was built by Google for exactly this problem — high-throughput, low-latency, strongly-typed communication between internal services. It is not a replacement for REST. It is the right tool for a specific context.

Protocol Buffers — The Foundation

📐

What Protobuf Is

Binary serialization format — not human-readable
Schema defined in .proto files — code generated from schema
Strongly typed — no runtime type surprises
Supported languages: Go, Java, Python, C++, Node.js, Ruby, and more
Size: often cited as 3–10× smaller than equivalent JSON; the ratio depends on the payload
Speed: often cited as 5–10× faster to parse than JSON; measure with your own messages

🔄

Schema Evolution Rules

Fields identified by field numbers, not names
Adding new fields: backward compatible — old code ignores unknown fields
Removing fields: mark deprecated, never reuse the number
Changing field type: breaking change — avoid entirely
Renaming a field: safe — wire format uses numbers, not names
Field numbers are the permanent contract — choose carefully
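
The rules above follow from the wire format, which carries field numbers and never field names. The sketch below hand-rolls just enough of the encoding to show this. It is an illustration under assumptions: it handles only varint (wire type 0) fields, and the field name `quantity` is hypothetical. Real code would use the classes generated from a .proto file.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag = (field_number << 3) | wire type 0."""
    return encode_varint(field_number << 3) + encode_varint(value)


def decode_int_fields(data: bytes) -> dict[int, int]:
    """Decode a message made only of varint fields into {field_number: value}."""
    fields: dict[int, int] = {}
    pos = 0

    def read_varint() -> int:
        nonlocal pos
        result, shift = 0, 0
        while True:
            byte = data[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result
            shift += 7

    while pos < len(data):
        tag = read_varint()
        fields[tag >> 3] = read_varint()
    return fields


# A "new" writer sends fields 1 and 2.
message = encode_int_field(1, 150) + encode_int_field(2, 7)

# A hypothetical "old" reader only knows field 1 and ignores the rest.
known = {1: "quantity"}
decoded = {known[n]: v for n, v in decode_int_fields(message).items() if n in known}
```

Renaming `quantity` in the reader's table changes nothing on the wire, while reusing number 1 for a different meaning would silently misread old data.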

HTTP/1.1 vs HTTP/2 — Connection Multiplexing

Four gRPC Service Types

↔️

Unary RPC

Client sends one request, server returns one response
Same pattern as a REST call — simplest to reason about
Use: most request-response interactions
GetUser(UserRequest) returns User

📥

Server Streaming RPC

Client sends one request, server returns stream
Client reads until stream ends
Use: large dataset download, live price feed, log tailing
WatchOrders() returns (stream OrderUpdate)

📤

Client Streaming RPC

Client sends stream of requests, server returns one response
Server reads all messages then responds once
Use: bulk upload, batch sensor data ingestion
UploadData(stream Chunk) returns Summary

🔁

Bidirectional Streaming

Both sides send streams simultaneously
Each side reads independently — fully concurrent
Use: real-time chat, collaborative editing, game state sync
Chat(stream Message) returns (stream Message): streams in both directions

REST (HTTP/1.1 + JSON)

Human-readable JSON: easy to debug
Any client without special tooling or codegen
HTTP/1.1: one in-flight request per connection at a time
Larger payload: verbose text encoding
No built-in bidirectional streaming (SSE is one-way only)
Native browser support: paste URL in browser
Best for: public APIs, external clients, browsers

gRPC (HTTP/2 + Protobuf)

Binary Protobuf: not human-readable, harder to debug
Requires generated client stubs from .proto schema
HTTP/2: multiplexed, header-compressed, low overhead
Compact payload: typically 3–10× smaller than JSON
Four RPC modes (unary plus three streaming modes) built into the protocol
No native browser support (grpc-web proxy required)
Best for: internal service-to-service, high QPS, streaming

gRPC is not a replacement for REST. It is the right tool for internal service-to-service communication. The moment you need external clients or browser support, REST is still the correct choice. gRPC's lack of native browser support is not an oversight: gRPC depends on HTTP/2 features such as trailers and fine-grained control over frames, which browser networking APIs do not expose to JavaScript, so browser clients have to go through a gRPC-Web proxy. Use gRPC where you control both client and server.

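gRPC in a service mesh, where Istio or Linkerd handle mTLS, retries, and load balancing for gRPC traffic, is covered in Architecture Styles: Service Mesh. Note: an L4 load balancer balances connections, not requests. Because gRPC multiplexes many calls over one long-lived HTTP/2 connection, all of a client's calls land on the same backend. You need an L7, gRPC-aware load balancer (or client-side load balancing) to spread individual calls.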

📋 Chapter 3 — Summary

Protobuf: binary, strongly typed, 3–10× smaller than JSON, 5–10× faster to parse.
Schema evolution: add fields freely, never reuse field numbers, rename safely, changing type is breaking.
HTTP/2 multiplexing: multiple streams on one connection, so one slow response does not block others at the HTTP layer (TCP-level head-of-line blocking remains).
Four RPC types: Unary (1:1), Server Streaming (1:N), Client Streaming (N:1), Bidirectional (N:N).
Use gRPC for: internal services, high QPS, streaming, strongly-typed contracts across teams.
Use REST for: public APIs, browser clients, external consumers, when debuggability matters.
gRPC limitation: no native browser support, requires L7 load balancer aware of HTTP/2 streams.

Chapter Four

GraphQL

Client-Specified Queries and the N+1 Problem

REST was designed with servers deciding what data to return. GraphQL flips that relationship — clients specify exactly what they need and the server returns exactly that, no more and no less. This solves real problems: mobile clients that cannot afford to fetch large payloads they only partially use, frontend teams blocked waiting for backend teams to add fields to an endpoint, multiple round-trips to assemble a single view from separate resources. GraphQL is not always the answer. Its flexibility comes with costs that most teams underestimate until they are already in production.

REST: Over- and Under-fetching

GET /users/123 returns all 47 fields
Client needs 3 fields, so 44 are wasted bandwidth
Related posts require a separate request
Post comments require yet another request
3 round-trips to assemble one view
Backend team must add new endpoint for new data need

GraphQL: Precise Fetching

Client queries exactly: name, avatar, posts{title}
Response contains only those fields, nothing extra
Relationships traversed in one request
One round-trip for the same view
Mobile client saves bandwidth and battery
Frontend can iterate without waiting on backend

N+1 Query Problem vs DataLoader Batching
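
The difference can be sketched with a fake data source that counts queries. `FakeDB` and its methods are hypothetical. A real DataLoader collects keys requested during one execution tick and dispatches them as a batch; this sketch collects the keys explicitly to show the same effect.

```python
class FakeDB:
    """Hypothetical data source that counts how many queries it receives."""

    def __init__(self) -> None:
        self.queries = 0
        self.posts = [{"id": i, "author_id": i % 3} for i in range(10)]
        self.authors = {i: {"id": i, "name": f"author-{i}"} for i in range(3)}

    def list_posts(self) -> list[dict]:
        self.queries += 1
        return self.posts

    def get_author(self, author_id: int) -> dict:
        self.queries += 1
        return self.authors[author_id]

    def get_authors(self, author_ids: set[int]) -> dict[int, dict]:
        self.queries += 1  # one "WHERE id IN (...)" style query
        return {i: self.authors[i] for i in author_ids}


def naive(db: FakeDB) -> list[str]:
    # One query for the posts, then one more per post: N + 1.
    return [db.get_author(p["author_id"])["name"] for p in db.list_posts()]


def batched(db: FakeDB) -> list[str]:
    # Collect all keys first, then resolve them in a single batch.
    posts = db.list_posts()
    authors = db.get_authors({p["author_id"] for p in posts})
    return [authors[p["author_id"]]["name"] for p in posts]


naive_db, batched_db = FakeDB(), FakeDB()
naive_names = naive(naive_db)        # 11 queries for 10 posts
batched_names = batched(batched_db)  # 2 queries regardless of post count
```

Both functions return the same names; only the number of round-trips to the data source differs.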

GraphQL Operations & Production Concerns

🔍

Query — Read

Client specifies exact fields needed. Traverses relationships in one request. Returns precisely what was asked for — nothing more.

✏️

Mutation — Write

Create, update, delete operations. Returns affected data, enabling optimistic UI updates. Equivalent to POST/PUT/DELETE in REST.

📡

Subscription — Real-time

Client subscribes to data changes. Server pushes updates via WebSocket when data changes. Use for live notifications and collaborative features.

💣

Query Complexity Attack

Deeply nested queries multiply database load exponentially. A single query can bring down an unprotected server. Always implement depth limits and complexity scoring before going public.

When NOT to Use GraphQL

Simple CRUD — REST is cleaner and easier to reason about
Internal services — gRPC is faster and more efficient
Small team: the REST pain GraphQL removes is not yet large enough to justify the learning curve
Performance-critical paths — query cost is hard to bound without persisted queries
No GraphQL expertise — N+1, caching, and complexity are non-obvious failure modes

GraphQL's flexibility is also its attack surface. Without query depth limits and complexity scoring, a single malicious or accidental query can multiply your database load with every extra level of nesting. This is not theoretical; it has taken down production GraphQL APIs. Always implement query complexity analysis before going public. Also, GraphQL sidesteps standard HTTP caching, because queries are typically sent as POST requests to a single endpoint. To get caching back you need client-side caching (for example Apollo Client's normalized cache) or persisted queries served over GET behind a CDN.
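
A depth limit is the simplest of these protections. The sketch below assumes the query has already been parsed into nested dictionaries, with leaf fields mapped to `None`. That representation and the function names are hypothetical; a real server would walk the parsed GraphQL AST instead.

```python
def query_depth(selection: dict) -> int:
    """Depth of a selection set given as nested dicts; leaf fields map to None."""
    if not selection:
        return 0
    return 1 + max(
        query_depth(child) if isinstance(child, dict) else 0
        for child in selection.values()
    )


def check_depth(selection: dict, max_depth: int) -> None:
    """Reject the query before execution if it nests deeper than allowed."""
    depth = query_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")


# user -> posts -> title
shallow = {"user": {"name": None, "posts": {"title": None}}}

# user -> friends -> friends -> friends -> friends -> name
deep = {"user": {"friends": {"friends": {"friends": {"friends": {"name": None}}}}}}
```

With a limit of 5, `check_depth(shallow, 5)` passes and `check_depth(deep, 5)` raises before any resolver runs. Complexity scoring extends the same walk by assigning a cost to each field.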

📋 Chapter 4 — Summary

REST problems GraphQL solves: over-fetching (too much data), under-fetching (multiple round-trips).
Client specifies exact fields needed — one request traverses the entire data graph.
Core operations: Query (read), Mutation (write), Subscription (real-time via WebSocket).
N+1 problem: 10 posts naively = 11 queries. DataLoader batches to 2 queries regardless of N.
Production concerns: query complexity attacks, HTTP caching broken, monitoring harder (one endpoint).
Use GraphQL when: complex data graph, multiple client types with different data needs, rapid frontend iteration.
Do not use when: simple CRUD, internal services, no GraphQL expertise, performance-critical paths.

Chapter Five

WebSockets & Server-Sent Events

Persistent Connections for Real-Time Data

HTTP was designed for documents: request something, receive it, done. That model works for the vast majority of web interactions. But live chat, collaborative editing, real-time dashboards, and multiplayer games do not fit the request-response model. You need data to flow from server to client unprompted, or between clients in real time. Long polling was the first hack. Server-Sent Events are the clean HTTP solution for the server-to-client case. WebSockets are the right answer when both sides need to talk simultaneously.

Evolution of Real-Time on the Web — Long Polling vs SSE vs WebSocket

⏳

Long Polling

Direction: Client pulls (simulated push)
Connection: New HTTP request per event
Server holds connection open until event fires
Wasteful — ties up threads and connections
Use only as: fallback when WebSocket blocked
Examples: legacy notification systems

📺

Server-Sent Events

Direction: Server to Client only
Connection: One persistent HTTP connection
Native browser support via EventSource API
Automatic reconnection built in
Works over HTTP/2 — multiplexed efficiently
Use for: live feeds, notifications, dashboards, progress
Examples: GitHub Actions logs, stock tickers

💬

WebSockets

Direction: Full duplex — both sides send
Connection: Upgraded persistent ws:// connection
Lower overhead after initial handshake
No automatic reconnection — must implement manually
Stateful — pins client to one server (scaling challenge)
Use for: chat, gaming, collaborative editing
Examples: Slack, Figma, online games

WebSocket Horizontal Scaling

WebSocket Scaling via Redis Pub/Sub

WebSocket servers are stateful: a connected client is pinned to one server. This complicates horizontal scaling, because a message for a user must reach whichever server holds that user's connection. The standard solution is a shared pub/sub layer (for example Redis pub/sub) so all WebSocket servers can receive messages for any connected client. When a user on Server 1 sends a message to a user on Server 2, Server 1 publishes to Redis, Server 2 receives it through its subscription and delivers it to its connected user. Socket.IO's Redis adapter works this way, and large chat platforms such as Slack and Discord are generally described as using the same fan-out idea, with their own messaging layers, to serve millions of concurrent connections.
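
The message flow can be sketched without a network. In the code below, `Broker` is an in-memory stand-in for Redis pub/sub and `WsServer` is a hypothetical WebSocket server whose `inboxes` play the role of open sockets. The per-user channel naming (`user:<name>`) is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """In-memory stand-in for Redis pub/sub (illustrative only)."""

    def __init__(self) -> None:
        self._channels: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._channels[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in list(self._channels[channel]):
            callback(message)


class WsServer:
    """Hypothetical WebSocket server; `inboxes` stands in for open sockets."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.inboxes: dict[str, list[str]] = {}

    def connect(self, user: str) -> None:
        self.inboxes[user] = []
        # Subscribe to a per-user channel so any server can reach this user.
        self.broker.subscribe(f"user:{user}", self.inboxes[user].append)

    def send(self, to_user: str, text: str) -> None:
        # The sending server does not need to know where the recipient is connected.
        self.broker.publish(f"user:{to_user}", text)


broker = Broker()
server1, server2 = WsServer("s1", broker), WsServer("s2", broker)
server1.connect("alice")
server2.connect("bob")

# Alice's server publishes; Bob's server delivers.
server1.send("bob", "hi bob")
```

Neither server holds a reference to the other. The shared broker is the only thing they have in common, which is what lets you add servers without changing routing code.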

A complete guided design of a real-time chat system — WebSockets at scale with message persistence, delivery guarantees, and presence tracking — is covered as a full case study: Case Study: Chat System.

📋 Chapter 5 — Summary

Long polling: simulated push with repeated HTTP requests — wasteful, use only as fallback.
SSE: server pushes over HTTP, one-way, automatic reconnect, native browser EventSource API.
WebSocket: full-duplex persistent connection, both sides send, no automatic reconnection.
Use SSE for: live feeds, notifications, progress updates, dashboards — server-to-client only.
Use WebSocket for: chat, gaming, collaborative editing — anything requiring client-to-server real-time push.
WebSocket scaling challenge: stateful connections pin clients to servers — solve with Redis pub/sub fanout.

Chapter Six

Event-Driven Architecture

Systems That React Instead of Ask

In a synchronous system, service A asks service B for something and waits. Service A knows about service B. When service B slows down, service A slows down. Event-driven architecture inverts this. Service A emits an event — "an order was placed" — and does not know or care who reacts. Service B, C, and D each react independently. They can fail, be slow, or not exist yet — service A is unaffected. This decoupling is genuinely powerful. It is also a source of operational complexity that teams consistently underestimate until they are debugging a broken saga at 2am.

Three Types of Events

📢

Event Notification

Minimal payload — just the signal
Example: {"type":"order.placed","id":"123"}
Consumer fetches full data separately if needed
Pros: small, decoupled, privacy-safe
Cons: extra round-trip to fetch data
Use when: consumers vary in what data they need

📦

Event-Carried State Transfer

Full data payload included in the event
Consumer is self-contained — no extra requests
Pros: fast, no round-trip, consumer autonomous
Cons: large payload, sensitive data in stream
Use when: most consumers need full data

📜

Event Sourcing

Event log IS the source of truth
Current state derived by replaying events
Full audit trail, time-travel queries, replay for new services
Cons: complex, schema evolution is hard
Use only when: history itself is valuable business data
Financial ledgers, audit systems, legal records
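
"Current state derived by replaying events" can be shown in a few lines. The ledger event shapes below are hypothetical; the point is that state is a fold over the log, and replaying a prefix gives the state at an earlier moment.

```python
from functools import reduce


def apply(balance: int, event: dict) -> int:
    """Apply one hypothetical ledger event to the current balance (in cents)."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance  # unknown event types are ignored


events = [
    {"type": "deposited", "amount": 10_000},
    {"type": "withdrawn", "amount": 2_500},
    {"type": "deposited", "amount": 500},
]

current = reduce(apply, events, 0)           # replay the full log
as_of_second = reduce(apply, events[:2], 0)  # "time travel": replay a prefix
```

The balance itself is never stored as the source of truth; it is recomputed (or cached as a snapshot) from the events.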

Choreography vs Orchestration

Choreography

Services react autonomously to domain events
No central coordinator: nothing to become SPOF
Easy to add new behavior: subscribe new service
Hard to see the overall process state at a glance
Debugging requires distributed tracing across services
Use for: autonomous teams, microservices, loose coupling

Orchestration

Central process manager (Temporal, Step Functions) coordinates steps
Orchestrator tracks overall workflow state explicitly
Easy to handle exceptions, retries, compensation centrally
Orchestrator becomes a coupling point
Harder to evolve steps independently
Use for: complex workflows, regulated processes, SLAs

CQRS — Command Query Responsibility Segregation

Most systems use the same data model for reading and writing. This works until read and write patterns diverge enough that optimizing one hurts the other. CQRS separates them: a write model optimized for consistency and validation, and a read model optimized for query performance. The cost is eventual consistency — the read model is updated asynchronously from the write model.

Write Model (Commands)

Optimized for consistency and validation
Normalized: referential integrity enforced
Source of truth: authoritative state
Example: relational database with full ACID guarantees

Read Model (Queries)

Optimized for query performance: can be denormalized
Tailored to specific UI query patterns
Can be a separate store (Elasticsearch, Redis, read replica)
Updated asynchronously: eventually consistent with write model

When to Use CQRS

Use when read and write patterns are fundamentally different — heavy reads with complex joins alongside high-throughput writes with complex validation. Do NOT use for simple CRUD — the operational overhead and eventual consistency complexity is not worth it. CQRS is often combined with Event Sourcing but they are independent patterns; you can use either without the other.

Saga Pattern — Distributed Transactions

Distributed transactions across multiple services rarely use traditional two-phase commit (2PC): it needs every participant to support the protocol and holds locks while waiting on the network, which is impractical at service boundaries. The Saga pattern breaks a distributed transaction into a sequence of local transactions, each publishing an event to trigger the next step. If any step fails, compensating transactions run in reverse order to undo the steps that already completed.

🎵

Choreography Saga

Each service reacts to events and publishes the next
No central coordinator — services are autonomous
Loose coupling — easy to add new steps
Hard to see overall transaction state at a glance
Example: Order → Payment → Inventory → Shipping, each reacting to previous events

🎬

Orchestration Saga

Central saga orchestrator directs each step explicitly
Orchestrator tracks overall transaction state
Centralized compensation logic on failure
Orchestrator is a coupling point — SPOF risk
Tools: Temporal, AWS Step Functions, Conductor
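
A minimal orchestration-style sketch of the compensation logic follows. The step names and the `run_saga` helper are hypothetical, and the actions are no-ops. Tools such as Temporal or Step Functions add durability and retries, which this in-memory version does not attempt.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"undone:{done_name}")
            return False
    return True


def fail() -> None:
    raise RuntimeError("inventory unavailable")


log: list[str] = []
ok = run_saga(
    [
        ("order", lambda: None, lambda: None),
        ("payment", lambda: None, lambda: None),
        ("inventory", fail, lambda: None),
    ],
    log,
)
```

After the inventory step fails, the log shows payment undone before order, the reverse of the order in which they completed.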

The Outbox Pattern — Reliable Event Publishing

Dual-Write Problem vs Outbox Pattern Solution
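
The dual-write problem is that writing to the database and publishing to a queue are two separate operations, so a crash between them leaves one done and the other not. The outbox pattern writes the business row and the event row in the same local transaction, and a separate relay publishes the event rows afterwards. The sketch below uses SQLite from the standard library. The table names, the `place_order` and `relay` functions, and the `publish` callback are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id: int, total_cents: int) -> None:
    # One local transaction: the order row and its event commit or roll back together.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        event = {"type": "order.placed", "id": order_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(publish) -> int:
    """Poll unpublished rows, publish each, then mark it; returns rows handled."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # a crash after this line re-sends on the next poll
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent: list[dict] = []
place_order(1, 4999)
first = relay(sent.append)   # publishes the pending event
second = relay(sent.append)  # nothing left to publish
```

Because the relay marks a row only after publishing, a crash in between causes a re-send rather than a loss. That is at-least-once delivery, which is why consumers need idempotent handlers.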

Event sourcing is overused. It is a genuinely powerful pattern for systems where the history of state changes is itself valuable business data: financial ledgers, audit systems, legal records, compliance trails. For most systems it adds substantial complexity without adding value. If you cannot articulate why the event history is valuable beyond "we might want it later," use a regular database with an audit log table instead. The outbox pattern, by contrast, is underused: any service that writes to its database and publishes an event about that write should consider it.

Event Schema Evolution

Events are a public contract: consumers depend on their structure. Schema changes can break consumers silently and are difficult to coordinate across teams. Use a schema registry (Confluent Schema Registry for Kafka) to enforce compatibility rules. Backward compatible: consumers on the new schema can still read events written with the old one, so new fields must be optional with defaults. Forward compatible: consumers on the old schema can still read events written with the new one, so they must ignore fields they do not recognize. Never remove fields; mark them deprecated. Test schema compatibility in CI before any event schema change reaches production.

The infrastructure that makes event-driven architecture work — Kafka partitioning, consumer groups, dead-letter queues, delivery guarantees — is covered in depth in Building Blocks: Message Queues & Streaming.

📋 Chapter 6 — Summary

Three event types: Notification (signal only), State Transfer (full payload), Event Sourcing (log is truth).
Choreography: autonomous services react to events — loose coupling, hard to trace overall flow.
Orchestration: central coordinator (Temporal, Step Functions) — visible state, coupling at coordinator.
CQRS: separate read and write models for independent scaling — write for consistency, read for performance.
Outbox pattern: write to DB + outbox table in one transaction; poller publishes to queue. Atomic, at-least-once delivery.
Saga pattern: distributed transactions via local steps with compensating actions on failure.
Event sourcing: only when event history itself is valuable business data — not as a default architecture.

Chapter Seven

API Design Best Practices

APIs as Long-Lived Public Contracts

An API is a promise. Unlike internal code which you can refactor whenever you want, an API has consumers you may not control. Change it carelessly and you break systems you did not write, maintained by teams you may not even know exist. The practices in this chapter exist because APIs live longer than the engineers who designed them. The decisions you make at version one — around idempotency, pagination, error format, deprecation — will either protect you or haunt you for years.

Idempotency — Safe Retries

🔑

Idempotency Key Pattern

Client generates a unique UUID per intended operation
Sends it as header: Idempotency-Key: uuid
Server checks if key seen before:
- First time: process request, store result with key
- Duplicate: return stored result — do not reprocess
Key expiry: typically 24 hours
Stripe uses this for all payment operations

🔄

Why It Matters

Network failures cause clients to retry
Without idempotency: retry = duplicate action
Payment retried = double charge
Email retried = duplicate email sent
GET, PUT, DELETE: inherently idempotent — safe to retry
POST: not idempotent by default — add Idempotency-Key
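
The server-side half of the pattern fits in a few lines. `PaymentService` and its fields are hypothetical. The sketch keeps results in a dictionary and ignores expiry and concurrent duplicates, both of which a real implementation would need to handle (for example with a persistent store, a TTL, and a unique constraint on the key).

```python
import uuid


class PaymentService:
    """Hypothetical handler that deduplicates POSTs by Idempotency-Key."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}  # a real service would persist this with a TTL
        self.charges_made = 0

    def create_charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._results:
            # Duplicate: replay the stored result without charging again.
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())  # generated once per intended operation

first = service.create_charge(key, 1999)
retry = service.create_charge(key, 1999)  # e.g. after a client-side timeout
```

The retry returns the same charge and `charges_made` stays at 1. A new purchase would use a new key and produce a new charge.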

Pagination — Offset vs Cursor

Offset Pagination vs Cursor Pagination
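
The instability of offset pagination shows up as soon as data changes between requests. The sketch below uses an in-memory list sorted by `id`; the function names are hypothetical. In a database the cursor version would be a `WHERE id > ? ORDER BY id LIMIT ?` query served by an index, whereas this list version scans.

```python
from typing import Optional

ROWS = [{"id": i, "title": f"post-{i}"} for i in range(1, 11)]  # ids 1..10, sorted by id


def page_by_offset(rows: list[dict], offset: int, limit: int) -> list[dict]:
    # A database must still walk past `offset` rows before returning any.
    return rows[offset:offset + limit]


def page_by_cursor(
    rows: list[dict], after_id: Optional[int], limit: int
) -> tuple[list[dict], Optional[int]]:
    """Return rows with id > after_id, plus the cursor for the next page."""
    page = [r for r in rows if after_id is None or r["id"] > after_id][:limit]
    next_cursor = page[-1]["id"] if len(page) == limit else None
    return page, next_cursor


page1 = page_by_offset(ROWS, 0, 3)              # ids 1, 2, 3
remaining = [r for r in ROWS if r["id"] != 2]   # a row is deleted between requests

offset_page2 = page_by_offset(remaining, 3, 3)  # ids 5, 6, 7: id 4 is skipped
cursor_page2, next_cursor = page_by_cursor(remaining, after_id=3, limit=3)  # ids 4, 5, 6
```

The cursor encodes "where I stopped" rather than "how many I have seen", so deletions and insertions before the cursor do not shift the next page.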

Consistent Error Format

Every API error must return the same structure. Inconsistent errors force clients to write defensive parsing logic for every endpoint. One bad error format early in an API's life creates years of backward compatibility debt. Define the contract once and enforce it across every endpoint from day one.

📋

Standard Error Structure

HTTP status code: correct 4xx or 5xx — never 200
code: machine-readable constant (PAYMENT_DECLINED)
message: human-readable description for display
details: field-level issues for validation errors
request_id: unique ID for debugging and support tickets
documentation_url: link to error explanation for complex errors

⚠️

Common Error Format Mistakes

200 OK with "success": false in body — breaks HTTP caching and monitoring
403 Forbidden when 401 Unauthorized is correct (not authenticated vs not authorized)
400 Bad Request for everything instead of specific 409, 422, 429
200 for accepted async operations — use 202 Accepted
Different error shapes per endpoint — forces client defensive parsing
Exposing stack traces or internal paths in error messages

Standard Error Response

{
  "error": {
    "code": "PAYMENT_DECLINED",
    "message": "The card was declined by the issuer",
    "details": [{ "field": "card_number", "issue": "Invalid" }],
    "request_id": "req_abc123",
    "documentation_url": "https://api.example.com/docs/errors/PAYMENT_DECLINED"
  }
}

Rate Limiting & Authentication

🪣

Token Bucket

Bucket holds N tokens, refills at fixed rate
Each request consumes one token
Burst allowed — use tokens accumulated at idle
Best for: most public APIs — burst-friendly
Example: 100 tokens, refill 10/sec

🪟

Sliding Window

Rolling time window tracks request count
Smoother than fixed window — no boundary spikes
More memory: timestamp stored per request
Best for: smooth rate distribution
No 2× spike possible at window boundary

🕐

Fixed Window

Count requests per period (minute or hour)
Simplest to implement and explain
Risk: 2× rate possible at window boundary
Best for: loose rate limiting, internal APIs
Return: X-RateLimit-Remaining, Retry-After
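
The token bucket example (100 tokens, refill 10/sec) can be sketched as follows. The `TokenBucket` class is hypothetical, and the clock is passed in explicitly so the behavior is deterministic; production code would typically read `time.monotonic()` and keep one bucket per client.

```python
class TokenBucket:
    """Token bucket with an injectable clock so the example is deterministic."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so an idle client can burst
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=100, refill_per_sec=10)

burst = sum(bucket.allow(now=0.0) for _ in range(150))  # 100 allowed, 50 rejected
later = sum(bucket.allow(now=1.0) for _ in range(150))  # one second later: 10 more
```

The first burst drains the bucket, and after one second only the refilled tokens are available. Rejected requests are where a 429 response would be returned.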

Standard Rate Limit Response Headers

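Send the X-RateLimit-* headers on every response, and Retry-After on 429 responses, so clients can implement respectful retry logic without guessing. The X-RateLimit-* names are a widely used convention rather than a formal standard: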

X-RateLimit-Limit: 1000        # requests allowed per window
X-RateLimit-Remaining: 743     # requests remaining in current window
X-RateLimit-Reset: 1703721600  # Unix timestamp when window resets
Retry-After: 30                # seconds to wait (on 429 response only)

🗝️

API Keys

Per-client secret token issued at registration
Simple to issue, revoke, and rotate
No expiry by default — rotate regularly
Use for: server-to-server, non-user-specific

🎫

JWT

Self-contained signed token with claims
Stateless — no server lookup required
Short-lived: 15 min to 1 hour typically
Use for: user sessions, microservice auth

🔐

OAuth 2.0

Delegated authorization standard
Third-party access with explicit scopes
More complex but more powerful
Use for: external integrations, social login

Deprecation Standard

Never remove an endpoint without warning. Add a Sunset header (RFC 8594) to deprecated endpoint responses: Sunset: Wed, 31 Dec 2025 23:59:59 GMT. Provide a minimum of 6 months' notice. Monitor who is still calling the deprecated endpoint. Contact remaining consumers before removal. Remove only after usage reaches zero or the deadline has passed. Additive changes, such as new fields and new optional parameters, require no versioning at all.
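
The Sunset value is an HTTP-date, which includes the day of the week, so it is safer to generate it than to type it by hand. A small sketch using only the standard library follows; the `sunset_header` helper name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def sunset_header(removal: datetime) -> str:
    """Format a timezone-aware datetime as the HTTP-date used by the Sunset header."""
    return format_datetime(removal.astimezone(timezone.utc), usegmt=True)


removal = datetime(2025, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
headers = {"Sunset": sunset_header(removal)}
```

Attaching `headers` to every response from the deprecated endpoint lets clients and monitoring tools detect the removal date automatically.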

An API is a promise. The moment you have an external consumer, every breaking change is a support burden, an outage risk, and a trust violation. Design your API as if it will live for 10 years — because the good ones do. Pagination strategy, error format, and idempotency keys are not implementation details you can retrofit. They are the shape of the contract you are signing with every consumer on day one.

Authentication and authorization in depth — JWT internals, OAuth 2.0 flows, token storage security, and zero-trust models — is covered in Security & Observability: Authentication & Authorization.

📋 Chapter 7 — Summary

Idempotency key: client-generated UUID, server stores result, return stored on duplicate — critical for safe POST retries.
Offset pagination: simple but expensive at depth and unstable when data changes during pagination.
Cursor pagination: stable, efficient at any depth, no arbitrary page jumps — use for feeds and large datasets.
Rate limiting: Token Bucket for burst-friendly, Sliding Window for smooth, Fixed Window for simple.
Authentication: API Keys (server-to-server), JWT (user sessions, stateless), OAuth 2.0 (delegated, scoped).
Deprecation: Sunset header (RFC 8594), 6-month minimum notice, monitor usage, never remove until consumers migrate.
Error format: always consistent — HTTP status code, machine-readable code, human message, request ID, documentation URL.

Communication & APIs at a Glance

01 · Sync vs Async

Coupling Is the Cost of Convenience

Sync: caller blocks, both must be available simultaneously
Async: caller continues, temporally decoupled, failure isolated
Three patterns: Request-Response, Fire-and-Forget, Pub-Sub
Every sync call accepts the dependency's full failure rate

02 · REST

Architectural Style, Not a Protocol

Six constraints — Stateless is the one that enables scaling
GET/PUT/DELETE idempotent — POST is not, add Idempotency-Key
Most APIs are Richardson Level 2 — sufficient for production
Versioning: URL most common, header most architecturally clean

03 · gRPC

Internal Services, Not Public APIs

Protobuf: binary, 3–10× smaller than JSON, strongly typed
HTTP/2 multiplexing: multiple streams on one connection
Four modes: Unary, Server streaming, Client streaming, Bidirectional
No native browser support; load balancing needs an L7 gRPC-aware balancer

04 · GraphQL

Client-Specified Queries — With Real Costs

Solves over-fetching and under-fetching from fixed REST endpoints
N+1 problem: DataLoader batches 11 queries into 2 regardless of N
Query complexity attacks: always implement depth limits before launch
HTTP caching broken — all queries POST to one endpoint

05 · WebSockets & SSE

Real-Time Without Polling

SSE: server-to-client only, auto-reconnect, native browser support
WebSocket: full duplex, no auto-reconnect, stateful connections
WebSocket scaling: Redis pub/sub bridges servers for different users
Long polling: fallback only; a new request per event is wasteful

06 · Event-Driven

React Instead of Ask

Three event types: Notification, State Transfer, Event Sourcing
Outbox pattern: atomic DB write + event publish via transaction
Choreography: loose coupling, hard to trace. Orchestration: visible, coupled
Event sourcing: only when history is valuable business data

07 · API Best Practices

APIs Live Longer Than Their Designers

Idempotency-Key: store result, return stored on duplicate POST
Cursor over offset: stable, efficient at any depth
Token Bucket for burst traffic, Sliding Window for smoothness
Sunset header (RFC 8594): 6 months minimum before removal
