Communication & APIs
How components talk to each other and to the outside world.
Synchronous vs Asynchronous Communication
A service that calls another service synchronously is only as reliable as that dependency. If the payment service times out, the checkout service times out. If the email service is slow, the registration endpoint is slow. Synchronous coupling is invisible until a dependency fails, and then it is catastrophically visible. Asynchronous communication breaks that coupling at the cost of complexity. The question every system designer must answer is deceptively simple: does the caller need the result right now, or just eventually?
Synchronous
- Caller blocks waiting for response
- Both parties must be available simultaneously
- Failure in dependency propagates to caller immediately
- Total latency = sum of all sequential dependency latencies
- Simple to reason about: linear call trace
- Thread/connection held for full duration of call
- Use when: result needed to continue processing
Asynchronous
- Caller fires message and continues immediately
- Temporal decoupling: parties need not be available together
- Failure in dependency: message queued, not lost
- Caller latency decoupled from dependency latency
- Complex to debug: requires distributed traces
- Requires idempotent message handlers
- Use when: result not needed immediately by caller
Every distributed system interaction fits one of three patterns. Understanding which pattern you are using, and which you should be using, is the first step to designing reliable inter-service communication. Most teams default to request-response for everything, then retrofit async when latency problems appear. Designing for the right pattern from the start is cheaper.
Request-Response
- Pattern: Synchronous, one-to-one
- Client sends request, waits, receives response
- Direct feedback on success or failure
- Use when: result needed to continue
- Examples: REST API, gRPC, SQL query
Fire-and-Forget
- Pattern: Async, one-way, no response
- Client sends message and does not wait
- No guarantee of delivery without acknowledgment
- Use when: result not needed
- Examples: logging, metrics, UDP
Publish-Subscribe
- Pattern: Async, one-to-many
- Publisher emits event, N subscribers react
- Publisher does not know who receives
- Use when: fan-out, event-driven systems
- Examples: Kafka, SNS, Redis Pub/Sub
Synchronous Is Wrong When...
- Work takes longer than acceptable response time (>200ms)
- Dependency is unreliable or rate-limited by a third party
- Result is not needed by the caller to continue
- Fan-out to multiple downstream services is required
- You need resilience against dependency failures
Asynchronous Is Wrong When...
- Caller needs the result to continue, such as a payment confirmation
- Error must be returned immediately to the user
- Strict ordering guarantees across services are required
- System is simple enough that async adds complexity without value
- Team lacks experience debugging distributed async flows
The rule is simple: if the caller needs the result to continue, use synchronous. If it does not, use asynchronous. Every synchronous call to an unreliable service is a latency and availability risk you have accepted. Make that choice deliberately, not by default.
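The rule can be illustrated with a minimal sketch of the registration example from earlier in this section. Everything here is an assumption made for illustration: `register`, `create_user`, and the in-process `email_queue` are hypothetical names, and the standard-library queue merely stands in for a real broker.

```python
import queue

# Hypothetical in-process stand-in for a real broker such as SQS or RabbitMQ.
email_queue: "queue.Queue[dict]" = queue.Queue()
users: dict[str, dict] = {}


def create_user(email: str) -> dict:
    """Synchronous step: the caller needs the new user record to continue."""
    user = {"id": len(users) + 1, "email": email}
    users[email] = user
    return user


def register(email: str) -> dict:
    user = create_user(email)  # sync: result needed right now
    # async: the welcome email is only needed eventually, so enqueue and move on
    email_queue.put({"type": "welcome_email", "user_id": user["id"]})
    return {"status": 201, "user": user}  # responds without waiting for the email


def drain_email_queue() -> list[int]:
    """Consumer side: runs separately, so a slow email service cannot slow register()."""
    sent = []
    while not email_queue.empty():
        message = email_queue.get()
        sent.append(message["user_id"])
    return sent
```

The registration response depends only on `create_user`; the email service can be slow or down and the endpoint's latency does not change.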
The infrastructure for async communication (Kafka, RabbitMQ, SQS), including delivery guarantees, consumer groups, and dead-letter queues, is covered in depth in Building Blocks: Message Queues & Streaming.
- Synchronous: caller blocks, both must be available simultaneously, failure propagates immediately.
- Asynchronous: caller continues, temporally decoupled, failure isolated (message queued, not lost).
- Three patterns: Request-Response (sync, one-to-one), Fire-and-Forget (async, one-way), Publish-Subscribe (async, one-to-many).
- Use sync when: result needed to continue processing (payment confirmation, auth check).
- Use async when: result not needed immediately, fan-out required, dependency is unreliable.
- Every synchronous call accepts the dependency's full latency and failure rate; that is not free.
REST: Deep Dive
REST is so ubiquitous that most engineers use it without knowing what it actually is. It is not a protocol. It is not a standard. REST is an architectural style: a set of constraints defined by Roy Fielding in his 2000 dissertation. Most APIs called "REST" violate at least two of its six constraints. Understanding what REST actually requires, and where most implementations deviate, is the difference between an API that ages well and one that requires constant versioning pain.
Client-Server
UI concerns separated from data storage concerns. Frontend and backend evolve independently. The separation is the stability.
Stateless
Every request contains all information needed to process it. Server holds no session state between requests. Enables horizontal scaling.
Cacheable
Responses must declare themselves cacheable or not via Cache-Control headers. Enables CDN and browser caching without client logic.
Uniform Interface
Resources identified by URIs. Manipulation through representations. Self-descriptive messages. HATEOAS. The most violated constraint in practice.
Layered System
Client cannot tell if it is talking to origin server or intermediary (CDN, load balancer, cache). Transparency enables infrastructure flexibility.
Code on Demand
Optional: server can extend client functionality by delivering executable scripts. JavaScript delivery is the only common use of this constraint.
Idempotency is the property that matters most for reliability. If an operation is idempotent, clients can safely retry it after a network failure without producing duplicates. GET, PUT, and DELETE are inherently idempotent. POST is not, and that distinction drives how you design retry logic, payment APIs, and everything else that must not create duplicates.
Status Code Families
- 2xx Success: 200 OK, 201 Created, 202 Accepted (async), 204 No Content
- 3xx Redirect: 301 Permanent, 302 Temporary, 304 Not Modified (cache valid)
- 4xx Client Error: 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable, 429 Too Many Requests
- 5xx Server Error: 500 Internal, 502 Bad Gateway, 503 Unavailable, 504 Timeout
Common Status Code Mistakes
- 200 with error body: breaks HTTP monitoring, CDN caching, and client error handling
- 403 instead of 401: 401 = not authenticated, 403 = authenticated but not authorized
- 400 for everything: use 409 Conflict for duplicates, 422 for validation, 429 for rate limits
- 200 for async operations: use 202 Accepted when result is not yet ready
- 500 for client errors: 5xx signals a server fault, 4xx signals a client error
URL Versioning
- Example: /v1/users, /v2/users
- Obvious, easy to route and test
- Easy to document separately per version
- Purists say: version is not a resource property
- Used by: most public APIs in practice
Header Versioning
- Example: API-Version: 2024-01-15
- Clean URLs, stable resource identity
- Harder to test: cannot just paste URL in browser
- Less visible to consumers not reading docs
- Used by: Stripe, GitHub
Content Negotiation
- Example: Accept: application/vnd.api+json;v=2
- Most RESTful: standard HTTP mechanism
- Complex to implement and document
- Consumers rarely understand it
- Used by: almost nobody
GET, PUT, and DELETE are idempotent, so they are safe to retry on network failure. POST is not. For safe POST retries, require an Idempotency-Key header: the client generates a UUID, the server stores key + result, and subsequent requests with the same key return the stored result without reprocessing. Stripe supports this for payment operations, which is why retried requests rarely produce double charges.
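The key-plus-stored-result flow can be sketched in a few lines. This is a toy model under stated assumptions, not Stripe's implementation: `PaymentService` and `create_charge` are hypothetical names, the store is an in-memory dict, and a production version would need an atomic check-and-set plus key expiry.

```python
import uuid


class PaymentService:
    """Hypothetical service showing Idempotency-Key handling for a POST."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response
        self.charges = []   # side effects actually performed

    def create_charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # Duplicate key: return the stored result, do not reprocess.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        # First time: perform the side effect, then store the result under the key.
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, amount_cents))
        response = {"status": 201, "charge_id": charge_id, "amount_cents": amount_cents}
        self._results[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())                   # client generates one key per intended operation
first = service.create_charge(key, 4999)
retry = service.create_charge(key, 4999)  # e.g. the client timed out and retried
```

Both calls return the same response and only one charge is recorded, which is the property that makes the retry safe.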
Most APIs called REST are actually Level 2 of the Richardson Maturity Model: they use HTTP methods correctly on resource endpoints. That is sufficient for production. Do not let REST purity distract from building an API that is consistent, predictable, and does not break clients when it evolves. Additive changes (new fields, new endpoints, new optional parameters) are non-breaking. Everything else requires a version.
REST at the infrastructure layer (API gateways, rate limiting, authentication, and routing) is covered in Building Blocks: API Gateway & Proxies.
- REST is an architectural style, not a protocol: six constraints, and most implementations satisfy only a subset.
- Stateless is the constraint that enables horizontal scaling: no session state on servers.
- HTTP methods: GET/PUT/DELETE are idempotent and safe to retry. POST is not, so use an Idempotency-Key header.
- Status codes: use specific ones. Never 200 with an error body, never 403 when 401 is correct.
- Richardson Level 2 (HTTP verbs + resources) is sufficient for most production APIs.
- Versioning: URL versioning is most common in practice; header versioning (Stripe, GitHub) is cleanest architecturally.
- Never break existing clients: additive changes only. Breaking changes require a new version with advance notice.
gRPC & Protocol Buffers
REST on HTTP/1.1 with JSON is convenient: every developer knows it, every tool supports it, every debugger can read it. But convenience has a cost: JSON is verbose, HTTP/1.1 is request-response only, and text parsing is slower than binary. When you are making millions of service-to-service calls per second inside a datacenter, those costs compound. gRPC was built by Google for exactly this problem: high-throughput, low-latency, strongly-typed communication between internal services. It is not a replacement for REST. It is the right tool for a specific context.
What Protobuf Is
- Binary serialization format, not human-readable
- Schema defined in .proto files; code generated from schema
- Strongly typed: no runtime type surprises
- Supported languages: Go, Java, Python, C++, Node.js, Ruby, and more
- Size: typically 3-10× smaller than equivalent JSON
- Speed: typically 5-10× faster to parse than JSON
Schema Evolution Rules
- Fields identified by field numbers, not names
- Adding new fields: backward compatible; old code ignores unknown fields
- Removing fields: mark deprecated, never reuse the number
- Changing field type: breaking change; avoid entirely
- Renaming a field: safe on the wire, because the binary format uses numbers, not names (generated code and JSON mappings still change)
- Field numbers are the permanent contract; choose carefully
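The evolution rules above follow from the wire carrying numbers rather than names. The sketch below is a toy model only: it is not the real Protobuf encoding or API, and the `USER_V1` / `USER_V2` schemas and the `encode` / `decode` helpers are hypothetical names chosen for illustration.

```python
# Hypothetical schemas: field number -> field name, as each code version sees them.
USER_V1 = {1: "name", 2: "email"}
USER_V2 = {1: "full_name", 2: "email", 3: "avatar_url"}  # field 1 renamed, field 3 added


def encode(message: dict, schema: dict) -> dict:
    """Map names to numbers; only the numbers travel on the 'wire'."""
    numbers = {name: number for number, name in schema.items()}
    return {numbers[name]: value for name, value in message.items()}


def decode(wire: dict, schema: dict) -> dict:
    """Map numbers back to names, skipping numbers this schema does not know."""
    return {schema[number]: value for number, value in wire.items() if number in schema}


# New writer, old reader: the added field 3 is ignored, the renamed field 1 still arrives.
wire = encode({"full_name": "Ada", "email": "ada@example.com", "avatar_url": "a.png"}, USER_V2)
seen_by_old_code = decode(wire, USER_V1)

# Old writer, new reader: field 3 is simply absent.
seen_by_new_code = decode(encode({"name": "Bob", "email": "bob@example.com"}, USER_V1), USER_V2)
```

Reusing number 3 for a different meaning later would make old and new code silently disagree about the same bytes, which is why retired numbers are never reused.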
Unary RPC
- Client sends one request, server returns one response
- Same pattern as a REST call; simplest to reason about
- Use: most request-response interactions
- Example: GetUser(UserRequest) returns User
Server Streaming RPC
- Client sends one request, server returns stream
- Client reads until stream ends
- Use: large dataset download, live price feed, log tailing
- Example: WatchOrders() returns (stream OrderUpdate)
Client Streaming RPC
- Client sends stream of requests, server returns one response
- Server reads all messages then responds once
- Use: bulk upload, batch sensor data ingestion
- Example: UploadData(stream Chunk) returns Summary
Bidirectional Streaming
- Both sides send streams simultaneously
- Each side reads independently; fully concurrent
- Use: real-time chat, collaborative editing, game state sync
- Example: Chat(stream Message) returns (stream Message), streaming in both directions
REST
- Human-readable JSON: easy to debug
- Any client without special tooling or codegen
- HTTP/1.1: one in-flight request per connection at a time
- Larger payload: verbose text encoding
- No built-in streaming (SSE is one-way only)
- Native browser support: paste URL in browser
- Best for: public APIs, external clients, browsers
gRPC
- Binary Protobuf: not human-readable, harder to debug
- Requires generated client stubs from .proto schema
- HTTP/2: multiplexed, compressed, low overhead
- Compact payload: typically 3-10× smaller than JSON
- Four RPC modes, three of them streaming, built into the protocol
- No native browser support (grpc-web proxy required)
- Best for: internal service-to-service, high QPS, streaming
gRPC is not a replacement for REST; it is the right tool for internal service-to-service communication. The moment you need external clients or browser support, REST is still the correct choice. gRPC's lack of native browser support is not a limitation that will simply be fixed: browser APIs do not expose the low-level HTTP/2 framing that gRPC depends on. Use gRPC where you control both client and server.
gRPC in a service mesh, where Istio or Linkerd handle mTLS, retries, and load balancing for gRPC traffic, is covered in Architecture Styles: Service Mesh. Note: standard L4 load balancers balance per connection, so every stream multiplexed onto one long-lived HTTP/2 connection lands on the same backend; you need an L7 gRPC-aware load balancer.
- Protobuf: binary, strongly typed, typically 3-10× smaller than JSON and 5-10× faster to parse.
- Schema evolution: add fields freely, never reuse field numbers, rename safely, changing type is breaking.
- HTTP/2 multiplexing: multiple streams on one connection, with no HTTP-level head-of-line blocking between requests.
- Four RPC types: Unary (1:1), Server Streaming (1:N), Client Streaming (N:1), Bidirectional (N:N).
- Use gRPC for: internal services, high QPS, streaming, strongly-typed contracts across teams.
- Use REST for: public APIs, browser clients, external consumers, when debuggability matters.
- gRPC limitations: no native browser support; load balancing requires an L7 balancer aware of HTTP/2 streams.
GraphQL
REST was designed with servers deciding what data to return. GraphQL flips that relationship: clients specify exactly what they need and the server returns exactly that, no more and no less. This solves real problems: mobile clients that cannot afford to fetch large payloads they only partially use, frontend teams blocked waiting for backend teams to add fields to an endpoint, multiple round-trips to assemble a single view from separate resources. GraphQL is not always the answer. Its flexibility comes with costs that most teams underestimate until they are already in production.
REST Approach
- GET /users/123 returns all 47 fields
- Client needs 3 fields; 44 are wasted bandwidth
- Related posts require a separate request
- Post comments require yet another request
- 3 round-trips to assemble one view
- Backend team must add new endpoint for new data need
GraphQL Approach
- Client queries exactly: name, avatar, posts { title }
- Response contains only those fields, nothing extra
- Relationships traversed in one request
- One round-trip for the same view
- Mobile client saves bandwidth and battery
- Frontend can iterate without waiting on backend
Query: Read
Client specifies exact fields needed. Traverses relationships in one request. Returns precisely what was asked for, nothing more.
Mutation: Write
Create, update, delete operations. Returns affected data, enabling optimistic UI updates. Equivalent to POST/PUT/DELETE in REST.
Subscription: Real-time
Client subscribes to data changes. Server pushes updates via WebSocket when data changes. Use for live notifications and collaborative features.
Query Complexity Attack
Deeply nested queries multiply database load exponentially. A single query can bring down an unprotected server. Always implement depth limits and complexity scoring before going public.
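A depth limit is the simplest of these protections. The sketch below assumes the query has already been parsed into nested dicts, where each key is a field and leaf fields map to empty dicts; `MAX_DEPTH`, `query_depth`, and `check_query` are hypothetical names, and a real server would normally use the validation hooks of its GraphQL library instead.

```python
MAX_DEPTH = 3  # assumed limit; tune per schema


def query_depth(selection: dict) -> int:
    """Depth of a selection set modelled as nested dicts; leaves are empty dicts."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def check_query(selection: dict) -> int:
    """Reject a query before execution if it nests deeper than MAX_DEPTH."""
    depth = query_depth(selection)
    if depth > MAX_DEPTH:
        raise ValueError(f"query depth {depth} exceeds limit {MAX_DEPTH}")
    return depth


# user { name posts { title } } has depth 3 and is accepted.
ok_query = {"user": {"name": {}, "posts": {"title": {}}}}
# user { friends { friends { friends { name } } } } has depth 5 and is rejected.
deep_query = {"user": {"friends": {"friends": {"friends": {"name": {}}}}}}
```

Depth alone does not bound cost (a shallow query can still request huge lists), which is why it is usually paired with complexity scoring.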
GraphQL Is Wrong When...
- Simple CRUD: REST is cleaner and easier to reason about
- Internal services: gRPC is faster and more efficient
- Small team: the learning curve outweighs the REST versioning pain it would save
- Performance-critical paths: query cost is hard to bound without persisted queries
- No GraphQL expertise: N+1, caching, and complexity are non-obvious failure modes
GraphQL's flexibility is also its attack surface. Without query depth limits and complexity scoring, a single malicious or accidental query can exponentially multiply your database load. This is not theoretical; it has taken down production GraphQL APIs. Always implement query complexity analysis before going public. Also: GraphQL defeats standard HTTP caching because queries are typically sent as POST to one endpoint. You need client-side caching (such as Apollo Client's cache) or persisted queries served through a CDN.
- REST problems GraphQL solves: over-fetching (too much data), under-fetching (multiple round-trips).
- Client specifies exact fields needed, and one request traverses the entire data graph.
- Core operations: Query (read), Mutation (write), Subscription (real-time via WebSocket).
- N+1 problem: 10 posts naively = 11 queries. DataLoader batches to 2 queries regardless of N.
- Production concerns: query complexity attacks, HTTP caching broken, monitoring harder (one endpoint).
- Use GraphQL when: complex data graph, multiple client types with different data needs, rapid frontend iteration.
- Do not use when: simple CRUD, internal services, no GraphQL expertise, performance-critical paths.
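The N+1 arithmetic in the summary (10 posts, 11 queries, batched down to 2) can be reproduced with a small sketch. It shows only the batching idea, not the DataLoader library itself, which additionally defers and coalesces loads within one execution tick; `FakeDB` and both resolver functions are hypothetical names.

```python
class FakeDB:
    """Hypothetical data source that counts how many queries it receives."""

    def __init__(self):
        self.query_count = 0
        self.authors = {1: "Ada", 2: "Grace"}
        self.posts = [{"id": i, "author_id": 1 + i % 2} for i in range(10)]

    def fetch_posts(self):
        self.query_count += 1
        return list(self.posts)

    def fetch_author(self, author_id):
        self.query_count += 1
        return self.authors[author_id]

    def fetch_authors(self, author_ids):
        self.query_count += 1  # one query for the whole batch
        return {a: self.authors[a] for a in author_ids}


def resolve_naive(db):
    """1 query for the posts, then 1 query per post for its author."""
    posts = db.fetch_posts()
    return [(p["id"], db.fetch_author(p["author_id"])) for p in posts]


def resolve_batched(db):
    """1 query for the posts, then 1 batched query for all distinct authors."""
    posts = db.fetch_posts()
    authors = db.fetch_authors({p["author_id"] for p in posts})
    return [(p["id"], authors[p["author_id"]]) for p in posts]


naive_db, batched_db = FakeDB(), FakeDB()
naive_result = resolve_naive(naive_db)
batched_result = resolve_batched(batched_db)
```

The two resolvers return identical data; only the number of round-trips to the data source differs.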
WebSockets & Server-Sent Events
HTTP was designed for documents: request something, receive it, connection closes. That model works for the vast majority of web interactions. But live chat, collaborative editing, real-time dashboards, and multiplayer games do not fit the request-response model. You need data to flow from server to client unprompted, or between clients in real time. Long polling was the first hack. Server-Sent Events was the clean HTTP solution for the server-to-client case. WebSockets were the right answer when both sides need to talk simultaneously.
Long Polling
- Direction: Client pulls (simulated push)
- Connection: New HTTP connection per event
- Server holds connection open until event fires
- Wasteful: ties up threads and connections
- Use only as: fallback when WebSocket blocked
- Examples: legacy notification systems
Server-Sent Events
- Direction: Server to Client only
- Connection: One persistent HTTP connection
- Native browser support via EventSource API
- Automatic reconnection built in
- Works over HTTP/2, multiplexed efficiently
- Use for: live feeds, notifications, dashboards, progress
- Examples: GitHub Actions logs, stock tickers
WebSockets
- Direction: Full duplex, both sides send
- Connection: Upgraded persistent ws:// connection (wss:// over TLS)
- Lower overhead after initial handshake
- No automatic reconnection; must implement manually
- Stateful: pins client to one server (scaling challenge)
- Use for: chat, gaming, collaborative editing
- Examples: Slack, Figma, online games
WebSocket servers are stateful: a connected client is pinned to a server. This complicates horizontal scaling. The standard solution is a shared pub/sub layer (Redis pub/sub) so all WebSocket servers can receive messages for any connected client. When a user on Server 1 sends a message to a user on Server 2, Server 1 publishes to Redis, Server 2 receives from its subscription, and delivers to its connected user. Socket.IO's Redis adapter works this way, and large chat platforms such as Slack and Discord rely on comparable fan-out layers to serve millions of concurrent connections across many servers.
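The Server 1 to Server 2 flow described above can be sketched without any network code. This is an in-memory model under stated assumptions: `Broker` stands in for Redis pub/sub, a plain list stands in for each client's socket, and the class and channel names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a shared pub/sub layer such as Redis pub/sub."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self._subscribers[channel]:
            callback(message)


class WebSocketServer:
    """Hypothetical server node; each user is connected to exactly one node."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.connections = {}  # user_id -> list standing in for that user's socket
        broker.subscribe("chat", self._on_broker_message)

    def connect(self, user_id):
        self.connections[user_id] = []

    def send(self, to_user, text):
        # The sender's node does not know where the recipient lives, so it publishes.
        self.broker.publish("chat", {"to": to_user, "text": text})

    def _on_broker_message(self, message):
        # Every node receives the message; only the node holding the socket delivers it.
        inbox = self.connections.get(message["to"])
        if inbox is not None:
            inbox.append(message["text"])


broker = Broker()
server1, server2 = WebSocketServer("server-1", broker), WebSocketServer("server-2", broker)
server1.connect("alice")
server2.connect("bob")
server1.send("bob", "hi bob")  # alice, on server 1, messages bob, on server 2
```

Broadcasting every message to every node is the simplest form of this pattern; per-user or per-room channels reduce the fan-out at the cost of more subscription bookkeeping.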
A complete guided design of a real-time chat system (WebSockets at scale with message persistence, delivery guarantees, and presence tracking) is covered as a full case study: Case Study: Chat System.
- Long polling: simulated push with repeated HTTP requests; wasteful, use only as fallback.
- SSE: server pushes over HTTP, one-way, automatic reconnect, native browser EventSource API.
- WebSocket: full-duplex persistent connection, both sides send, no automatic reconnection.
- Use SSE for: live feeds, notifications, progress updates, dashboards (server-to-client only).
- Use WebSocket for: chat, gaming, collaborative editing, or anything requiring client-to-server real-time push.
- WebSocket scaling challenge: stateful connections pin clients to servers; solve with Redis pub/sub fanout.
Event-Driven Architecture
In a synchronous system, service A asks service B for something and waits. Service A knows about service B. When service B slows down, service A slows down. Event-driven architecture inverts this. Service A emits an event ("an order was placed") and does not know or care who reacts. Services B, C, and D each react independently. They can fail, be slow, or not exist yet, and service A is unaffected. This decoupling is genuinely powerful. It is also a source of operational complexity that teams consistently underestimate until they are debugging a broken saga at 2am.
Event Notification
- Minimal payload: just the signal
- Example: {"type":"order.placed","id":"123"}
- Consumer fetches full data separately if needed
- Pros: small, decoupled, privacy-safe
- Cons: extra round-trip to fetch data
- Use when: consumers vary in what data they need
Event-Carried State Transfer
- Full data payload included in the event
- Consumer is self-contained β no extra requests
- Pros: fast, no round-trip, consumer autonomous
- Cons: large payload, sensitive data in stream
- Use when: most consumers need full data
Event Sourcing
- Event log IS the source of truth
- Current state derived by replaying events
- Full audit trail, time-travel queries, replay for new services
- Cons: complex, schema evolution is hard
- Use only when: history itself is valuable business data
- Financial ledgers, audit systems, legal records
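Deriving state by replay can be shown with a ledger, the canonical case from the list above. The event shapes and the `apply_event` / `replay` helpers are assumptions made for this sketch rather than any particular framework's API.

```python
def apply_event(balance: int, event: dict) -> int:
    """Fold one event into the current state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")


def replay(events: list, upto=None) -> int:
    """Current state is never stored; it is derived by replaying the log.

    Passing upto=n replays only the first n events, which gives a
    'time-travel' view of the state as it was at that point.
    """
    balance = 0
    for event in events[:upto]:
        balance = apply_event(balance, event)
    return balance


# The append-only log is the source of truth.
event_log = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]
current_balance = replay(event_log)
balance_after_two = replay(event_log, upto=2)
```

The schema-evolution cost mentioned above shows up in `apply_event`: every historical event shape must remain interpretable for as long as the log is replayed.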
Choreography
- Services react autonomously to domain events
- No central coordinator, so nothing to become a single point of failure (SPOF)
- Easy to add new behavior: subscribe new service
- Hard to see the overall process state at a glance
- Debugging requires distributed tracing across services
- Use for: autonomous teams, microservices, loose coupling
Orchestration
- Central process manager (Temporal, Step Functions) coordinates steps
- Orchestrator tracks overall workflow state explicitly
- Easy to handle exceptions, retries, compensation centrally
- Orchestrator becomes a coupling point
- Harder to evolve steps independently
- Use for: complex workflows, regulated processes, SLAs
Most systems use the same data model for reading and writing. This works until read and write patterns diverge enough that optimizing one hurts the other. CQRS (Command Query Responsibility Segregation) separates them: a write model optimized for consistency and validation, and a read model optimized for query performance. The cost is eventual consistency, because the read model is updated asynchronously from the write model.
Write Model
- Optimized for consistency and validation
- Normalized, with referential integrity enforced
- Source of truth: authoritative state
- Example: relational database with full ACID guarantees
Read Model
- Optimized for query performance; can be denormalized
- Tailored to specific UI query patterns
- Can be a separate store (Elasticsearch, Redis, read replica)
- Updated asynchronously; eventually consistent with write model
Use when read and write patterns are fundamentally different: heavy reads with complex joins alongside high-throughput writes with complex validation. Do NOT use for simple CRUD; the operational overhead and eventual consistency complexity are not worth it. CQRS is often combined with Event Sourcing but they are independent patterns; you can use either without the other.
Distributed transactions across multiple services cannot use traditional database 2PC (two-phase commit); it is impractical at service boundaries. The Saga pattern breaks a distributed transaction into a sequence of local transactions, each publishing an event to trigger the next step. If any step fails, compensating transactions run in reverse to undo completed steps.
Choreography Saga
- Each service reacts to events and publishes the next
- No central coordinator β services are autonomous
- Loose coupling β easy to add new steps
- Hard to see overall transaction state at a glance
- Example: Order → Payment → Inventory → Shipping, each reacting to the previous event
Orchestration Saga
- Central saga orchestrator directs each step explicitly
- Orchestrator tracks overall transaction state
- Centralized compensation logic on failure
- Orchestrator is a coupling point and a SPOF risk
- Tools: Temporal, AWS Step Functions, Conductor
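The compensation logic of an orchestrated saga can be sketched in a few lines. This is a minimal in-process model, not how Temporal, Step Functions, or Conductor are used in practice: `run_saga` is a hypothetical helper, and the three steps only append to a log where real services would run local transactions.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps in order.

    If a step fails, run the compensations of the already completed
    steps in reverse order, then report the failure.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _, undo in reversed(completed):
                undo()
            return f"rolled back after {name} failed: {exc}"
        completed.append((name, compensate))
    return "committed"


log = []


def reserve_inventory():
    # Simulated failure in the third local transaction.
    raise RuntimeError("out of stock")


steps = [
    ("order", lambda: log.append("order created"), lambda: log.append("order cancelled")),
    ("payment", lambda: log.append("payment charged"), lambda: log.append("payment refunded")),
    ("inventory", reserve_inventory, lambda: log.append("inventory released")),
]
result = run_saga(steps)
```

The failed step itself is not compensated because it never completed; only the payment and the order are undone, newest first. A real orchestrator also has to persist its progress so that it can resume compensation after its own crash.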
Event sourcing is overused. It is a genuinely powerful pattern for systems where the history of state changes is itself valuable business data: financial ledgers, audit systems, legal records, compliance trails. For most systems it adds substantial complexity without adding value. If you cannot articulate why the event history is valuable beyond "we might want it later," use a regular database with an audit log table instead. The outbox pattern, by contrast, is underused: every service that publishes events should be using it.
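For reference, the outbox pattern writes the business row and the event row in one database transaction, and a separate relay publishes pending events afterwards. The sketch below uses SQLite so that it is self-contained; the table layout and the `place_order` and `relay_once` helpers are assumptions for illustration, and `publish` stands in for a real queue client.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
)


def place_order(order_id: int, total_cents: int) -> None:
    """Write the order and its event atomically: both rows commit or neither does."""
    with conn:  # the connection context manager commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)", (order_id, total_cents)
        )
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "order.placed", "id": order_id}),),
        )


def relay_once(publish) -> int:
    """Poller: publish pending events, then mark them as published.

    A crash between publish() and the UPDATE causes a re-publish on the next
    run, so delivery is at-least-once and consumers must be idempotent.
    """
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


place_order(1, 500)
sent = []
first_run = relay_once(sent.append)   # publishes the pending event
second_run = relay_once(sent.append)  # nothing left to publish
```

The point of the pattern is that there is no window in which the order exists but its event was lost, which is what happens when a service writes to the database and then publishes to the queue as two separate steps.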
Events are a public contract: consumers depend on their structure. Schema changes break consumers silently and are difficult to coordinate across teams. Use a schema registry (Confluent Schema Registry for Kafka) to enforce compatibility rules. Backward compatible: consumers on the new schema can still read old events, so new fields must be optional with defaults. Forward compatible: consumers on the old schema can still read new events, so they must tolerate fields they do not know. Never remove fields; mark them deprecated. Test schema compatibility in CI before any event schema change reaches production.
The infrastructure that makes event-driven architecture work (Kafka partitioning, consumer groups, dead-letter queues, delivery guarantees) is covered in depth in Building Blocks: Message Queues & Streaming.
- Three event types: Notification (signal only), State Transfer (full payload), Event Sourcing (log is truth).
- Choreography: autonomous services react to events; loose coupling, hard to trace overall flow.
- Orchestration: central coordinator (Temporal, Step Functions); visible state, coupling at coordinator.
- CQRS: separate read and write models for independent scaling; write for consistency, read for performance.
- Outbox pattern: write to DB + outbox table in one transaction; poller publishes to queue. Atomic, at-least-once delivery.
- Saga pattern: distributed transactions via local steps with compensating actions on failure.
- Event sourcing: only when event history itself is valuable business data, not as a default architecture.
API Design Best Practices
An API is a promise. Unlike internal code, which you can refactor whenever you want, an API has consumers you may not control. Change it carelessly and you break systems you did not write, maintained by teams you may not even know exist. The practices in this chapter exist because APIs live longer than the engineers who designed them. The decisions you make at version one (around idempotency, pagination, error format, deprecation) will either protect you or haunt you for years.
Idempotency Key Pattern
- Client generates a unique UUID per intended operation
- Sends it as header: Idempotency-Key: <uuid>
- Server checks if key seen before:
- First time: process request, store result with key
- Duplicate: return stored result; do not reprocess
- Key expiry: typically 24 hours
- Stripe uses this for all payment operations
Why It Matters
- Network failures cause clients to retry
- Without idempotency: retry = duplicate action
- Payment retried = double charge
- Email retried = duplicate email sent
- GET, PUT, DELETE: inherently idempotent, so safe to retry
- POST: not idempotent by default; add Idempotency-Key
Every API error must return the same structure. Inconsistent errors force clients to write defensive parsing logic for every endpoint. One bad error format early in an API's life creates years of backward compatibility debt. Define the contract once and enforce it across every endpoint from day one.
Standard Error Structure
- HTTP status code: correct 4xx or 5xx, never 200
- code: machine-readable constant (PAYMENT_DECLINED)
- message: human-readable description for display
- details: field-level issues for validation errors
- request_id: unique ID for debugging and support tickets
- documentation_url: link to error explanation for complex errors
Common Error Format Mistakes
- 200 OK with "success": false in body: breaks HTTP caching and monitoring
- 403 Forbidden when 401 Unauthorized is correct (not authenticated vs not authorized)
- 400 Bad Request for everything instead of specific 409, 422, 429
- 200 for accepted async operations: use 202 Accepted
- Different error shapes per endpoint: forces client defensive parsing
- Exposing stack traces or internal paths in error messages
{
  "error": {
    "code": "PAYMENT_DECLINED",
    "message": "The card was declined by the issuer",
    "details": [{ "field": "card_number", "issue": "Invalid" }],
    "request_id": "req_abc123",
    "documentation_url": "https://api.example.com/docs/errors/PAYMENT_DECLINED"
  }
}
Token Bucket
- Bucket holds N tokens, refills at fixed rate
- Each request consumes one token
- Burst allowed: spend tokens accumulated while idle
- Best for: most public APIs (burst-friendly)
- Example: 100 tokens, refill 10/sec
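The example figures above (100 tokens, refill 10/sec) can be run through a minimal sketch. The `TokenBucket` class is a hypothetical illustration; the caller passes the current time explicitly so the behaviour is deterministic, whereas a real limiter would read a monotonic clock and keep one bucket per client.

```python
class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at a fixed rate."""

    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        # Each request consumes one token; no token means the request is rejected.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=100, refill_per_sec=10)
burst_allowed = sum(bucket.allow(now=0.0) for _ in range(101))    # burst at t = 0
allowed_after_1s = sum(bucket.allow(now=1.0) for _ in range(11))  # one second later
```

The burst of 101 requests gets 100 through, and one second later exactly the 10 refilled tokens are available, which is the burst-then-steady-rate behaviour the list describes.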
Sliding Window
- Rolling time window tracks request count
- Smoother than fixed window: no boundary spikes
- More memory: timestamp stored per request
- Best for: smooth rate distribution
- No 2× spike possible at window boundary
Fixed Window
- Count requests per period (minute or hour)
- Simplest to implement and explain
- Risk: 2× rate possible at window boundary
- Best for: loose rate limiting, internal APIs
- Return: X-RateLimit-Remaining, Retry-After
Include these on every response so clients can implement respectful retry logic without guessing:
X-RateLimit-Limit: 1000        # requests allowed per window
X-RateLimit-Remaining: 743     # requests remaining in current window
X-RateLimit-Reset: 1703721600  # Unix timestamp when window resets
Retry-After: 30                # seconds to wait (on 429 response only)
API Keys
- Per-client secret token issued at registration
- Simple to issue, revoke, and rotate
- No expiry by default, so rotate regularly
- Use for: server-to-server, non-user-specific
JWT
- Self-contained signed token with claims
- Stateless: no server lookup required
- Short-lived: 15 min to 1 hour typically
- Use for: user sessions, microservice auth
OAuth 2.0
- Delegated authorization standard
- Third-party access with explicit scopes
- More complex but more powerful
- Use for: external integrations, social login
Never remove an endpoint without warning. Add a Sunset header (RFC 8594) to deprecated endpoint responses: Sunset: Wed, 31 Dec 2025 23:59:59 GMT. Provide a minimum of 6 months' notice. Monitor who is still calling the deprecated endpoint. Contact remaining consumers before removal. Remove only after usage reaches zero or the deadline has passed. Additive changes (new fields, new optional parameters) require no versioning at all.
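The Sunset value is an HTTP-date, and generating it from a datetime avoids hand-typed weekday mistakes. A minimal sketch using only the standard library follows; `sunset_header` and `is_past_sunset` are hypothetical helper names.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def sunset_header(removal_time: datetime) -> str:
    """Format a timezone-aware removal time as an HTTP-date for the Sunset header."""
    return format_datetime(removal_time.astimezone(timezone.utc), usegmt=True)


def is_past_sunset(removal_time: datetime, now: datetime) -> bool:
    """True once the announced deadline has passed and removal is permitted."""
    return now >= removal_time


removal = datetime(2025, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
header_value = sunset_header(removal)
```

Here `header_value` is `Wed, 31 Dec 2025 23:59:59 GMT`; the weekday is computed from the date rather than written by hand.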
An API is a promise. The moment you have an external consumer, every breaking change is a support burden, an outage risk, and a trust violation. Design your API as if it will live for 10 years β because the good ones do. Pagination strategy, error format, and idempotency keys are not implementation details you can retrofit. They are the shape of the contract you are signing with every consumer on day one.
Authentication and authorization in depth (JWT internals, OAuth 2.0 flows, token storage security, and zero-trust models) is covered in Security & Observability: Authentication & Authorization.
- Idempotency key: client-generated UUID, server stores result, return stored on duplicate; critical for safe POST retries.
- Offset pagination: simple but expensive at depth and unstable when data changes during pagination.
- Cursor pagination: stable, efficient at any depth, no arbitrary page jumps; use for feeds and large datasets.
- Rate limiting: Token Bucket for burst-friendly, Sliding Window for smooth, Fixed Window for simple.
- Authentication: API Keys (server-to-server), JWT (user sessions, stateless), OAuth 2.0 (delegated, scoped).
- Deprecation: Sunset header (RFC 8594), 6-month minimum notice, monitor usage, never remove until consumers migrate.
- Error format: always consistent, with HTTP status code, machine-readable code, human message, request ID, documentation URL.
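The difference between the two pagination strategies in this summary shows up as soon as data changes between page requests. The sketch below uses SQLite to stay self-contained; the `posts` table and both helper functions are assumptions for illustration, and the cursor is simply the last seen id, where a real API would usually return an opaque token.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 8)],
)


def page_by_offset(offset: int, limit: int) -> list:
    """Offset pagination: the database must skip `offset` rows on every call."""
    return conn.execute(
        "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
    ).fetchall()


def page_by_cursor(after_id: int, limit: int):
    """Cursor pagination: seek past the last seen id, then read `limit` rows."""
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?", (after_id, limit)
    ).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor


first_page, cursor = page_by_cursor(0, 3)       # ids 1, 2, 3
conn.execute("DELETE FROM posts WHERE id = 2")  # data changes between page requests
second_by_cursor, _ = page_by_cursor(cursor, 3)
second_by_offset = page_by_offset(3, 3)
```

After the delete, the cursor page continues with ids 4, 5, 6, while the offset page starts at id 5 and silently skips post 4.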
Coupling Is the Cost of Convenience
- Sync: caller blocks, both must be available simultaneously
- Async: caller continues, temporally decoupled, failure isolated
- Three patterns: Request-Response, Fire-and-Forget, Pub-Sub
- Every sync call accepts the dependency's full failure rate
Architectural Style, Not a Protocol
- Six constraints; Stateless is the one that enables scaling
- GET/PUT/DELETE idempotent; POST is not, add Idempotency-Key
- Most APIs are Richardson Level 2, which is sufficient for production
- Versioning: URL most common, header most architecturally clean
Internal Services, Not Public APIs
- Protobuf: binary, 3-10× smaller than JSON, strongly typed
- HTTP/2 multiplexing: multiple streams on one connection
- Four modes: Unary, Server streaming, Client streaming, Bidirectional
- No native browser support; load balancing requires an L7 gRPC-aware balancer
Client-Specified Queries, With Real Costs
- Solves over-fetching and under-fetching from fixed REST endpoints
- N+1 problem: DataLoader batches 11 queries into 2 regardless of N
- Query complexity attacks: always implement depth limits before launch
- HTTP caching broken: all queries POST to one endpoint
Real-Time Without Polling
- SSE: server-to-client only, auto-reconnect, native browser support
- WebSocket: full duplex, no auto-reconnect, stateful connections
- WebSocket scaling: Redis pub/sub bridges servers for different users
- Long polling: fallback only; new connection per event is wasteful
React Instead of Ask
- Three event types: Notification, State Transfer, Event Sourcing
- Outbox pattern: atomic DB write + event publish via transaction
- Choreography: loose coupling, hard to trace. Orchestration: visible, coupled
- Event sourcing: only when history is valuable business data
APIs Live Longer Than Their Designers
- Idempotency-Key: store result, return stored on duplicate POST
- Cursor over offset: stable, efficient at any depth
- Token Bucket for burst traffic, Sliding Window for smoothness
- Sunset header (RFC 8594): 6 months minimum before removal