API Gateway & Proxies
The entry points that manage, secure, and route API traffic.
What Proxies and API Gateways Are
A proxy is exactly what its English meaning suggests — something that stands in for, and acts on behalf of, something else. In networking it is a server that sits between two endpoints and inspects, modifies, or relays traffic between them. The same mechanism serves two completely different purposes, and the words for them get mixed up constantly. A forward proxy stands in for the client — corporate users go through one to reach the public internet, the proxy hiding their identity and applying access policies. A reverse proxy stands in for the server — the public internet sees the proxy, and your real backend services hide behind it. The same code can do both jobs, but the use cases are mirror opposites.
An API gateway is a reverse proxy with a job description: it sits at the edge of your backend, exposes a single, well-known address to clients, and handles the cross-cutting concerns that every API needs — authentication, rate limiting, routing, request transformation, observability. It is not a new category of software; it is what happens when a reverse proxy grows up, learns about your business, and starts making routing decisions based on the URL path, the auth token, and the time of day rather than just the destination IP.
Forward Proxy — Stands In For Clients
Direction: client → proxy → many external servers.
Strength: outbound access control, content filtering, anonymisation, caching outbound responses.
Examples: Squid in a corporate network, a VPN, Tor.
Used for: “our employees go through this to reach the internet.”
Reverse Proxy — Stands In For Servers
Direction: many external clients → proxy → internal servers.
Strength: TLS termination, load balancing, caching, security shielding, single edge address.
Examples: Nginx, HAProxy, Envoy, Caddy in front of services.
Used for: “our public users come through this to reach our backends.”
The analogy: a forward proxy is the receptionist at a corporate building who screens employees' outgoing calls and packages. A reverse proxy is the receptionist at a public-facing office who screens visitors before they reach the right person inside. An API gateway is that public receptionist, except they also know your appointment, take your ID, decide which department to send you to, log everything you said, and politely turn you away if you've been visiting too often today.
If your service is reachable from the public internet, something acts as a reverse proxy in front of it — intentionally chosen, or accidental (the cloud load balancer doing it for you). Choose deliberately. The reverse proxy is the most leveraged piece of code at your edge: every request, every response, every TLS handshake, and every WAF rule passes through it.
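To make "stands in for the server" concrete, here is a toy reverse proxy in Python's standard library. The upstream address is a placeholder, and a real edge runs Nginx, Envoy, or HAProxy rather than twenty lines of Python; the point is only that clients ever see one address while the backend hides behind it.

```python
# Toy reverse proxy sketch (stdlib only; the upstream address is a placeholder).
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

UPSTREAM = "http://127.0.0.1:9000"  # the backend this proxy stands in for

class ReverseProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Clients only ever see this proxy's address; the backend stays hidden.
        upstream_req = Request(UPSTREAM + self.path, headers=dict(self.headers))
        with urlopen(upstream_req, timeout=5) as resp:
            body = resp.read()
            self.send_response(resp.status)
            for name, value in resp.getheaders():
                # Hop-by-hop headers belong to each connection, not the reply.
                if name.lower() not in ("transfer-encoding", "connection"):
                    self.send_header(name, value)
            self.end_headers()
            self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ReverseProxy).serve_forever()
```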
- Forward proxy: stands in for clients reaching out — corporate egress, content filtering, anonymisation.
- Reverse proxy: stands in for servers being reached — TLS, load balancing, security, single edge address.
- API gateway: a reverse proxy that understands your APIs — auth, rate limits, routing, transformations.
- Every internet-facing service has a reverse proxy in front of it; the only choice is whether you picked it deliberately.
How API Gateways Work Internally
Every API gateway, no matter the vendor, implements roughly the same pipeline. A request enters at the edge and is processed by a series of plugins or filters, each handling one cross-cutting concern. The pipeline order matters: authentication must come before rate limiting (otherwise an unauthenticated flood can exhaust your rate-limit budget), and rate limiting must come before routing (otherwise abusive traffic still reaches your backends). Most outages I have debugged at the edge layer trace back to a misordering or a missing stage.
Routing is the single most important job an API gateway does. A modern gateway can route on path (/api/v1/orders → order-service), method (GET vs POST), header (X-Tenant-ID, Accept-Version), query string, source IP, geographic region, and percentage rollout (canary deployments). The trick is keeping the routing rules simple enough to reason about. A gateway with 400 route rules and dynamic dispatch logic becomes a hidden monolith. Routes should be declarative, version-controlled, and reviewed.
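What "declarative and reviewable" can look like, sketched in Python with invented service names, paths, and header rule; a real gateway loads the equivalent from versioned config (YAML, Kubernetes CRDs) rather than source code:

```python
# A declarative route table, sketched in Python. Names are illustrative.
ROUTES = [
    # (method, path prefix, required header, upstream)
    ("GET",  "/api/v1/orders",  None,                     "order-service"),
    ("POST", "/api/v1/orders",  None,                     "order-service"),
    ("GET",  "/api/v1/reports", ("Accept-Version", "2"),  "reporting-v2"),
]

def match_route(method: str, path: str, headers: dict) -> str | None:
    """First match wins, as in most gateways. Returns the upstream name."""
    for m, prefix, header_rule, upstream in ROUTES:
        if m != method or not path.startswith(prefix):
            continue
        if header_rule and headers.get(header_rule[0]) != header_rule[1]:
            continue
        return upstream
    return None  # fall through to a 404 at the edge
```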
Authentication & Authorisation
What: verify the caller is who they say (AuthN) and is allowed to do this (AuthZ).
How: JWT validation, OAuth introspection, API keys, mTLS for service-to-service.
Why at gateway: centralise the policy; backends only see authenticated requests with verified identity in headers.
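As a sketch of the gateway-side check, assuming the PyJWT library and a shared HS256 secret for brevity (production gateways typically verify RS256 signatures against the issuer's published JWKS keys):

```python
# Gateway-side JWT validation sketch, using PyJWT (pip install pyjwt).
# The secret and claim names are illustrative.
import jwt

SECRET = "change-me"  # placeholder; real setups verify against issuer keys

def authenticate(headers: dict) -> dict:
    """Return verified identity headers, or raise. Backends never see raw tokens."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        raise PermissionError("missing bearer token")
    # Raises jwt.InvalidTokenError on a bad signature or expired token.
    claims = jwt.decode(auth.removeprefix("Bearer "), SECRET, algorithms=["HS256"])
    # The gateway injects *verified* identity downstream:
    return {"X-User-Id": claims["sub"]}
```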
Rate Limiting
What: cap requests per key (user, API key, IP) per time window.
How: token bucket or sliding window in shared store (Redis).
Why at gateway: protect backends from runaway clients; meter paid tiers; absorb DDoS bursts.
Service Discovery
What: map a logical name (order-service) to current healthy instances.
How: integrate with Consul, etcd, Kubernetes service endpoints, AWS Cloud Map.
Why at gateway: services scale up and down constantly; the gateway must always have a current map.
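One concrete version of "always have a current map" is DNS-based discovery: a Kubernetes headless service publishes one A record per healthy pod, so re-resolving the name yields the live instance set. A sketch, with a placeholder service name:

```python
# DNS-based discovery sketch. The service name below is a placeholder.
import socket
import time

def resolve(name: str, port: int = 80) -> list[str]:
    """Return the current addresses behind a logical service name."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # The gateway keeps its map fresh by re-resolving on a short interval
    # instead of caching one answer forever.
    while True:
        print("order-service ->", resolve("order-service.default.svc.cluster.local"))
        time.sleep(5)
```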
A gateway can rewrite traffic in flight. Headers: strip internal-only headers, inject correlation IDs, add the verified X-User-Id after auth. Body: rename fields, change formats (e.g. legacy clients sending XML to a JSON backend), version-shim old API shapes onto new ones. Status codes: hide implementation-specific errors behind public-friendly ones. Transformation is powerful and dangerous in equal measure — the gateway is now part of your API contract, and changes there can break clients silently. Use it sparingly and version it.
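A small sketch of in-flight response transformation, with invented header names: strip internal-only headers, guarantee a correlation ID, and map implementation-specific status codes onto public-friendly ones.

```python
# Response transformation sketch. Header names and mappings are illustrative.
import uuid

INTERNAL_HEADERS = {"x-internal-debug", "x-upstream-pool"}
PUBLIC_STATUS = {502: 503, 504: 503}  # hide implementation detail from clients

def transform_response(status: int, headers: dict, body: bytes):
    # Internal-only headers never leave the edge.
    headers = {k: v for k, v in headers.items() if k.lower() not in INTERNAL_HEADERS}
    # Every response carries a correlation ID, injected here if missing.
    headers.setdefault("X-Request-Id", str(uuid.uuid4()))
    return PUBLIC_STATUS.get(status, status), headers, body
```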
The pipeline order — TLS → AuthN → Rate Limit → Transform → Route — is not arbitrary. Each stage protects the stages below it. Reorder them at your peril: almost every “the gateway exhausted Redis under attack” story I've heard traces back to rate limiting placed before authentication.
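The ordering reads naturally as code. In this sketch every stage is a stub (standing in for the fuller sketches in this chapter), and any stage can short-circuit before the stages below it ever see the request:

```python
# Pipeline ordering sketch. Stage bodies are stubs; the point is the order.
class Reject(Exception):
    """Raised by a stage to stop the pipeline with an HTTP status."""
    def __init__(self, status: int):
        self.status = status

def authn(req: dict):
    # AuthN first: everything below only ever sees verified identities.
    if "Authorization" not in req.get("headers", {}):
        raise Reject(401)
    req["user"] = "u-123"  # stand-in for verified claims

def rate_limit(req: dict):
    # Meters the *verified* key, so an anonymous flood cannot exhaust it.
    if req["user"] == "u-over-budget":  # stand-in for a token-bucket check
        raise Reject(429)

def transform(req: dict):
    req["headers"]["X-User-Id"] = req["user"]  # inject verified identity

def route(req: dict):
    req["upstream"] = "order-service"  # only clean traffic reaches routing

PIPELINE = [authn, rate_limit, transform, route]  # TLS already terminated above

def handle(req: dict) -> dict:
    try:
        for stage in PIPELINE:
            stage(req)
    except Reject as r:
        return {"status": r.status}
    return {"status": 200, "upstream": req["upstream"]}
```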
- The pipeline: TLS → AuthN → Rate Limit → Transform → Route, with logging/metrics/circuit-breaking across all stages.
- Routing can dispatch on path, method, headers, geography, percentage — keep rules declarative and reviewable.
- Auth, rate limiting, service discovery, transformation are the four cross-cutting concerns the gateway centralises.
- Order matters — auth before rate limit, rate limit before route.
When to Use — and When Not To
An API gateway pays for itself in three situations: when you have multiple backend services and don't want every one of them to re-implement auth/rate-limiting/logging; when you have multiple kinds of clients (web, mobile, partner APIs) needing different shapes of the same data; and when you want to enforce consistent policy at the edge before traffic reaches your business logic. In all three, the gateway centralises something that would otherwise be scattered across N services in N inconsistent ways. The cost — an extra hop, an additional component to operate — is real but usually justified by the consolidation it enables.
USE an API Gateway When…
Multiple backend services share cross-cutting concerns (auth, rate limits, observability).
Public APIs need a stable contract independent of internal service refactoring.
Multiple client types — mobile, web, partner — need tailored response shapes (BFF pattern).
API products with metering, paid tiers, developer portals, key management.
Compliance / governance requires central policy enforcement and audit logs.
DO NOT Use a Gateway When…
Single small service with one client — Nginx + rate-limit module is enough.
Internal service-to-service traffic at scale — that's a service mesh job, not a gateway job.
Latency budget is < 5 ms end-to-end — every gateway hop adds 1–3 ms.
You'd use it as a place to stuff business logic — that's how you build a hidden monolith.
Backend-for-Frontend (BFF) is a specific gateway pattern worth knowing by name. Instead of one gateway serving every client identically, you run a gateway (or thin service) per client type: one for the web app, one for iOS, one for Android, one for partners. Each BFF aggregates and shapes the data the way its client wants — mobile gets compact responses, web gets richer payloads with embedded resources, partners get a stable versioned API. Each BFF talks to the same downstream microservices but exposes a client-optimised facade. The downside is duplication; the upside is that no single gateway has to make “one shape that works for everyone,” which usually means “a shape that's wrong for everyone.”
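A sketch of one mobile BFF endpoint, with invented service URLs and field names: it fans out to the same internal services every other BFF uses, but returns only the compact shape the mobile client renders.

```python
# Mobile BFF endpoint sketch. Service URLs and field names are illustrative.
import json
from urllib.request import urlopen

def get_order_summary(order_id: str) -> dict:
    # Aggregate two internal services behind one mobile-friendly call.
    order = json.load(urlopen(f"http://order-service/orders/{order_id}", timeout=2))
    user = json.load(urlopen(f"http://user-service/users/{order['user_id']}", timeout=2))
    # Mobile gets the minimum it renders; the web BFF would embed far more.
    return {
        "id": order["id"],
        "status": order["status"],
        "buyer": user["display_name"],
        "total": order["total_cents"] / 100,
    }
```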
The biggest anti-pattern in this whole space: stuffing application logic into the gateway. Custom Lua scripts that call databases, JS plugins that aggregate from five services, response transformations that implement business rules — once you've done that, every deployment to your “edge” requires a coordinated release of business logic, every debugging session spans gateway code and service code, and every new engineer has to be taught where the gateway-vs-service line is, because there no longer is one. The gateway should do generic things: route, authenticate, rate-limit, transform shape (not semantics), observe. The moment it makes a domain-specific decision, that decision belongs in a service.
The BFF pattern is usually the right answer when you have multiple client types with diverging needs. Building one gateway that tries to please web, mobile, and partner consumers simultaneously almost always produces a gateway that pleases none of them and is impossible to evolve.
- Use a gateway when multiple services share cross-cutting concerns or multiple client types need tailored APIs.
- Skip it for single small services, internal traffic (use a mesh), or single-digit-millisecond latency budgets.
- BFF pattern: one gateway per client type — mobile, web, partner — rather than one-size-fits-all.
- Never put business logic in the gateway. The gateway routes; the service decides.
Trade-offs & Comparisons
API gateways and service meshes are often conflated, and they shouldn't be. The API gateway sits at the north-south boundary — traffic entering or leaving your system. It is concerned with the contract you expose to the outside world. The service mesh handles east-west traffic — service-to-service communication inside your cluster. It cares about mTLS between services, retry budgets, fine-grained traffic shifting for deployments, and observability of internal calls. They solve different problems, and you can run both. A common pattern: API gateway at the cluster ingress, service mesh sidecars on every internal service. The gateway authenticates the user; the mesh ensures every internal hop is mTLS-encrypted and observable.
API Gateway — North/South
Concern: traffic between external clients and your services.
Strengths: external auth (JWT, OAuth), rate limiting per API key, public API contracts, monetisation.
Examples: Kong, AWS API Gateway, Apigee, Tyk, Zuul.
Service Mesh — East/West
Concern: service-to-service traffic inside the cluster.
Strengths: mTLS, retries with budgets, traffic shifting, internal observability.
Examples: Istio, Linkerd, Consul Connect, Cilium.
Kong
Model: open-source, plugin-driven, runs anywhere.
Strengths: rich plugin ecosystem (Lua), declarative config, OSS & enterprise tiers.
Trade-off: you operate it yourself; sizing the data plane and Postgres/Cassandra control plane is your job.
AWS API Gateway
Model: fully managed; pay-per-request.
Strengths: zero ops, deep integration with Lambda/IAM/CloudFront, two flavours (REST API, HTTP API).
Trade-off: vendor lock-in, per-request cost adds up at high volume, less plugin flexibility than Kong.
Nginx (and Friends)
Model: general-purpose reverse proxy with API gateway extensions (Nginx Plus, OpenResty).
Strengths: battle-tested, fast, low memory; great when you mostly need a smart reverse proxy.
Trade-off: less out-of-the-box for things like dev portals and key management.
Token Bucket
How: bucket holds N tokens, refilled at rate R; each request consumes a token; reject if empty.
Pros: allows bursts up to bucket size; smooth on average; intuitive to reason about.
Use: the modern default. AWS, Stripe, GitHub all use variants.
Leaky Bucket
How: requests enter a queue; processed at constant rate; overflow is dropped.
Pros: guarantees a smooth output rate (no bursts pass through).
Use: when downstream genuinely cannot handle bursts — e.g. legacy backend with a hard concurrency cap.
Fixed Window Counter
How: count requests per (user, minute); reject when count exceeds limit; reset every minute.
Cons: “double burst” problem — user can use full quota at second 59 and again at second 61.
Use: simple, low-stakes quotas. Avoid for tight rate enforcement.
Sliding Window
How: approximate the rate over a sliding interval using weighted current + previous window counts.
Pros: avoids the double-burst issue with minimal extra cost.
Use: stricter rate enforcement for paid tiers.
Token bucket is the right default 90% of the time. It allows occasional bursts that legitimate users actually need, smooths out sustained over-use, and is straightforward to implement on top of Redis with a small atomic Lua script holding a token count and a last-refill timestamp (the popular INCR + EXPIRE pattern is simpler, but it gives you a fixed-window counter, not a token bucket). Reach for sliding window when fairness across the boundary moment matters; reach for leaky bucket only when the downstream truly cannot tolerate bursts.
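A minimal in-memory token bucket makes the mechanics concrete; the capacity and refill numbers are illustrative, and a real gateway keeps this state in a shared store so every instance agrees on the count.

```python
# Token bucket, in-memory sketch. Production gateways keep this state in a
# shared store (e.g. Redis, via an atomic Lua script) so all instances agree.
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill lazily on each check: no background thread needed.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Illustrative policy: 100-request bursts, sustained 10 req/s per API key.
buckets: dict[str, TokenBucket] = {}

def check(api_key: str) -> bool:
    bucket = buckets.setdefault(api_key, TokenBucket(capacity=100, refill_rate=10))
    return bucket.allow()
```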
- Gateway = north/south (external traffic), service mesh = east/west (internal traffic). They are complementary.
- Kong for plugin-rich self-hosted, AWS API Gateway for zero-ops managed, Nginx for a smart reverse proxy.
- Token bucket is the default rate-limit algorithm; sliding window for fairness; leaky bucket for fragile backends; fixed window only for low-stakes.
- Pick the gateway whose strengths match your operating model, not the one with the longest feature list.
Production Patterns & Common Mistakes
If you take only two patterns into production from this chapter, take these. Circuit breaker at the gateway: when a downstream service starts erroring or timing out at a high rate, stop forwarding traffic to it for a window. The breaker protects the downstream from being pummelled while it recovers, and it returns fast failures to clients instead of having every request stall on a 30-second timeout. Open / half-open / closed states are standard; the tunable parts are the error threshold, the open duration, and the half-open probe count.

Highly available gateway pair: the gateway is by definition a single point that all traffic goes through, so it must not be a single point of failure. Run two or more instances behind an L4 load balancer or via DNS, in different availability zones, with a tested failover. The cloud-managed options (AWS API Gateway, Cloudflare) handle this for you; the self-hosted ones won't.
Pattern: Circuit Breaker
Goal: stop hammering a failing downstream; fail fast for clients.
States: closed (normal) → open (blocking) on threshold breach → half-open (probing) after timeout → closed when probes succeed.
Tunables: error rate threshold (e.g. 50% of last 100 requests), open window (30 s), probe count (5 requests).
Watch: emit metrics on breaker state; alert on prolonged open.
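A sketch of the three states in Python, with thresholds mirroring the tunables above; a real breaker would also cap in-flight half-open probes and emit a metric on every state change, as the watch item says.

```python
# Circuit breaker sketch with the standard closed/open/half-open states.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=0.5, window=100, open_seconds=30, probes=5):
        self.failure_threshold = failure_threshold  # e.g. 50% of the last window
        self.window = window                        # judge the last N requests
        self.open_seconds = open_seconds            # how long to block when open
        self.probes = probes                        # successes needed to re-close
        self.results: list[bool] = []               # True = success
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.open_seconds:
                return False  # fail fast; do not touch the upstream at all
            self.state, self.results = "half-open", []  # start probing
        return True

    def record(self, success: bool) -> None:
        if self.state == "half-open":
            if not success:
                self._trip()  # one failed probe reopens immediately
            else:
                self.results.append(True)
                if len(self.results) >= self.probes:
                    self.state, self.results = "closed", []
            return
        self.results = (self.results + [success])[-self.window:]
        if (self.state == "closed"
                and len(self.results) >= self.window
                and self.results.count(False) / len(self.results) >= self.failure_threshold):
            self._trip()

    def _trip(self) -> None:
        self.state, self.opened_at, self.results = "open", time.monotonic(), []
```

Usage is the important part: wrap every upstream call so that if allow() returns False the gateway responds 503 immediately, and otherwise forwards the request and calls record() with the outcome.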
Pattern: HA Gateway Pair
Goal: the gateway must not become the SPOF that the rest of your architecture removed.
Implementation: two+ instances in separate AZs, behind an L4 LB or anycast IP; tested failover drills.
Watch: shared state (rate-limit counters, sessions) must live in a HA store like Redis Cluster — not in-memory on one gateway.
Mistake 1 — Single Gateway Instance
One gateway in front of everything. When it dies, the entire system is unreachable, even though every backend is healthy. Fix: two+ instances across AZs, behind an L4 LB or anycast.
Mistake 2 — Business Logic in Gateway
Custom Lua/JS plugins that aggregate, validate domain rules, or call databases. Gateway becomes a hidden monolith with no tests. Fix: push logic into a service; gateway does only generic routing/auth/rate-limit.
Mistake 3 — No Circuit Breaker
A failing downstream takes the gateway down with it — thread pools fill with stalled connections. Fix: circuit breaker per upstream + connection pool limits + timeouts.
Mistake 4 — Sync Gateway for Async Work
A long-running job request comes through the gateway; the gateway holds the connection for minutes and its timeout fires while the backend is still processing. Fix: 202 Accepted + job ID + polling endpoint, or WebSockets, or push to a queue and respond immediately.
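The fix has a well-worn shape. A sketch with Flask (an assumed choice; any framework works), with an in-memory dict standing in for a real queue and durable job store:

```python
# 202 + polling sketch. Flask is an assumption; job storage is a stand-in.
import uuid
from flask import Flask, jsonify

app = Flask(__name__)
jobs: dict[str, str] = {}  # job_id -> status; production uses a durable store

@app.post("/reports")
def start_report():
    job_id = str(uuid.uuid4())
    jobs[job_id] = "pending"  # enqueue the real work here
    # Respond immediately; the gateway never holds a long connection.
    return jsonify(job_id=job_id, poll=f"/jobs/{job_id}"), 202

@app.get("/jobs/<job_id>")
def poll(job_id):
    return jsonify(status=jobs.get(job_id, "unknown"))
```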
Mistake 5 — Missing/Broken Auth Header Hygiene
Gateway forwards client-supplied X-User-Id straight to backends without verifying. Caller can impersonate anyone. Fix: strip all auth-related headers from incoming requests, then inject verified identity headers after auth succeeds.
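The hygiene rule fits in a few lines; the header names here are illustrative:

```python
# Header hygiene sketch: drop every identity header a client could spoof,
# then inject the identity the gateway itself verified.
SPOOFABLE = {"x-user-id", "x-roles", "x-authenticated"}

def scrub_and_inject(headers: dict, verified_user_id: str) -> dict:
    clean = {k: v for k, v in headers.items() if k.lower() not in SPOOFABLE}
    clean["X-User-Id"] = verified_user_id  # set only after auth succeeded
    return clean
```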
Bonus — Rate Limit Before Auth
Pipeline ordered as Rate Limit → Auth. Anonymous flood exhausts your rate-limit Redis budget before any auth check happens. Fix: auth first, rate-limit on the verified key. Apply a separate per-IP coarse limit at the very edge for anonymous traffic.
The gateway is the most leveraged piece of code in your edge. Get it right and it's invisible — routes traffic, enforces policy, fails over silently. Get it wrong and you have a single point of failure with business logic in it that nobody can refactor without coordinated downtime. Treat it like infrastructure, not like an application.
- Circuit breakers protect downstreams and fail fast for clients — closed/open/half-open states with tuned thresholds.
- HA gateway pair across AZs with shared state in a HA store — the gateway must not become the SPOF.
- The five outage mistakes: single instance, business logic in gateway, no circuit breaker, sync gateway for async work, header hygiene.
- Treat the gateway like infrastructure — declarative config, version control, generic concerns only.