Architecture Styles

Monoliths, microservices, serverless, and when each one wins.

Chapter One

Monolith — Not a Dirty Word

When the Simplest Architecture Is the Right One

Somewhere in the last decade, "monolith" became a pejorative. That is wrong. A monolith is a single deployable unit: one binary, one deployment pipeline, one database. For most teams at most stages, it is the correct starting architecture. Large, successful businesses have been built on monoliths (Shopify, Stack Overflow, and Basecamp are well-known examples). A monolith fails not because it is a single deployable, but because of undisciplined internal structure. A modular monolith is architecturally sound and avoids the distributed systems tax entirely.

Monolith Architecture — Layered Structure

📦

Traditional Monolith

All code in one project
Shared database, shared state
Fast to develop initially
Risk: becomes a "big ball of mud" without discipline

🧩

Modular Monolith

Strict module boundaries within one deployable
Modules communicate via defined interfaces
Each module owns its data (schema separation)
Best of both: simplicity + structure

🎯

Hexagonal (Ports & Adapters)

Business logic at center, infrastructure at edges
Testable without DB, without HTTP
Swap adapters: PostgreSQL → DynamoDB without touching core
Highest quality but highest initial investment
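
The ports-and-adapters idea can be sketched in a few lines. This is a minimal illustration, not a prescribed structure: `Order`, `OrderRepository`, `InMemoryOrderRepository`, and `OrderService` are hypothetical names chosen for the example. The core depends only on the port, so it can be tested with an in-memory adapter and no database or HTTP server.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class OrderRepository(Protocol):
    """Port: what the core needs from storage, with no storage details."""

    def save(self, order: Order) -> None: ...

    def get(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderRepository:
    """Adapter for tests. A PostgreSQL or DynamoDB adapter would expose the same methods."""

    def __init__(self) -> None:
        self._orders: dict = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def get(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


class OrderService:
    """Core business logic. It imports no database driver and no web framework."""

    def __init__(self, repo: OrderRepository) -> None:
        self._repo = repo

    def place_order(self, order_id: str, total_cents: int) -> Order:
        if total_cents <= 0:
            raise ValueError("order total must be positive")
        if self._repo.get(order_id) is not None:
            raise ValueError(f"order {order_id} already exists")
        order = Order(order_id, total_cents)
        self._repo.save(order)
        return order
```

Swapping storage then means writing another class with the same two methods and passing it to `OrderService`; the business rules stay untouched.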

Start with a monolith. Seriously. If you have fewer than 50 engineers, if your domain boundaries are still evolving, if you are not yet sure what "independently scalable" means for your product — a well-structured monolith will outperform a poorly-structured microservices system every time. Extract services later when you have clear, stable domain boundaries.

When to Extract a Service

Extract a service when you observe these specific conditions — not when you feel the monolith is "too big":

✅

Valid Extraction Signals

Deployment bottleneck: one team's changes are blocked because another team's untested code is in the same deployment. Independent deployment is the primary reason to extract.
Measured scale mismatch: one component needs 10x more resources than the rest. A video transcoding module consuming all CPU while others sit idle. Measured — not theoretical.
Technology mismatch: an ML component needs Python + GPU access. The rest is Go. Forcing both into one deployable is impractical, and that runtime mismatch is the only reason to split here.
Ownership clarity: a team wants to own and deploy a capability completely independently, including its data. Conway's Law forcing function.

🚫

Invalid Extraction Reasons

"The service would be smaller" — smaller is not inherently better
"The module is complex" — complexity does not justify distribution
"We want to use a different database for an experiment"
"We read an article about microservices"
"It feels like it should be a separate service" — feelings are not architecture

⚠️ The Distributed Monolith Anti-Pattern

The worst outcome of premature service extraction: services that are separately deployed but tightly coupled at the data layer. Service A cannot deploy without coordinating with Service B because they share a database table. Service B cannot change its API without breaking Service C. You have all the operational complexity of microservices with none of the independence benefits. A well-structured modular monolith beats a distributed monolith in every measurable way.

🧩 Modular Monolith Enforcement Mechanisms

Module boundaries in a monolith are only as strong as the enforcement mechanism. Without enforcement, modules collapse into spaghetti as developers take shortcuts.

Build-time enforcement: ArchUnit (Java), Go package visibility, or similar tools that fail the build if cross-module dependencies are introduced without explicit declaration.
Schema separation: each module owns tables under a named schema (orders.*, payments.*, users.*). No cross-schema joins allowed.
API-only cross-module access: cross-module data access goes through the module's public API (in-process function call), not by joining tables. This makes eventual service extraction far easier, because the data is already separated.
Code review policies: cross-module changes require module owner approval.
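
Build-time enforcement does not need heavy tooling to get started. The sketch below is a hand-rolled check in the spirit of ArchUnit, not ArchUnit itself: it parses a source file and reports imports of modules that were not declared as dependencies. The `ALLOWED_DEPENDENCIES` table and the module names are assumptions made for the example.

```python
import ast

# Hypothetical declaration: which other top-level modules each module may import.
ALLOWED_DEPENDENCIES = {
    "orders": {"payments"},
    "payments": set(),
    "users": set(),
}


def find_violations(module: str, source: str) -> list:
    """Return the undeclared application modules that `source` imports."""
    allowed = ALLOWED_DEPENDENCIES[module] | {module}
    known = set(ALLOWED_DEPENDENCIES)
    violations = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # not an absolute import statement
        for name in names:
            root = name.split(".")[0]
            # Third-party and stdlib imports are ignored; only module-to-module edges count.
            if root in known and root not in allowed:
                violations.add(root)
    return sorted(violations)
```

A CI step could run this over every file in a module and fail the build when the returned list is non-empty.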

📋 Chapter 1 — Summary

Monolith = single deployable unit. Not inherently bad — discipline determines quality.
Modular monolith: strict module boundaries, defined interfaces, schema separation. Best starting point.
Advantages: one deployment, easy debugging, no network calls, no distributed tracing needed.
Extract when: deployment bottleneck, measured scale mismatch, tech mismatch, or clear team ownership — not theoretical cleanliness.
Distributed monolith: the worst outcome of premature extraction. All distribution cost with none of the independence benefit.
Module enforcement: ArchUnit, package visibility, schema separation per module, cross-module API-only access.
Default recommendation: start monolith, extract services when pain is concrete, not theoretical.

Chapter Two

Microservices — The Full Picture

The Benefits Are Real, But So Are the Costs

Microservices are not a universal improvement over monoliths — they are a trade-off. You gain independent deployment, independent scaling, and technology freedom. You pay with network complexity, distributed debugging, data consistency challenges, and operational overhead. Teams that adopt microservices without understanding the cost end up with a "distributed monolith" — all the complexity of distribution with none of the benefits. The architecture only works when service boundaries align with team boundaries (Conway's Law).

Microservices Architecture — Complete Picture

When Microservices Win

Large org (50+ engineers): independent team velocity
Different scaling needs: payment (CPU) vs media (I/O)
Different tech stacks: ML team (Python) vs API team (Go)
Regulatory isolation: PCI service separate from rest
Clear, stable domain boundaries already understood

When Microservices Hurt

Small team (<10): coordination overhead exceeds benefit
Unclear domain boundaries: you'll split wrong and re-merge
Team doesn't own their infra: shared ops becomes bottleneck
Strong consistency needed across services: saga complexity
No monitoring/tracing: debugging becomes impossible

Migration: Strangler Fig Pattern

You do not rewrite a monolith overnight. The strangler fig pattern extracts services incrementally: route new features to a new service, migrate existing features one by one, and continue until the monolith has nothing left. Each extraction is reversible: if the new service fails, route traffic back. Netflix, Amazon, and Uber all migrated this way over years, not months.
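
The routing layer at the heart of the pattern can be sketched as a prefix table that defaults to the monolith. This is an illustrative sketch: `StranglerRouter` and the URLs are hypothetical, and a real deployment would usually express the same rules in a reverse proxy or API gateway configuration.

```python
class StranglerRouter:
    """Chooses the monolith or an extracted service for each request path."""

    def __init__(self, monolith_url: str) -> None:
        self._monolith_url = monolith_url
        self._routes: dict = {}  # path prefix -> URL of the extracted service

    def migrate(self, prefix: str, service_url: str) -> None:
        """Send traffic under `prefix` to the new service."""
        self._routes[prefix] = service_url

    def roll_back(self, prefix: str) -> None:
        """Undo an extraction: traffic under `prefix` returns to the monolith."""
        self._routes.pop(prefix, None)

    def backend_for(self, path: str) -> str:
        # Match whole path segments so "/orders" does not capture "/orders-legacy".
        matches = [
            p for p in self._routes
            if path == p or path.startswith(p.rstrip("/") + "/")
        ]
        if not matches:
            return self._monolith_url
        # Longest prefix wins, so "/orders/export" can move before "/orders".
        return self._routes[max(matches, key=len)]
```

Because unmatched paths fall through to the monolith, each migration is a single table entry, and rolling back is removing it.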

Service Size Guidance

Service size is not defined by lines of code. It is defined by deployment independence and team ownership. A service is the right size when: one team can fully own it end-to-end, it can be deployed without coordinating with other teams, and its API surface is stable enough that consumers can rely on it.

⬇️

Too Small

A service for each database table
A service per HTTP endpoint
Services that always deploy together because they are functionally coupled
Result: chatty inter-service communication, distributed transaction nightmares

✅

Right Size

A bounded context (DDD sense)
All code for one business capability: orders, payments, identity
One team, one service, independent deployment
Stable API surface consumers can rely on

⬆️

Too Large

Two teams coordinate changes to it
Contains multiple independent scaling requirements
Just a monolith with a network boundary drawn around it
Result: same coordination problems as monolith, plus network overhead

Database Per Service — Why It Is Non-Negotiable

Sharing a database between services destroys independence at the data layer. If Service A and Service B share a table, Service B's schema migration can break Service A. Service A's query load can degrade Service B's performance. Neither team can evolve their data model without coordinating with the other.

Database per service means: Service A's data cannot be accessed by Service B through SQL. Service B must call Service A's API to get data it needs. This forces explicit contracts and prevents tight coupling.

The implementation challenge: cross-service queries are now API calls. Reports that used to be simple JOIN queries must be replaced with data denormalization, event-driven projections, or a separate read model (CQRS). This is real engineering work — acknowledge it upfront rather than discovering it mid-migration.
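
An event-driven projection, one of the replacements for a cross-service JOIN mentioned above, can be sketched as follows. The event names (`CustomerRegistered`, `OrderPlaced`), their fields, and the `CustomerSpendProjection` class are assumptions for illustration; the point is that the report is built from published events rather than from another service's tables.

```python
from collections import defaultdict


class CustomerSpendProjection:
    """Read model for a 'spend per customer' report, fed only by events."""

    def __init__(self) -> None:
        self._names: dict = {}
        self._spend = defaultdict(int)
        self._seen: set = set()

    def apply(self, event: dict) -> None:
        # Brokers commonly deliver at least once, so duplicates are skipped by event id.
        if event["event_id"] in self._seen:
            return
        self._seen.add(event["event_id"])
        if event["type"] == "CustomerRegistered":
            self._names[event["customer_id"]] = event["name"]
        elif event["type"] == "OrderPlaced":
            self._spend[event["customer_id"]] += event["total_cents"]

    def report(self) -> list:
        """Rows of (customer name, total spend in cents), highest spend first."""
        rows = [
            (self._names.get(customer_id, "unknown"), total)
            for customer_id, total in self._spend.items()
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

The trade-off is that the report is eventually consistent with the source services, which is usually acceptable for reporting but should be stated to its consumers.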

📜 Internal API Versioning

Service APIs are contracts. Internal service APIs need the same versioning discipline as public APIs — the consumer is inside your organization, not outside, but breaking them has the same consequences.

Practical approach: never remove or change an existing API field — only add new optional fields. When a breaking change is unavoidable, run old and new versions in parallel during a migration window, confirm all consumers have migrated, then decommission the old version. Contract testing (Pact) catches breaking changes in CI before production. A service that breaks its consumers on every deploy destroys team independence — the opposite of what microservices are supposed to achieve.
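
The additive-only rule is simple enough to check mechanically. The sketch below is not Pact and does not replace contract testing; it only compares two hand-written schema descriptions. The schema shape (`{"type": ..., "required": ...}`) and the function name `breaking_changes` are assumptions for the example.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """List changes that violate 'never remove or change a field, only add optional ones'.

    Both schemas map field name -> {"type": str, "required": bool}.
    """
    problems = []
    for field, spec in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field]["type"] != spec["type"]:
            problems.append(f"changed type: {field}")
    for field, spec in new.items():
        if field not in old and spec["required"]:
            problems.append(f"new required field: {field}")
    return problems
```

Run in CI against the previously released schema, a non-empty result would signal that the change needs a parallel version and a migration window rather than an in-place edit.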

Microservices are an organizational scaling strategy, not a technical improvement. They solve the problem of 200 engineers needing to deploy independently. They do not make a 5-person team faster. If you cannot articulate which specific problem microservices solve for YOUR team, you do not need them yet.

📋 Chapter 2 — Summary

Microservices: independently deployable, own DB, own team. Tech freedom + independent scaling.
Tax: network calls, distributed tracing, saga transactions, operational overhead.
Service size: one team, independent deployment, stable API. Not defined by lines of code.
Too small: services that always deploy together. Too large: services needing multi-team coordination.
Database per service: non-negotiable for true independence. Cross-service data access through APIs only, never SQL joins.
Internal API versioning: same discipline as public APIs. Contract testing (Pact) catches breaking changes in CI.
Conway's Law: service boundaries must align with team boundaries or you get "distributed monolith."
Migration: strangler fig pattern — extract incrementally, never big-bang rewrite.

Chapter Three

Serverless

Functions as the Unit of Deployment

Serverless removes the server from your mental model — you write a function, the platform handles scaling, availability, and infrastructure. You pay only when your code runs. For event-driven workloads with variable traffic, the operational simplicity is transformative: no capacity planning, no patching, no scaling decisions. For high-throughput, latency-sensitive workloads, the cold start penalty and execution time limits make it a poor fit.

Serverless Event-Driven Architecture

Serverless (Functions)

Zero infrastructure management
Pay per invocation (idle = $0)
Auto-scales instantly (0 → 1000 concurrent)
Cold start: 100ms–2s on first invocation
Max execution: 15 min (Lambda)
Best for: event processing, APIs with variable traffic, glue code

Containers (ECS/K8s)

You manage cluster, scaling policies, networking
Pay for reserved capacity (idle still costs $)
Scaling takes seconds–minutes (pod startup)
No cold start for running containers
No execution time limit
Best for: steady traffic, long-running processes, GPU workloads

❄️ Cold Start Mitigation Strategies

Cold start occurs when no warm execution environment exists: the platform must fetch the deployment package, start a new environment, initialize the runtime, and load your code before it can execute. Mitigation strategies:

Provisioned concurrency: pre-warm a fixed number of instances that are always ready. Eliminates cold starts entirely for that capacity. Costs money even when idle — use for latency-sensitive endpoints only.
Runtime selection: Java cold starts (1–2s) are dramatically slower than Go or Python (100–300ms). Choose runtimes based on cold start tolerance, not familiarity.
Package size: smaller deployment packages initialize faster. Remove unused dependencies, use tree-shaking, avoid bundling unneeded SDKs.
Architecture choice: if cold start latency is unacceptable for your use case, serverless is the wrong tool. Use containers with predictable startup time.

💰 Serverless Cost Model Warning

Serverless is cheaper for bursty, low-frequency workloads. For high-frequency steady-state workloads, it is often more expensive than containers.

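Example: a Lambda function running 100ms at 512MB, invoked 10M times/month, costs roughly $10/month at list prices (about $8 of compute plus $2 of request charges, before any free tier). A single t3.medium at $0.0416/hr ≈ $30/month, but that price is flat: it stays the same at 10M or 100M requests as long as the instance can absorb the load, while the Lambda bill grows linearly with invocations (roughly $100/month at 100M).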

The breakeven depends on traffic pattern. Model costs before choosing serverless for savings — the assumption that serverless is always cheaper is false for sustained high-traffic workloads.
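
A small calculator makes the breakeven concrete. This is a sketch under stated assumptions: the two Lambda rates below are the x86 list prices I am assuming for a US region, the free tier is ignored, and the instance is assumed to be able to serve the whole load. Check current pricing before relying on the output.

```python
# Assumed list prices (x86, US region, free tier ignored). Verify before use.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
LAMBDA_PRICE_PER_MILLION_REQUESTS = 0.20
HOURS_PER_MONTH = 730


def lambda_monthly_cost(invocations: int, duration_ms: float, memory_mb: int) -> float:
    """Compute charge (GB-seconds) plus request charge for one month."""
    gb_seconds = invocations * (duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * LAMBDA_PRICE_PER_GB_SECOND
    requests = invocations / 1_000_000 * LAMBDA_PRICE_PER_MILLION_REQUESTS
    return compute + requests


def instance_monthly_cost(hourly_rate: float, instances: int = 1) -> float:
    """Flat cost of always-on instances, independent of traffic."""
    return hourly_rate * HOURS_PER_MONTH * instances


def breakeven_invocations(hourly_rate: float, duration_ms: float, memory_mb: int) -> int:
    """Monthly invocations at which Lambda costs as much as one instance."""
    per_invocation = lambda_monthly_cost(1, duration_ms, memory_mb)
    return int(instance_monthly_cost(hourly_rate) / per_invocation)
```

With these assumed rates, `lambda_monthly_cost(10_000_000, 100, 512)` comes to about $10, close to the figure quoted above, and `breakeven_invocations(0.0416, 100, 512)` lands at roughly 29 million invocations per month. Above that volume, a single instance that can carry the traffic is the cheaper option.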

🔒 Vendor Lock-In Reality

Serverless is the highest vendor lock-in architecture style. Your code structure, event sources, configuration, IAM permissions, and deployment tooling are all provider-specific. A Lambda function cannot be moved to GCP Cloud Functions without rewriting the handler, event types, and infrastructure.

Mitigation: keep business logic in provider-agnostic modules and wrap them in thin provider-specific handlers. The handler adapts the provider event format to your domain model — business logic never imports provider SDKs directly. This allows moving the handler wrapper if you change providers, while preserving core logic.
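
The thin-handler split might look like the sketch below. The business function `quote_shipping` and its pricing rule are invented for illustration. The adapter assumes the event shape of an AWS Lambda function behind an API Gateway proxy integration (a JSON string in `body`, and a response with `statusCode` and `body`); another provider would need a different adapter around the same core.

```python
import json


# Provider-agnostic core: no cloud SDK imports, plain arguments in, plain dict out.
def quote_shipping(weight_kg: float, express: bool) -> dict:
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    cents = round(500 + 120 * weight_kg)  # hypothetical pricing rule
    if express:
        cents *= 2
    return {"price_cents": cents, "express": express}


# Thin adapter: translates the provider event into domain arguments and back.
def lambda_handler(event: dict, context: object) -> dict:
    try:
        payload = json.loads(event.get("body") or "{}")
        result = quote_shipping(
            float(payload["weight_kg"]),
            bool(payload.get("express", False)),
        )
    except (KeyError, ValueError, TypeError) as exc:
        # Malformed JSON raises a ValueError subclass, so it is reported as 400 too.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    return {"statusCode": 200, "body": json.dumps(result)}
```

Only `lambda_handler` would need rewriting on a provider change, and `quote_shipping` can be unit-tested without any cloud tooling.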

Serverless is not "no servers" — it is "not your servers." The operational simplicity is real for event-driven, bursty workloads. But for steady high-throughput services, containers are cheaper and more predictable. The decision: variable traffic + event-driven → serverless. Steady traffic + long connections → containers.

📋 Chapter 3 — Summary

Serverless: function-level deployment, auto-scaling, pay-per-invocation. Zero infra management.
Cold start mitigation: provisioned concurrency for latency-sensitive endpoints, runtime selection (Go/Python faster than Java), minimize package size.
Limits: 15 min execution, 10GB memory, stateless. Not for WebSockets or GPU workloads.
Cost model: cheaper for bursty low-frequency workloads. More expensive than containers for sustained high-traffic. Model before assuming savings.
Vendor lock-in: highest of any architecture style. Isolate business logic from provider-specific handler wrappers.
Best for: event-driven (S3 triggers, SQS processing), APIs with variable traffic, scheduled jobs.

Chapter Four

Service Mesh

Infrastructure for Microservice Communication

When you have 5 microservices, you can handle cross-cutting concerns (mTLS, retries, circuit breaking, observability) in each service's code. When you have 50, you need that logic extracted into infrastructure. A service mesh provides these capabilities as a transparent network layer — your application code does not change. The mesh handles encryption, routing, retries, and telemetry via sidecar proxies attached to every service instance.

Service Mesh — Sidecar Proxy Architecture

🌐

Istio

Full-featured: mTLS, traffic management, observability, policy
Envoy sidecar proxy (high performance, configurable)
Complex to operate — significant resource overhead
Best for: large deployments (100+ services), compliance needs

🔗

Linkerd

Lightweight, simpler than Istio, Rust-based proxy
Lower resource footprint and operational complexity
Fewer features but covers 80% of use cases
Best for: teams wanting mesh benefits without Istio weight

When Not to Use a Service Mesh

🚫

Do Not Use When

Fewer than 10 services: operational overhead of running Istio/Linkerd exceeds the value. Use a shared library for mTLS, retries, and circuit breaking.
Single-language environment: a shared library implementing the same capabilities is simpler, cheaper, and easier to debug. Mesh's polyglot value only matters in multi-language systems.
No Kubernetes expertise: meshes require deep K8s knowledge. Deploying a mesh on poor K8s foundations creates compounding operational risk.
Latency-sensitive hot paths: a sidecar adds roughly 1ms per hop. Across a 5-hop call chain that is about 5ms, which would be 25% of a 20ms latency budget. Profile first.
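
For the small or single-language cases above, the shared-library alternative can be quite compact. The following is a minimal circuit breaker sketch, not a production library: the class names, the default thresholds, and the injectable `clock` parameter (added so the behavior can be tested without sleeping) are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset window elapsed: half-open, so one trial call is let through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call re-opens the circuit; otherwise open at the threshold.
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Each service would wrap its outbound calls, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever client function the service already has.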

✅

Justified When

50+ services across multiple languages
Compliance requirement for encryption everywhere (mTLS)
Advanced traffic management (canary, blue/green, fault injection)
Need unified observability across all service communication
Multiple teams cannot coordinate on a shared library upgrade

🔮 eBPF-Based and Sidecarless Service Meshes

Next-generation meshes are moving away from per-pod sidecar proxies. Cilium implements much of its data path with eBPF programs running in the kernel, while Istio's ambient mode replaces sidecars with a shared per-node proxy (ztunnel) plus optional waypoint proxies for L7 features. Both approaches remove per-pod proxy memory overhead and can reduce proxy latency. Cilium is an increasingly common choice for new deployments where Kubernetes networking and mesh capabilities are needed together. The sidecar model is not obsolete, but sidecarless alternatives are worth evaluating for new infrastructure.

A service mesh is justified when the number of services makes per-service implementation of mTLS, retries, and observability unsustainable. At 5 services, a shared library suffices. At 50+ services across multiple languages, a mesh pays for itself. In between, let language diversity and compliance requirements decide; without them, a mesh is over-engineering.

📋 Chapter 4 — Summary

Service mesh: infrastructure layer for service-to-service communication. Transparent to app code.
Sidecar pattern: proxy per pod (Envoy/Linkerd) handles mTLS, retries, circuit breaking, telemetry.
Control plane: distributes config, rotates certs, enforces policy (Istiod).
When needed: 50+ services, multi-language, compliance/mTLS requirement, advanced traffic management.
Do not use: fewer than 10 services, single-language, teams without K8s expertise, or latency-critical paths where 1ms per hop matters.
Sidecarless alternatives: Cilium (eBPF) and Istio ambient mode (per-node proxy) eliminate per-pod sidecar overhead. Evaluate for new infrastructure.
Cost: latency overhead (~1ms per hop), memory per sidecar, operational complexity.

Chapter Five

Edge Computing

Computation at the Network Boundary

Traditional architecture puts all compute in a centralized cloud region. Every user request travels to that region and back, adding latency proportional to geographic distance. Edge computing moves logic closer to users: CDN edge functions run at Points of Presence worldwide, which can cut latency from roughly 200ms to roughly 20ms for operations that never need the origin. For IoT, edge means processing at the device or local gateway, avoiding the round trip entirely. The trade-off: limited compute, stateless by design, and deployment complexity.

Edge Function — Request Handled Before Origin

⚡

CDN Edge Functions

Cloudflare Workers: V8 isolates, <5ms cold start
Lambda@Edge / CloudFront Functions
Fastly Compute@Edge (Wasm-based)
Use for: auth, routing, A/B tests, personalization
Limit: 10–50ms CPU time, limited APIs

📡

IoT Edge Computing

Process data at device or local gateway
Filter, aggregate before sending to cloud
AWS IoT Greengrass, Azure IoT Edge
Use for: real-time ML inference, privacy, offline operation
Reduces bandwidth, latency, and cloud costs
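
Filtering and aggregating at the gateway can be as simple as collapsing each window of raw readings into one summary message. The sketch below is illustrative only: `summarize_window`, the field names, and the alert threshold are assumptions, and it does not use the AWS IoT Greengrass or Azure IoT Edge SDKs.

```python
from statistics import mean


def summarize_window(readings: list, alert_above: float) -> dict:
    """Collapse one window of raw sensor readings into a small payload for upload."""
    if not readings:
        raise ValueError("window is empty")
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 2),
        # Out-of-range values are kept verbatim so the cloud side can investigate.
        "alerts": [r for r in readings if r > alert_above],
    }
```

A gateway sampling once per second and uploading one summary per minute would send one message instead of sixty, while still forwarding every reading that crossed the threshold.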

Edge Compute Constraints and Design Implications

⏱️

CPU Time Limits

Cloudflare Workers: 50ms CPU time per request is the figure assumed here (actual CPU execution, not wall time); the limit varies by plan, so check the current documentation
Enough for auth token validation, A/B routing, simple transforms
Not enough for image processing, complex business logic, or DB queries
Lambda@Edge: 5s timeout for viewer request functions, 30s for origin request functions (wall-clock timeouts, not CPU time)

🔗

No Persistent Connections

Edge functions cannot maintain open connections to databases, queues, or internal services
Every request must be self-contained or use connections established within the request lifetime
Use external edge APIs (Cloudflare KV, Durable Objects) for state that must persist at the edge

⚠️

Consistency at the Edge

Cloudflare KV is eventually consistent: writes can take up to about 60 seconds to propagate globally
For data that needs strong consistency (session tokens, feature flags): either read it from the origin or a strongly consistent store, or confirm that stale reads are acceptable before keeping it in KV
Edge functions making decisions on stale data produce incorrect behavior that is geographically distributed and hard to debug

🐛

Debugging Complexity

Edge functions deploy globally almost immediately, with no regional canary by default
A bug is deployed everywhere at once
Test extensively in staging, use gradual rollouts via traffic splitting if supported
Failures are geographically distributed — harder to reproduce and diagnose

Edge Computing vs CDN Caching

CDN Caching (Traditional)

Serves pre-computed static responses from edge cache
No code runs: the CDN returns a cached HTTP response
Works for assets that do not change per-user (CSS, JS, images)
Cannot personalize, validate auth, or make dynamic routing decisions

Edge Computing

Runs code at the edge for each request
Response is computed, not cached
Enables per-user personalization, auth validation, dynamic routing
The two complement each other: edge function checks auth → serves from cache if authorized + cached
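
The "check auth, then serve from cache" flow can be sketched as one request handler. Real edge platforms mostly run JavaScript or Wasm, so this Python version only illustrates the control flow and is not a Workers or Lambda@Edge API. The token format (`user_id.signature`), the `SECRET` value, and the `handle` signature are assumptions for the example.

```python
import hashlib
import hmac
from typing import Callable, Optional

SECRET = b"demo-secret"  # hypothetical; a real deployment would load this from secret storage


def sign(user_id: str) -> str:
    mac = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{mac}"


def verify(token: str) -> Optional[str]:
    """Return the user id if the signature is valid, else None. CPU-only, no I/O."""
    user_id, _, mac = token.rpartition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    if user_id and hmac.compare_digest(mac.encode(), expected.encode()):
        return user_id
    return None


def handle(path: str, token: str, cache: dict,
           fetch_origin: Callable[[str], str]) -> tuple:
    if verify(token) is None:
        return 401, "unauthorized"  # rejected at the edge; the origin never sees it
    if path in cache:
        return 200, cache[path]     # authorized and cached: no origin round trip
    body = fetch_origin(path)       # authorized but not cached: go to origin once
    cache[path] = body
    return 200, body
```

Only the cache-miss branch pays the origin round trip; rejected and cached requests are answered entirely at the edge.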

Edge computing reduces latency by eliminating geography. If your logic can run in <50ms of CPU, is stateless, and benefits from being close to users — run it at the edge. Auth checks, A/B testing, geo-routing, and cache personalization are natural edge workloads. Business logic that needs your database still belongs at the origin.

📋 Chapter 5 — Summary

Edge computing: move compute closer to users. CDN PoPs (300+ worldwide) run your code.
CDN edge functions: Cloudflare Workers (sub-5ms cold start), Lambda@Edge. Stateless operations.
Limits: Cloudflare Workers 50ms CPU time, Lambda@Edge 5–30 second timeouts. No persistent connections, so each request must be self-contained.
Consistency: Cloudflare KV eventually consistent, 60-second propagation. Centralize strongly consistent reads to origin.
Debugging: edge functions deploy globally immediately. Test extensively — failures are geographically distributed.
CDN caching vs edge compute: caching serves cached responses. Edge runs code per-request. They complement each other.
IoT edge: process at device/gateway. Filter before cloud. Offline operation, privacy, bandwidth saving.
Best for: auth validation, A/B testing, geo-routing, cache personalization.

Architecture Styles at a Glance

01 · Monolith

Start Here. Seriously.

Modular monolith: strict boundaries without distribution cost
Extract when: deployment bottleneck, scale mismatch, tech mismatch, team ownership
Distributed monolith: worst outcome of premature extraction
Enforce: ArchUnit, schema separation, API-only cross-module access

02 · Microservices

Organizational Scaling Strategy

Service size: one team, independent deploy, stable API. Not LOC
DB per service: non-negotiable. Cross-service data via API only
Internal API versioning: contract testing (Pact) catches breaks in CI
Strangler fig: migrate incrementally, never big-bang rewrite

03 · Serverless

Functions, Not Servers

Cold start: provisioned concurrency, fast runtimes, small packages
Cost: cheaper for bursty workloads, more expensive for steady high-traffic
Vendor lock-in: highest of any style. Isolate business logic from handlers
Best: event-driven, variable traffic. Not for: steady high-QPS, GPU

04 · Service Mesh

Infrastructure for Service Communication

Sidecar: mTLS, retries, circuit breaking, telemetry. Transparent to app
Do not use: <10 services, single language, no K8s expertise
Sidecarless (Cilium eBPF, Istio ambient mode): eliminates sidecar overhead. Evaluate for new infra
Cost: ~1ms per hop, memory per sidecar, operational complexity

05 · Edge Computing

Eliminate Geography

Limits: Workers 50ms CPU, Lambda@Edge 5-30s timeout. No persistent connections
KV eventually consistent (60s). Centralize strongly consistent reads to origin
CDN caching = cached responses. Edge compute = code per request. Complementary
Deploy globally immediately — test extensively, failures are distributed
