Architecture Styles
Monoliths, microservices, serverless, and when each one wins.
Monolith: Not a Dirty Word
Somewhere in the last decade, "monolith" became a pejorative. That is wrong. A monolith is a single deployable unit: one binary, one deployment pipeline, one database. For most teams at most stages, it is the correct starting architecture. You can build a billion-dollar business on a monolith (Shopify, Stack Overflow, Basecamp all did). The monolith fails not because of its architecture, but because of undisciplined internal structure. A modular monolith is architecturally sound and avoids the distributed systems tax entirely.
Traditional Monolith
- All code in one project
- Shared database, shared state
- Fast to develop initially
- Risk: becomes a "big ball of mud" without discipline
Modular Monolith
- Strict module boundaries within one deployable
- Modules communicate via defined interfaces
- Each module owns its data (schema separation)
- Best of both: simplicity + structure
Hexagonal (Ports & Adapters)
- Business logic at center, infrastructure at edges
- Testable without DB, without HTTP
- Swap adapters: PostgreSQL to DynamoDB without touching core
- Highest quality but highest initial investment
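A minimal sketch of the ports-and-adapters idea, to show how the core can be tested without a database. All names here (`Order`, `OrderRepository`, `OrderService`, `InMemoryOrderRepository`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class OrderRepository(Protocol):
    """Port: what the core needs from storage, with no storage details."""

    def save(self, order: Order) -> None: ...

    def get(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderRepository:
    """Adapter used in tests. A PostgreSQL or DynamoDB adapter would
    implement the same two methods (assumption: not shown here)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order

    def get(self, order_id: str) -> Optional[Order]:
        return self._rows.get(order_id)


class OrderService:
    """Core business logic: depends only on the port, never on an adapter."""

    def __init__(self, repo: OrderRepository) -> None:
        self._repo = repo

    def place_order(self, order_id: str, total_cents: int) -> Order:
        if total_cents <= 0:
            raise ValueError("order total must be positive")
        if self._repo.get(order_id) is not None:
            raise ValueError(f"order {order_id} already exists")
        order = Order(order_id, total_cents)
        self._repo.save(order)
        return order
```

Swapping storage then means writing a new adapter class and passing it to `OrderService`; the business rules in `place_order` stay untouched.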
Start with a monolith. Seriously. If you have fewer than 50 engineers, if your domain boundaries are still evolving, if you are not yet sure what "independently scalable" means for your product, then a well-structured monolith will outperform a poorly-structured microservices system every time. Extract services later when you have clear, stable domain boundaries.
Extract a service when you observe these specific conditions, not when you feel the monolith is "too big":
Valid Extraction Signals
- Deployment bottleneck: one team's changes are blocked because another team's untested code is in the same deployment. Independent deployment is the primary reason to extract.
- Measured scale mismatch: one component needs 10x more resources than the rest, for example a video transcoding module consuming all CPU while others sit idle. The mismatch must be measured, not theoretical.
- Technology mismatch: an ML component needs Python + GPU access while the rest is Go. The two cannot reasonably share one deployable, and that incompatibility is the only reason to split here.
- Ownership clarity: a team wants to own and deploy a capability completely independently, including its data. Conway's Law forcing function.
Invalid Extraction Reasons
- "The service would be smaller": smaller is not inherently better
- "The module is complex": complexity does not justify distribution
- "We want to use a different database for an experiment"
- "We read an article about microservices"
- "It feels like it should be a separate service": feelings are not architecture
The Distributed Monolith Anti-Pattern
The worst outcome of premature service extraction: services that are separately deployed but tightly coupled at the data layer. Service A cannot deploy without coordinating with Service B because they share a database table. Service B cannot change its API without breaking Service C. You have all the operational complexity of microservices with none of the independence benefits. A well-structured modular monolith beats a distributed monolith in every measurable way.
Modular Monolith Enforcement Mechanisms
Module boundaries in a monolith are only as strong as the enforcement mechanism. Without enforcement, modules collapse into spaghetti as developers take shortcuts.
- Build-time enforcement: ArchUnit (Java), Go package visibility, or similar tools that fail the build if cross-module dependencies are introduced without explicit declaration.
- Schema separation: each module owns tables under a named schema (orders.*, payments.*, users.*). No cross-schema joins allowed.
- API-only cross-module access: cross-module data access goes through the module's public API (in-process function call), not by joining tables. This makes eventual service extraction trivial: the data is already separated.
- Code review policies: cross-module changes require module owner approval.
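As an illustration of build-time enforcement, here is a rough sketch of an import check that could run in CI. It is not ArchUnit and not a real tool: the module names, the convention that each module exposes a public `api` package, and the function `find_violations` are all assumptions made for this example.

```python
import ast
from typing import List

# Hypothetical layout: each module exposes `<module>.api`; everything else is private.
MODULES = {"orders", "payments", "users"}


def find_violations(owner: str, source: str) -> List[str]:
    """Return imports in `source` (a file owned by module `owner`) that reach
    into another module's internals instead of its public `api` package."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Treat `from payments import api` as the dotted name `payments.api`.
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            target = parts[0]
            if target in MODULES and target != owner:
                # Only `<module>.api` and names below it are allowed. A bare
                # `import payments` is rejected too, since it exposes everything.
                if len(parts) < 2 or parts[1] != "api":
                    violations.append(name)
    return violations
```

A CI step could call `find_violations` for every source file and fail the build when the returned list is non-empty.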
Key Takeaways
- Monolith = single deployable unit. Not inherently bad: discipline determines quality.
- Modular monolith: strict module boundaries, defined interfaces, schema separation. Best starting point.
- Advantages: one deployment, easy debugging, no network calls, no distributed tracing needed.
- Extract when: deployment bottleneck, measured scale mismatch, tech mismatch, or clear team ownership. Not for theoretical cleanliness.
- Distributed monolith: the worst outcome of premature extraction. All distribution cost with none of the independence benefit.
- Module enforcement: ArchUnit, package visibility, schema separation per module, cross-module API-only access.
- Default recommendation: start monolith, extract services when pain is concrete, not theoretical.
Microservices: The Full Picture
Microservices are not a universal improvement over monoliths; they are a trade-off. You gain independent deployment, independent scaling, and technology freedom. You pay with network complexity, distributed debugging, data consistency challenges, and operational overhead. Teams that adopt microservices without understanding the cost end up with a "distributed monolith": all the complexity of distribution with none of the benefits. The architecture only works when service boundaries align with team boundaries (Conway's Law).
Use When
- Large org (50+ engineers): independent team velocity
- Different scaling needs: payment (CPU) vs media (I/O)
- Different tech stacks: ML team (Python) vs API team (Go)
- Regulatory isolation: PCI service separate from rest
- Clear, stable domain boundaries already understood
Do Not Use When
- Small team (<10): coordination overhead exceeds benefit
- Unclear domain boundaries: you'll split wrong and re-merge
- Team doesn't own their infra: shared ops becomes bottleneck
- Strong consistency needed across services: saga complexity
- No monitoring/tracing: debugging becomes impossible
You do not rewrite a monolith overnight. The strangler fig pattern extracts services incrementally: route new features to a new service, migrate existing features one by one, and eventually the monolith has nothing left. Each extraction is reversible: if the new service fails, route traffic back. Netflix, Amazon, and Uber all migrated this way over years, not months.
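The routing step of the strangler fig pattern can be sketched as a small prefix table. In practice this logic usually lives in a reverse proxy or API gateway configuration; the class `StranglerRouter`, its methods, and the upstream URLs below are hypothetical names used only to show the idea.

```python
from typing import Dict

# Hypothetical upstream addresses, for illustration only.
MONOLITH = "http://monolith.internal"
BILLING = "http://billing.internal"


class StranglerRouter:
    """Sends extracted path prefixes to new services, everything else to the monolith."""

    def __init__(self, default_upstream: str) -> None:
        self._default = default_upstream
        self._routes: Dict[str, str] = {}  # path prefix -> upstream

    def extract(self, prefix: str, upstream: str) -> None:
        """Start routing `prefix` to the newly extracted service."""
        self._routes[prefix] = upstream

    def roll_back(self, prefix: str) -> None:
        """Undo an extraction: traffic for `prefix` returns to the monolith."""
        self._routes.pop(prefix, None)

    def upstream_for(self, path: str) -> str:
        # Match on whole path segments so "/invoices" does not capture "/invoices-old".
        matches = [
            p for p in self._routes
            if path == p or path.startswith(p.rstrip("/") + "/")
        ]
        if not matches:
            return self._default
        return self._routes[max(matches, key=len)]  # longest prefix wins
```

Reversibility is the point of the design: `roll_back("/invoices")` restores the old behavior without touching the monolith's code.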
Service size is not defined by lines of code. It is defined by deployment independence and team ownership. A service is the right size when: one team can fully own it end-to-end, it can be deployed without coordinating with other teams, and its API surface is stable enough that consumers can rely on it.
Too Small
- A service for each database table
- A service per HTTP endpoint
- Services that always deploy together because they are functionally coupled
- Result: chatty inter-service communication, distributed transaction nightmares
Right Size
- A bounded context (DDD sense)
- All code for one business capability: orders, payments, identity
- One team, one service, independent deployment
- Stable API surface consumers can rely on
Too Large
- Two teams coordinate changes to it
- Contains multiple independent scaling requirements
- Just a monolith with a network boundary drawn around it
- Result: same coordination problems as monolith, plus network overhead
Sharing a database between services destroys independence at the data layer. If Service A and Service B share a table, Service B's schema migration can break Service A. Service A's query load can degrade Service B's performance. Neither team can evolve their data model without coordinating with the other.
Database per service means: Service A's data cannot be accessed by Service B through SQL. Service B must call Service A's API to get data it needs. This forces explicit contracts and prevents tight coupling.
The implementation challenge: cross-service queries are now API calls. Reports that used to be simple JOIN queries must be replaced with data denormalization, event-driven projections, or a separate read model (CQRS). This is real engineering work. Acknowledge it upfront rather than discovering it mid-migration.
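To make the "event-driven projection" option concrete, here is a minimal sketch of a read model that replaces a JOIN between orders and customers. The event shapes (`CustomerRegistered`, `OrderPlaced`) and the class `OrderReportProjection` are assumptions for this example, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OrderReportProjection:
    """Denormalized read model, built only from events published by other services."""

    customer_names: Dict[str, str] = field(default_factory=dict)  # customer_id -> name
    rows: Dict[str, dict] = field(default_factory=dict)           # order_id -> report row

    def apply(self, event: dict) -> None:
        kind = event["type"]
        if kind == "CustomerRegistered":
            self.customer_names[event["customer_id"]] = event["name"]
        elif kind == "OrderPlaced":
            # The customer name is copied into the row, so the report needs no JOIN.
            self.rows[event["order_id"]] = {
                "order_id": event["order_id"],
                "customer_name": self.customer_names.get(event["customer_id"], "unknown"),
                "total_cents": event["total_cents"],
            }
        # Unknown event types are ignored so new events do not break the projection.
```

This sketch assumes events arrive in order. If an `OrderPlaced` event can arrive before its `CustomerRegistered` event, the row keeps the placeholder name until some backfill step corrects it, which is part of the engineering work the paragraph above warns about.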
Internal API Versioning
Service APIs are contracts. Internal service APIs need the same versioning discipline as public APIs: the consumer is inside your organization, not outside, but breaking them has the same consequences.
Practical approach: never remove or change an existing API field; only add new optional fields. When a breaking change is unavoidable, run old and new versions in parallel during a migration window, confirm all consumers have migrated, then decommission the old version. Contract testing (Pact) catches breaking changes in CI before production. A service that breaks its consumers on every deploy destroys team independence, the opposite of what microservices are supposed to achieve.
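The "only add optional fields" rule can be expressed as a simple schema comparison. The sketch below is a simplified stand-in and does not show how Pact works; the schema format (field name mapped to a `type` and a `required` flag) and the function `breaking_changes` are assumptions for illustration.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, dict], new: Dict[str, dict]) -> List[str]:
    """Compare two versions of a message schema.

    Each schema maps a field name to {"type": str, "required": bool}.
    Removing a field, changing its type, or adding a new required field
    is reported as breaking; adding an optional field is not.
    """
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(
                f"changed type of {name}: {spec['type']} -> {new[name]['type']}"
            )
    for name, spec in new.items():
        if name not in old and spec.get("required", False):
            problems.append(f"new required field: {name}")
    return problems
```

A CI job could compare the schema on the main branch with the schema in a pull request and fail when the returned list is non-empty.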
Microservices are an organizational scaling strategy, not a technical improvement. They solve the problem of 200 engineers needing to deploy independently. They do not make a 5-person team faster. If you cannot articulate which specific problem microservices solve for YOUR team, you do not need them yet.
Key Takeaways
- Microservices: independently deployable, own DB, own team. Tech freedom + independent scaling.
- Tax: network calls, distributed tracing, saga transactions, operational overhead.
- Service size: one team, independent deployment, stable API. Not defined by lines of code.
- Too small: services that always deploy together. Too large: services needing multi-team coordination.
- Database per service: non-negotiable for true independence. Cross-service data access through APIs only, never SQL joins.
- Internal API versioning: same discipline as public APIs. Contract testing (Pact) catches breaking changes in CI.
- Conway's Law: service boundaries must align with team boundaries or you get "distributed monolith."
- Migration: strangler fig pattern. Extract incrementally, never big-bang rewrite.
Serverless
Serverless removes the server from your mental model: you write a function, the platform handles scaling, availability, and infrastructure. You pay only when your code runs. For event-driven workloads with variable traffic, the operational simplicity is transformative: no capacity planning, no patching, no scaling decisions. For high-throughput, latency-sensitive workloads, the cold start penalty and execution time limits make it a poor fit.
Serverless (Functions)
- Zero infrastructure management
- Pay per invocation (idle = $0)
- Auto-scales instantly (0 to 1000 concurrent)
- Cold start: 100ms-2s on first invocation
- Max execution: 15 min (Lambda)
- Best for: event processing, APIs with variable traffic, glue code
Containers
- You manage cluster, scaling policies, networking
- Pay for reserved capacity (idle still costs $)
- Scaling takes seconds to minutes (pod startup)
- No cold start for running containers
- No execution time limit
- Best for: steady traffic, long-running processes, GPU workloads
Cold Start Mitigation Strategies
Cold start occurs when no warm container exists: the platform must download the runtime, start the container, load code, then execute. Mitigation strategies:
- Provisioned concurrency: pre-warm a fixed number of instances that are always ready. Eliminates cold starts entirely for that capacity. Costs money even when idle, so use it for latency-sensitive endpoints only.
- Runtime selection: Java cold starts (1-2s) are dramatically slower than Go or Python (100-300ms). Choose runtimes based on cold start tolerance, not familiarity.
- Package size: smaller deployment packages initialize faster. Remove unused dependencies, use tree-shaking, avoid bundling unneeded SDKs.
- Architecture choice: if cold start latency is unacceptable for your use case, serverless is the wrong tool. Use containers with predictable startup time.
Serverless Cost Model Warning
Serverless is cheaper for bursty, low-frequency workloads. For high-frequency steady-state workloads, it is often more expensive than containers.
Example: a Lambda function running 100ms at 512MB, invoked 10M times/month, costs roughly $10-11/month at typical list prices. A single t3.medium at $0.0416/hr costs about $30/month, but it handles many more concurrent requests at that price.
The breakeven depends on traffic pattern. Model costs before choosing serverless for savings: the assumption that serverless is always cheaper is false for sustained high-traffic workloads.
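The arithmetic behind the example above can be written as a small cost model. The prices below are assumptions (commonly published x86 list prices for a US region, free tier ignored) and should be checked against current pricing before use; the function names are hypothetical.

```python
# Assumed list prices; verify against the provider's current pricing page.
PRICE_PER_MILLION_REQUESTS = 0.20    # USD per 1M invocations
PRICE_PER_GB_SECOND = 0.0000166667   # USD per GB-second of execution
HOURS_PER_MONTH = 730


def lambda_monthly_cost(invocations: float, duration_ms: float, memory_mb: float) -> float:
    """Request charge plus compute charge (GB-seconds) for one month."""
    gb_seconds = invocations * (duration_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND


def instance_monthly_cost(hourly_rate: float, count: int = 1) -> float:
    """Always-on instances cost the same whether busy or idle."""
    return hourly_rate * HOURS_PER_MONTH * count


def breakeven_invocations(hourly_rate: float, duration_ms: float, memory_mb: float) -> float:
    """Monthly invocations at which the function costs as much as one instance."""
    per_invocation = lambda_monthly_cost(1, duration_ms, memory_mb)
    return instance_monthly_cost(hourly_rate) / per_invocation
```

Under these assumed prices, `lambda_monthly_cost(10_000_000, 100, 512)` comes to about $10.33 and `instance_monthly_cost(0.0416)` to about $30.37, so the breakeven against one t3.medium sits a little over 29 million invocations per month. The comparison only holds if a single instance can actually serve that load; otherwise multiply the instance count accordingly.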
Vendor Lock-In Reality
Serverless is the highest vendor lock-in architecture style. Your code structure, event sources, configuration, IAM permissions, and deployment tooling are all provider-specific. A Lambda function cannot be moved to GCP Cloud Functions without rewriting the handler, event types, and infrastructure.
Mitigation: keep business logic in provider-agnostic modules and wrap them in thin provider-specific handlers. The handler adapts the provider event format to your domain model, and business logic never imports provider SDKs directly. If you change providers, you rewrite only the handler wrapper and preserve the core logic.
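A minimal sketch of the thin-handler approach. The domain function `register_user` is hypothetical; the handler assumes an API Gateway proxy-style event, where the request body arrives as a JSON string under `body` and the response is a dict with `statusCode` and `body`.

```python
import json


# --- Provider-agnostic core: no cloud SDK imports, plain inputs and outputs ---
def register_user(email: str) -> dict:
    email = email.strip().lower()
    if "@" not in email:
        raise ValueError("invalid email")
    return {"email": email, "status": "registered"}


# --- Thin provider-specific adapter ---
def lambda_handler(event, context):
    """Translate the provider event into a domain call and back.

    Assumes an API Gateway proxy-style event; another provider would get
    its own adapter that calls the same `register_user`.
    """
    try:
        payload = json.loads(event.get("body") or "{}")
        result = register_user(payload["email"])
        return {"statusCode": 200, "body": json.dumps(result)}
    except (KeyError, ValueError) as exc:
        # KeyError: missing field. ValueError: bad JSON or failed validation.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
```

The core function can be unit tested with plain arguments, and only the few lines in `lambda_handler` know anything about the provider's event format.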
Serverless is not "no servers"; it is "not your servers." The operational simplicity is real for event-driven, bursty workloads. But for steady high-throughput services, containers are cheaper and more predictable. The decision rule: variable, event-driven traffic points to serverless; steady traffic with long-lived connections points to containers.
Key Takeaways
- Serverless: function-level deployment, auto-scaling, pay-per-invocation. Zero infra management.
- Cold start mitigation: provisioned concurrency for latency-sensitive endpoints, runtime selection (Go/Python faster than Java), minimize package size.
- Limits: 15 min execution, 10GB memory, stateless. Not for WebSockets or GPU workloads.
- Cost model: cheaper for bursty low-frequency workloads. More expensive than containers for sustained high-traffic. Model before assuming savings.
- Vendor lock-in: highest of any architecture style. Isolate business logic from provider-specific handler wrappers.
- Best for: event-driven (S3 triggers, SQS processing), APIs with variable traffic, scheduled jobs.
Service Mesh
When you have 5 microservices, you can handle cross-cutting concerns (mTLS, retries, circuit breaking, observability) in each service's code. When you have 50, you need that logic extracted into infrastructure. A service mesh provides these capabilities as a transparent network layer, so your application code does not change. The mesh handles encryption, routing, retries, and telemetry via sidecar proxies attached to every service instance.
Istio
- Full-featured: mTLS, traffic management, observability, policy
- Envoy sidecar proxy (high performance, configurable)
- Complex to operate, with significant resource overhead
- Best for: large deployments (100+ services), compliance needs
Linkerd
- Lightweight, simpler than Istio, Rust-based proxy
- Lower resource footprint and operational complexity
- Fewer features but covers 80% of use cases
- Best for: teams wanting mesh benefits without Istio weight
Do Not Use When
- Fewer than 10 services: operational overhead of running Istio/Linkerd exceeds the value. Use a shared library for mTLS, retries, and circuit breaking.
- Single-language environment: a shared library implementing the same capabilities is simpler, cheaper, and easier to debug. Mesh's polyglot value only matters in multi-language systems.
- No Kubernetes expertise: meshes require deep K8s knowledge. Deploying a mesh on poor K8s foundations creates compounding operational risk.
- Latency-sensitive hot paths: a sidecar adds ~1ms per hop. A 5-service chain adds about 5ms, potentially 25% of a 20ms latency budget. Profile first.
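For the small or single-language deployments described in this list, where a shared library stands in for a mesh, circuit breaking can be sketched in-process roughly as follows. The class `CircuitBreaker`, its parameters, and the default thresholds are assumptions for illustration; a production library would also need thread safety and metrics.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock          # injectable so tests need not sleep
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Each service would wrap its outbound calls, for example `breaker.call(fetch_inventory, sku)`, where `fetch_inventory` is whatever client function the service already has. A mesh moves this same behavior out of the process and into the proxy.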
Justified When
- 50+ services across multiple languages
- Compliance requirement for encryption everywhere (mTLS)
- Advanced traffic management (canary, blue/green, fault injection)
- Need unified observability across all service communication
- Multiple teams cannot coordinate on a shared library upgrade
eBPF-Based Service Meshes
Next-generation meshes are moving away from per-pod sidecar proxies. Cilium implements much of the data path with eBPF programs running in the kernel, while Istio's ambient mode replaces sidecars with a shared per-node proxy (ztunnel) plus optional waypoint proxies. Both approaches remove per-pod proxy memory overhead and can reduce proxy latency. Cilium is an increasingly common choice for new deployments where Kubernetes networking and mesh capabilities are needed together. The sidecar model is not obsolete, but these alternatives are worth evaluating for new infrastructure.
A service mesh is justified when the number of services makes per-service implementation of mTLS, retries, and observability unsustainable. At 5 services, a shared library suffices. At 50+ services across multiple languages, a mesh pays for itself. Below that threshold, it is over-engineering.
Key Takeaways
- Service mesh: infrastructure layer for service-to-service communication. Transparent to app code.
- Sidecar pattern: proxy per pod (Envoy/Linkerd) handles mTLS, retries, circuit breaking, telemetry.
- Control plane: distributes config, rotates certs, enforces policy (Istiod).
- When needed: 50+ services, multi-language, compliance/mTLS requirement, advanced traffic management.
- Do not use: fewer than 10 services, single-language, teams without K8s expertise, or latency-critical paths where 1ms per hop matters.
- Sidecarless alternatives: Cilium (eBPF) and Istio ambient mode eliminate per-pod sidecar overhead. Evaluate for new infrastructure.
- Cost: latency overhead (~1ms per hop), memory per sidecar, operational complexity.
Edge Computing
Traditional architecture puts all compute in a centralized cloud region. Every user request travels to that region and back, adding latency proportional to geographic distance. Edge computing moves logic closer to users: CDN edge functions run at Points of Presence worldwide, reducing latency from around 200ms to around 20ms for many operations. For IoT, edge means processing at the device or local gateway, avoiding the round trip entirely. The trade-off: limited compute, stateless by design, and deployment complexity.
CDN Edge Functions
- Cloudflare Workers: V8 isolates, <5ms cold start
- Lambda@Edge / CloudFront Functions
- Fastly Compute@Edge (Wasm-based)
- Use for: auth, routing, A/B tests, personalization
- Limit: 10-50ms CPU time, limited APIs
IoT Edge Computing
- Process data at device or local gateway
- Filter, aggregate before sending to cloud
- AWS IoT Greengrass, Azure IoT Edge
- Use for: real-time ML inference, privacy, offline operation
- Reduces bandwidth, latency, and cloud costs
CPU Time Limits
- Cloudflare Workers: up to 50ms CPU time per request, depending on plan (not wall time, actual CPU execution)
- Enough for auth token validation, A/B routing, simple transforms
- Not enough for image processing, complex business logic, or DB queries
- Lambda@Edge: 5s timeout for viewer request functions, 30s for origin request functions (these are execution timeouts, not CPU time)
No Persistent Connections
- Edge functions cannot maintain open connections to databases, queues, or internal services
- Every request must be self-contained or use connections established within the request lifetime
- Use external edge APIs (Cloudflare KV, Durable Objects) for state that must persist at the edge
Consistency at the Edge
- Cloudflare KV is eventually consistent: writes typically propagate globally within about 60 seconds
- For data that needs strong consistency (session tokens, feature flags): either tolerate staleness or centralize reads at the origin
- Edge functions making decisions on stale data produce incorrect behavior that is geographically distributed and hard to debug
Debugging Complexity
- Edge functions deploy globally immediately, with no regional canary
- A bug is deployed everywhere at once
- Test extensively in staging, use gradual rollouts via traffic splitting if supported
- Failures are geographically distributed, so they are harder to reproduce and diagnose
CDN Caching
- Serves pre-computed static responses from edge cache
- No code runs: the CDN returns a cached HTTP response
- Works for assets that do not change per-user (CSS, JS, images)
- Cannot personalize, validate auth, or make dynamic routing decisions
Edge Compute
- Runs code at the edge for each request
- Response is computed, not cached
- Enables per-user personalization, auth validation, dynamic routing
- The two complement each other: the edge function checks auth, then serves from cache if the request is authorized and the response is cached
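The "check auth, then serve from cache" flow can be sketched in a platform-neutral way. Real edge platforms usually run JavaScript or Wasm and provide their own request and cache APIs; the functions below (`sign`, `verify`, `handle`), the HMAC token format, and the dict-based cache are assumptions used only to show the control flow.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical; a real deployment would load this from platform secrets


def sign(user_id: str) -> str:
    """Issue a token of the form '<user_id>.<hex hmac>' (illustrative format)."""
    mac = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{mac}"


def verify(token: str):
    """Return the user id if the signature is valid, otherwise None."""
    user_id, _, mac = token.rpartition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    valid = bool(user_id) and hmac.compare_digest(mac.encode(), expected.encode())
    return user_id if valid else None


def handle(path, token, cache, fetch_origin):
    """Edge handler: auth is computed per request, content comes from cache when possible."""
    if verify(token or "") is None:
        return 401, "unauthorized"      # rejected at the edge, origin never contacted
    if path in cache:
        return 200, cache[path]         # authorized and cached: no origin round trip
    body = fetch_origin(path)           # cache miss: one trip to origin, then cached
    cache[path] = body
    return 200, body
```

Token validation here needs only CPU and a secret, which is why it fits within edge limits; anything requiring a database lookup would push the request back to the origin.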
Edge computing reduces latency by eliminating geography. If your logic can run in <50ms of CPU, is stateless, and benefits from being close to users, run it at the edge. Auth checks, A/B testing, geo-routing, and cache personalization are natural edge workloads. Business logic that needs your database still belongs at the origin.
Key Takeaways
- Edge computing: move compute closer to users. CDN PoPs (300+ worldwide) run your code.
- CDN edge functions: Cloudflare Workers, Lambda@Edge. Sub-5ms cold start on V8-isolate platforms such as Workers. Stateless operations.
- CPU limits: Cloudflare Workers 50ms, Lambda@Edge 5-30 seconds. No persistent connections: each request must be self-contained.
- Consistency: Cloudflare KV eventually consistent, 60-second propagation. Centralize strongly consistent reads to origin.
- Debugging: edge functions deploy globally immediately. Test extensively, because failures are geographically distributed.
- CDN caching vs edge compute: caching serves cached responses. Edge runs code per-request. They complement each other.
- IoT edge: process at device/gateway. Filter before cloud. Offline operation, privacy, bandwidth saving.
- Best for: auth validation, A/B testing, geo-routing, cache personalization, geolocation.
Monolith: Start Here. Seriously.
- Modular monolith: strict boundaries without distribution cost
- Extract when: deployment bottleneck, scale mismatch, tech mismatch, team ownership
- Distributed monolith: worst outcome of premature extraction
- Enforce: ArchUnit, schema separation, API-only cross-module access
Microservices: Organizational Scaling Strategy
- Service size: one team, independent deploy, stable API. Not LOC
- DB per service: non-negotiable. Cross-service data via API only
- Internal API versioning: contract testing (Pact) catches breaks in CI
- Strangler fig: migrate incrementally, never big-bang rewrite
Serverless: Functions, Not Servers
- Cold start: provisioned concurrency, fast runtimes, small packages
- Cost: cheaper for bursty workloads, more expensive for steady high-traffic
- Vendor lock-in: highest of any style. Isolate business logic from handlers
- Best: event-driven, variable traffic. Not for: steady high-QPS, GPU
Service Mesh: Infrastructure for Service Communication
- Sidecar: mTLS, retries, circuit breaking, telemetry. Transparent to app
- Do not use: <10 services, single language, no K8s expertise
- Sidecarless (Cilium eBPF, Istio ambient mode): eliminates sidecar overhead. Evaluate for new infra
- Cost: ~1ms per hop, memory per sidecar, operational complexity
Edge Computing: Eliminate Geography
- CPU limits: Workers 50ms, Lambda@Edge 5-30s. No persistent connections
- KV eventually consistent (60s). Centralize strongly consistent reads to origin
- CDN caching = cached responses. Edge compute = code per request. Complementary
- Deploy globally immediately: test extensively, failures are distributed