High-end hardware & parallelism at the cost of energy
Strategy 1 โ Additional Hardware
1
Adding More Hardware
Scale out by deploying additional servers behind a load balancer to distribute incoming client requests across multiple application server instances.
Horizontal Scaling — Load Balancer distributes requests across server instances
Horizontal Scaling: Client โ Load Balancer โ Web Application Servers (multiple instances) โ distributes incoming traffic to handle increased load without upgrading individual machines.
Strategy 2 โ Reduce Communication Between Components
2
Reduce Inter-Component Communication
Minimize the number of calls between system components to reduce latency overhead. Common solutions include caching and batching.
⚠ Problem — N individual calls create high latency overhead
a. Cache โ Solution
Add a cache layer between the web app and backend services
Reduces repeated calls to downstream microservices
Returns pre-computed responses for common queries
Cache Layer — HIT returns instantly, MISS fetches from backend
b. Batch โ Solution
Group multiple requests into a single call to the backend
Avoids: N individual calls → 1 batched call
Useful for bulk data operations and report generation
Batch Aggregator — N requests bundled into 1 backend call
Strategy 3 โ Reduce or Increase Distribution
3
Distribute via Sharding
Increase distribution by sharding data across multiple storage nodes, enabling parallel fetching and improved throughput.
Sharding by client_id — Product Reviews distributed across parallel shards
Real-world example: Sharding a Product Reviews service by client_id — each shard handles a subset of clients, enabling parallel fetching and improved throughput.
Strategy 4 โ Reduce Flexibility of the System
4
Trade Flexibility for Speed
Flexible systems that make external config/database calls at runtime are slower. Where possible, replace with hardcoded constants to eliminate I/O on the hot path.
❌ Flexible (Slow) — External calls on every transaction
publicbooleanprocessTransaction(String type, double amount, String currency) {
// Reads from JSON config โ external call each timeif (amount < config.getJSONObject("minimumAmount").getDouble(currency)) {
return false;
}
// Database call on every transactiondouble fee = database.getCurrentFee(type, currency);
...
}
Optimizing one/multiple components once is good, but systems constantly evolve
Optimizations may be negated by new changes
Performance load testing must be done often and regularly
Strategy 6 โ Compromise on Energy Efficiency
6
Trade Energy for Performance
High-End Hardware
Performance increase: +50%
Energy consumption increase: +150%
Disable Power Mgmt
CPU runs at maximum speed always
Cost: higher energy consumption
Increase Parallelism
Running on multiple GPUs
Distributing computation across multiple computers
📋 Chapter 1 — Summary
Six strategies work in concert to achieve high performance โ from hardware scaling to code-level optimizations
Additional hardware & load balancing โ scale horizontally to handle more load
Cache & batch to reduce communication overhead
Sharding for increased distribution across nodes
Reduce flexibility โ eliminate I/O on hot path for speed
Regular, continuous load testing โ verify performance under realistic conditions
Compromise energy for parallelism & speed
02
Chapter Two ยท Quality Goals
Design Strategies for Achieving Adaptability & Flexibility
Adaptability and flexibility are achieved by keeping changes local, decoupling components, and making conscious decisions about where the system needs to be flexible.
๐ฏ
Determine Flexibility
Identify where the system needs to adapt โ functionality, technology, or environment
๐
Configuration Files
Externalise settings so changes don't require recompilation
๐
Keep Changes Local
Encapsulate change behind stable interfaces
๐
Decouple Components
Facade & Strategy patterns to isolate third-party & algorithm dependencies
๐งฌ
Polymorphism
Program to interfaces โ swap implementations without touching callers
Pattern: Define a family of algorithms (PaymentStrategy), encapsulate each one (CreditCard, PayPal, BankTransfer), and make them interchangeable via a common interface.
Benefit: Add new payment types (e.g., CryptoPayment) without modifying the PaymentProcessor class โ open/closed principle.
Facade Pattern โ Decoupling from Third-Party Libraries
Problem: Your WebApplication is tightly coupled to a complex Third-Party Storage Library (Credentials, ConfigurationManager, DataStorage). Changing the vendor requires massive refactoring.
Solution โ StorageFacade: Introduce a Facade between your application and the third-party library. The application only knows the Facade's simple interface. The complex library implementation is hidden and swappable.
Changes to Video Catalog propagate to Payment Processor
Currency Conversion Data is a shared dependency
A change to streaming video formats requires testing the entire system
Microservices โ Changes Stay Local
Web App โ API Gateway โ Microservices AโE
Each microservice has its own database
Changes to Service A don't affect Services BโE
Polymorphism โ Flexible by Design
Concept: Allows objects of different classes to be treated as objects of a common superclass. Enables flexibility to perform the same action on different objects with different implementations.
Private (Can Change): cachedTransactions, dbConnection, readBuffer, bufferSize โ implementation details that must remain hidden.
Public (Stable Contract): addTransaction(), getBalance(), connectToDbAsync() โ the interface the consumer depends on.
Use Understandable & Maintainable Code
โ Hard to Understand
publicintcalc(int n) {
int s = 0;
for(int i = 0; i <= n; i++) {
if(i % 2 == 0) s += i;
}
int f = 1, x = 5;
for(int i = 1; i <= x; ++i) { f *= i; }
return s;
}
โ Separate methods with meaningful names
publicintsumOfEvenNumbers(int limit) {
int sum = 0;
for(int i = 0; i <= limit; i++) {
if(isEven(i)) { sum += i; }
}
return sum;
}
publicintfactorial(int number) {
int result = 1;
for(int i = 1; i <= number; i++) { result *= i; }
return result;
}
📋 Chapter 2 — Summary
Adaptability is achieved through deliberate design decisions โ decoupling, hiding details, and isolating flexibility points
Determine where flexibility is needed: functionality, data, third-party, UI, platform
Strategy Pattern & Feature Flags โ swap behaviors at runtime
Facade Pattern โ decouple from complex third-party libraries
Configuration files for environment variability
Keep changes local โ microservices, modular boundaries
Polymorphism for extensible behavior
Information Hiding โ stable public contracts, hide internals
High availability is achieved through three pillars: Error Prevention, Error Detection, and Error Handling โ each playing a distinct role in keeping a system continuously operational.
Validate accuracy of results across redundant components
3 โ Error Handling
Robust exception handling
Rollback mechanisms
Redundant system components
Auto-replace defective components
Error Prevention โ Using Transactions
❌ No Transaction — Partial Failure Risk
-- If the second UPDATE fails, CompanyX loses-- 1000 but Bob never receives it!UPDATE accounts SET balance = balance - 1000.00WHERE name = 'CompanyX';
-- โ SYSTEM CRASH HERE โ UPDATE accounts SET balance = balance + 1000.00WHERE name = 'Bob';
-- โ CompanyX lost 1000, Bob never received it!
✓ With Transaction — Atomic, All-or-Nothing (BEGIN ... COMMIT)
BEGIN;
UPDATE accounts SET balance = balance - 1000.00WHERE name = 'CompanyX';
UPDATE accounts SET balance = balance + 1000.00WHERE name = 'Bob';
COMMIT; -- โ Only if BOTH succeed โ atomic guarantee-- If anything fails between BEGIN and COMMIT,-- the entire transaction is rolled back automatically.-- Neither CompanyX nor Bob's balance is changed.
Error Prevention โ Input Validation
Strictly Define Valid Input
Define what is valid input at every boundary
Define what is invalid and reject it early
Prevents both accidental errors and malicious attacks
Identify bottleneck stages in processing pipelines
A slow stage creates back pressure — upstream stages stall waiting for it to clear
Downstream stages starve — they have nothing to process
Throughput of the entire pipeline is limited by the slowest stage
⚠ Back Pressure — Color Correction bottleneck stalls the entire image pipeline
✓ Solution — Parallelize the bottleneck stage to match upstream throughput
Error Detection โ Monitoring
Critical Metrics to Monitor
Uptime
HTTP req/sec
Error Rate
CPU / Memory
Error Status Codes
Latency P99
Disk I/O
Queue Depth
Critical metrics must be published and aggregated for both visual (dashboards) and programmatic (alerting) monitoring.
Error Detection โ Validating Accuracy of Results
Cross-Validation for Data Consistency
Redundant data sources should produce consistent results when cross-checked
Compare the sum of all account balances against the sum of all wire transfers
If the net total doesn't match the expected value, data corruption or a bug has occurred
Run these checks periodically (scheduled jobs) or after critical operations
Example: A banking system has two tables — accounts (current balances) and wire_transfers (pending transfers). By summing both, the system verifies that money was neither created nor lost. If CompanyX has 10,000 and Bob has 50 in accounts, and there are pending transfers of −300 (CompanyY→CompanyZ) and +250 (Jane→Bob), the net total should be exactly 10,000. Any deviation signals a consistency error.
-- Cross-validate accounts table against wire transfers for consistency-- Expected: money is neither created nor lostSELECT
(SELECTSUM(balance) FROM accounts) +
(SELECTSUM(amount) FROM wire_transfers)
AS net_total_balance;
-- accounts:-- CompanyX = 10,000-- Bob = 50-- Total = 10,050-- wire_transfers (pending):-- CompanyY โ CompanyZ = -300-- Jane โ Bob = +250-- Total = -50-- net_total_balance = 10,050 + (-50) = 10,000 โ consistent-- If result โ 10,000 โ โ data corruption detected!
Error Handling โ Robust Exception Mechanisms
Try-Catch โ Catch & Handle Exceptions
Wrap risky operations in a try-catch block
Catch specific exceptions — don't swallow errors silently
Log the error, return a meaningful response to the caller
Prevents unhandled crashes from bringing down the service
Retry with Backoff
Server catches exception from External Broker Service
Waits and retries — transient failures often self-resolve
Use exponential backoff to avoid thundering herd
✓ Green path: retry succeeds
Fallback to Alternative
If primary broker fails, route to Another Broker Service
Circuit Breaker pattern prevents cascade failures
Consumer is unaware of the fallback — transparent recovery
Error Queue for Future Verification
Add failed trades to an error queue
Queue enables asynchronous retry and audit
Prevents data loss — no trade is silently dropped
Error Handling โ Transaction Rollback
ROLLBACK โ Undo Partial Work on Failure
Wrap related operations in a BEGIN ... COMMIT / ROLLBACK block
If a business rule fails mid-transaction, ROLLBACK undoes all preceding changes
Prevents selling out-of-stock items in a race condition
Atomicity guarantees data integrity — the database is never left in a half-done state
Example: An e-commerce system processes a purchase. It first inserts a sale record, then checks inventory. If inventory is zero, the ROLLBACK undoes the sale insert — the customer never gets charged for an out-of-stock item, and the database stays consistent.
✓ Transaction with conditional ROLLBACK
BEGIN;
-- Step 1: Optimistically insert the saleINSERT INTO sales (product_id, user_id)
VALUES (@product_id, @user_id);
-- Step 2: Lock the inventory row and check countSELECT count FROM inventory
WHERE product_id = @product_id
FOR UPDATE; -- row-level lock prevents race conditionsIF count > 0THEN-- โ In stock โ decrement and commitUPDATE inventory SET count = count - 1WHERE product_id = @product_id;
COMMIT; -- sale + inventory update both persistELSE-- โ Out of stock โ undo the INSERT, nothing changesROLLBACK; -- sale record is removed, DB unchangedEND IF;
Eliminating Single Points of Failure
Redundancy at Every Layer
Identify every component that, if it fails, brings the whole system down
Replace single nodes with clustered / replicated equivalents
Use active-active or active-passive failover depending on RTO/RPO requirements