AWS Cloud
Fundamentals
Everything you need before touching a single AWS service: what cloud is, how service models work, virtualization, global infrastructure, shared responsibility, and design principles.
What is Cloud Computing?
Cloud computing is the on-demand delivery of computing resources (servers, storage, databases, networking) over the internet, paid for by usage rather than ownership.
Instead of owning and managing physical hardware, you can access these resources whenever you need them and pay only for what you use. The hardware still exists; it just lives in someone else's data center, abstracted behind APIs.
Before cloud computing, organizations relied on on-premise infrastructure: buying, racking, powering, and operating their own servers in their own buildings.
The Traditional Setup
- Purchase physical servers up-front
- Run private data centers (cooling, power, security)
- Manage networking, storage, OS patches
- Plan capacity months, sometimes years, in advance
Why It Broke
- High up-front capital expense
- Long lead times (weeks to months for new servers)
- Hard to scale: you over-provision or run out
- Most hardware sits idle most of the time
As applications grew and internet usage exploded, this model became inefficient. The cloud emerged as a way to share large pools of hardware across many tenants, billed by the hour (and later, the millisecond).
Infrastructure Overhead
- No physical hardware, cooling, or networking to manage
Scalability
- Scale resources up or down in minutes, not months
Capital Cost
- No up-front investment; pay only for what you actually use
Speed
- Provision a database or server in seconds, not weeks
Cloud computing provides on-demand access to shared computing resources over the network.
Five characteristics, codified by NIST, define a true cloud:
On-Demand Self-Service
- Provision resources via API or console, with no human in the loop
Scalability & Elasticity
- Capacity grows and shrinks with load, automatically
Pay-as-You-Go
- Metered billing: by hour, second, request, or GB
Resource Pooling
- Multi-tenant infrastructure shared securely across customers
High Availability
- Redundancy built in: failures are absorbed, not fatal
Broad Network Access
- Reachable from anywhere over standard internet protocols
Think of cloud computing the way you think of electricity. You don't build a power plant in your basement; you plug in.
| Electricity Grid | Cloud Computing |
|---|---|
| You don't build your own power plant | You don't own racks of servers |
| You consume power on demand | You consume compute & storage on demand |
| You pay a utility bill based on kWh used | You pay a cloud bill based on usage (CPU-hours, GB, requests) |
| The grid handles generation, transmission, redundancy | The provider handles hardware, failover, capacity |
| Outages are rare and absorbed by the grid | Failures are isolated to zones; services remain available |
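The utility analogy extends to the bill itself: metered usage times a unit price, summed across every meter. A minimal sketch in Python, where the rates are illustrative assumptions rather than real AWS prices:

```python
# Toy metered cloud bill: usage quantity times unit price, summed per meter.
# All rates below are illustrative assumptions, not real AWS pricing.

USAGE = {
    "cpu_hours": 720,        # one server running for a month
    "storage_gb_month": 50,  # data kept in object storage
    "requests": 1_200_000,   # API calls served
}

RATES = {                    # assumed price per unit of each meter
    "cpu_hours": 0.05,
    "storage_gb_month": 0.023,
    "requests": 0.0000004,
}

def monthly_bill(usage, rates):
    """Sum usage * unit price across every metered dimension."""
    return sum(qty * rates[meter] for meter, qty in usage.items())

print(f"${monthly_bill(USAGE, RATES):.2f}")
```

Exactly like a kWh-based utility bill, the total is driven entirely by consumption: halve the usage and the bill halves with it.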
Every industry runs on cloud today. A non-exhaustive list:
Web & Mobile Apps
- Hosting backends, APIs, static sites
Data & Analytics
- Storing & querying terabytes of data
Machine Learning
- Training and serving models on GPUs
Streaming & CDN
- Video, audio, content delivery globally
SaaS Platforms
- Multi-tenant business apps (CRM, HR, billing)
Backup & DR
- Off-site backups, cross-region failover
Cloud computing is implemented through providers, and the largest by market share is AWS (Amazon Web Services). Two foundational services map directly to the diagram above:
Amazon EC2
- Virtual servers: pick OS, CPU, RAM, network
- Pay per second of running time
- The "compute" pillar of the cloud
Amazon S3
- Object storage: durable, virtually unlimited
- Pay per GB stored and per request
- The "storage" pillar of the cloud
AWS abstracts the underlying hardware so you can focus on building applications instead of operating infrastructure.
| Model | Owned By | Used By | Typical Fit |
|---|---|---|---|
| Public Cloud | Provider (AWS, Azure, GCP) | Many tenants share | Default for most workloads |
| Private Cloud | Single organization | One tenant only | Regulated / strict data residency |
| Hybrid Cloud | Mix of both | Per workload | Lift-and-shift, gradual migration |
| Multi-Cloud | Multiple providers | Per workload | Avoid lock-in, but more complex |
Speed
- Faster development & deployment cycles
- Idea → production in days, not quarters
Global Reach
- Deploy to 30+ regions around the planet
- Latency-aware routing built in
Reduced Ops Burden
- Provider handles HW, patching, replacement
- Engineers focus on product
Reliability
- Multi-AZ, multi-region high availability
- SLAs measured in 9s
Cost Efficiency
- OpEx instead of CapEx
- Scale-to-zero possible with serverless
Experimentation
- Spin up an experiment for $5 and tear it down
- Innovation cost approaches zero
| Myth | Reality |
|---|---|
| "Cloud means data floats somewhere ethereal." | Data lives in real, physical data centers in specific countries. You can usually pick the region. |
| "Cloud is always cheaper." | It depends on usage and architecture. Idle reserved capacity or chatty workloads can cost more than on-prem. |
| "Cloud removes all responsibility." | Wrong โ see the Shared Responsibility Model. You still own apps, data, IAM, and configuration. |
| "Cloud is automatically secure." | The provider secures the infrastructure; you secure what you put in it (mis-configured S3 buckets are the classic failure). |
| "Cloud is just someone else's computer." | Reductive โ you also get global networking, managed services, autoscaling, and a programmable API surface that's not feasible on-prem. |
- Cloud computing provides on-demand access to computing resources over the network.
- It removes the need to own and operate physical infrastructure.
- It enables scalability, flexibility, and cost efficiency via pay-per-use billing.
- It's the foundation of modern application development: every major SaaS, mobile app, and ML system runs on it.
- AWS is the largest implementation; EC2 and S3 are the canonical compute and storage services.
- The cloud doesn't remove responsibility; it shifts it (hardware to provider, configuration to you).
Cloud computing turns infrastructure into an on-demand utility, just like electricity. You stop owning hardware and start consuming capability.
Cloud Service Models: IaaS · PaaS · SaaS
Cloud service models define how responsibilities are divided between the cloud provider and the user.
Who manages what in the cloud?
Every AWS service sits inside one of these models. Knowing which model you're working in tells you immediately what you're responsible for, and what you can safely ignore.
Before cloud computing, organizations managed everything:
What They Owned
- Hardware (servers, switches, storage arrays)
- Operating systems and patches
- Runtimes, middleware, databases
- Applications and data
The Cost of Full Ownership
- High operational complexity
- Constant maintenance overhead
- Slow development cycles
- Large, specialized ops teams
Cloud providers introduced service models to reduce this burden gradually, letting teams choose exactly how much infrastructure complexity they want to own.
Unclear Ownership
- Without a model, users don't know what they're responsible for, and security gaps emerge
Slow Development
- Developers waste time provisioning infra instead of writing code
Wasted Ops Effort
- Teams hand-hold infrastructure that providers can operate at massive scale for a fraction of the cost
Wrong Tool for the Job
- Picking the wrong model means over-managing simple apps or under-controlling complex ones
Three models (IaaS, PaaS, and SaaS) each offer a different level of abstraction. The higher the model, the less you manage.
IaaS
Infrastructure as a Service
- Raw compute, storage, networking
- You manage OS upward
PaaS
Platform as a Service
- Runtime + OS managed for you
- You manage code & data
SaaS
Software as a Service
- Fully managed application
- You configure & use it
Think of the three models as different housing arrangements:
| Model | Housing Analogy | What You Handle |
|---|---|---|
| IaaS | Empty apartment: four walls, utilities connected | Furniture, appliances, decorating, cleaning: everything inside |
| PaaS | Furnished apartment: furniture and appliances included | Just bring your belongings; don't worry about pipes or wiring |
| SaaS | Hotel room: fully serviced, front desk on call | Unpack your suitcase; use the room; someone else cleans it |
Enterprises → IaaS
- Legacy app migrations
- Full control over OS & security baseline
- Hybrid cloud bridging
Startups → PaaS
- Ship fast, skip infra setup
- Focus 100% on product code
- Auto-managed runtimes & DBs
Everyone → SaaS
- Email, CRM, HR, collaboration
- No IT overhead
- Browser or mobile app access
AWS spans all three models:
| Model | AWS Service | What you manage |
|---|---|---|
| IaaS | Amazon EC2 | OS, AMI, patches, runtime, app, security groups |
| IaaS | Amazon S3 | Bucket policies, data, lifecycle rules |
| PaaS | AWS Elastic Beanstalk | App code and config; AWS manages OS, runtime, LB |
| PaaS | AWS Lambda | Function code only; AWS manages everything else |
| PaaS | Amazon RDS | Schema, queries, data; AWS manages DB engine & OS |
| SaaS | Amazon WorkMail / Chime | User accounts & configuration only |
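The division of labor in the table above can be sketched as a small lookup from service model to the layers you manage versus the layers the provider manages. The six-layer stack here is a simplification for illustration; real boundaries vary per service:

```python
# Which stack layers the customer manages under each service model.
# A simplification for illustration; real boundaries vary per service.

LAYERS = ["hardware", "virtualization", "os", "runtime", "application", "data"]

CUSTOMER_MANAGED = {
    "iaas": {"os", "runtime", "application", "data"},
    "paas": {"application", "data"},
    "saas": {"data"},  # you still own your data and its configuration
}

def split(model):
    """Return (customer_layers, provider_layers) for a service model."""
    mine = CUSTOMER_MANAGED[model]
    return ([l for l in LAYERS if l in mine],
            [l for l in LAYERS if l not in mine])

for model in ("iaas", "paas", "saas"):
    yours, theirs = split(model)
    print(f"{model.upper():4} you manage {yours}; provider manages {theirs}")
```

Reading the output top to bottom shows the pattern the rest of this page describes: moving down the list, the customer's share of the stack shrinks while the provider's grows.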
IaaS provides raw building blocks (virtual machines, storage, networking) with maximum flexibility and maximum responsibility.
You Manage
- Operating system & patches
- Runtime environment
- Middleware & frameworks
- Application code
- Data & backups
Provider Manages
- Physical hardware & data center
- Hypervisor & virtualization
- Network fabric & switches
- Hardware failure & replacement
When to use IaaS: you need full control (custom OS hardening, legacy apps, specific kernel tuning), or you're migrating on-prem workloads with minimal changes.
Deep dive: Amazon EC2 (the canonical IaaS service).

PaaS hands you a ready-to-code platform: the OS, runtime, and scaling are handled. You push code, the platform runs it.
You Manage
- Application code & logic
- Data & schemas
- Environment configuration
Provider Manages
- OS installation & patching
- Runtime & SDK versions
- Load balancing & scaling
- Infrastructure provisioning
When to use PaaS: you want to ship fast and don't need to tune the OS or runtime. Typical for new web apps, APIs, microservices, and event-driven functions.
SaaS delivers a fully managed application over the internet. Open a browser, log in, use it. The provider operates everything underneath.
You Manage
- User accounts & access control
- Application-level configuration
- Your own data (content)
Provider Manages
- Application code & features
- Runtime, OS, hardware
- Uptime, updates, security patches
- Data storage & backups
When to use SaaS: you need a capability (email, CRM, monitoring) and building it in-house isn't core business. Use the service, not the stack.
| Dimension | IaaS | PaaS | SaaS |
|---|---|---|---|
| Control level | High | Medium | Low |
| Your responsibility | OS, runtime, app, data | App & data only | Configuration & usage |
| Time to first deploy | Hours to days (infra setup) | Minutes to hours | Minutes (sign up) |
| Flexibility | Maximum: any OS, any config | Constrained by platform | Vendor's feature set only |
| Security ownership | You own most of the stack | Shared: infra secured by provider | Provider secures infra; you own data classification |
| AWS examples | EC2, S3, VPC | Lambda, Beanstalk, RDS | WorkMail, Chime, Amazon Connect |
| Myth | Reality |
|---|---|
| "PaaS removes all responsibility." | You still own your application code and data. If your code has a SQL injection, PaaS won't save you. |
| "IaaS is better because you have more control." | More control = more work. IaaS is right when you need that control โ not as a default. |
| "SaaS is only for non-technical users." | Teams use SaaS tools (GitHub, Datadog, Snowflake) for critical engineering workflows daily. |
| "These models are mutually exclusive." | Most architectures mix them. A SaaS app might use IaaS for compute, PaaS for its DB, and third-party SaaS for logging. |
- IaaS, PaaS, SaaS define how far up the stack the provider manages for you.
- IaaS (EC2, S3): maximum flexibility, you manage OS and above.
- PaaS (Lambda, Beanstalk, RDS): platform handled, you manage code & data.
- SaaS: fully managed app, you configure and use it.
- Most real architectures mix all three models.
- The right model is the least infrastructure you need to meet your requirements.
The higher the abstraction, the less you manage and the more the cloud provider handles. Pick the model that matches your acceptable responsibility level, not just your comfort zone.
Virtualization & Hypervisors
Virtualization allows a single physical machine to run multiple independent systems simultaneously, turning raw hardware into flexible, multi-tenant infrastructure.
Without virtualization, AWS could not run millions of isolated customer workloads on shared hardware. Every EC2 instance you launch is a virtual machine. Understanding how virtual machines work is understanding the foundation of cloud compute.
Before virtualization, applications ran directly on dedicated physical servers, a model known as bare-metal computing.
Traditional Setup
- One server, one application
- Hardware heavily underutilized (10-20% of capacity typical)
- Scaling meant buying & racking new physical machines
- Deployment cycles measured in weeks
The Growing Problem
- Data centers ballooned in size and cost
- Managing thousands of heterogeneous servers was a nightmare
- Peak load required dedicated hardware sitting idle the rest of the time
- Applications couldn't be easily moved between machines
IBM pioneered virtualization in the 1960s on mainframes. It became mainstream in the 2000s when VMware brought it to commodity x86 hardware, and it became the bedrock of modern cloud infrastructure.
Low Hardware Utilization
- Servers idling at ~15% CPU; with virtualization, that same machine runs 10+ VMs at high utilization
High Infrastructure Cost
- One physical machine per app is expensive. VMs let you pack many workloads onto the same hardware
Scaling Difficulty
- Adding capacity used to mean a hardware procurement cycle. VMs can be spun up in seconds
Lack of Isolation
- Without VMs, one misbehaving app could crash others. VMs give hard process and memory boundaries
A hypervisor sits between physical hardware and virtual machines, dividing resources and isolating each VM from the others.
Physical Host
- Real CPU, RAM, disk, NIC
- The actual hardware in the data center
- "Host" in virtualization vocabulary
Hypervisor
- Software layer managing VMs
- Allocates CPU slices, RAM, disk I/O
- Enforces isolation between VMs
Virtual Machine (VM)
- Full OS + applications inside a software envelope
- Sees virtualized hardware (vCPU, vRAM, vDisk)
- "Guest" in virtualization vocabulary
Think of a physical server as an apartment building:
| Real World | Virtualization |
|---|---|
| The building itself | Physical server (CPU, RAM, disk) |
| Each individual apartment | Virtual machine (isolated OS + apps) |
| Building manager | Hypervisor (allocates space, enforces rules) |
| Tenants sharing the building | Multiple VMs sharing hardware |
| Locked apartment doors | VM isolation: one VM can't see another's memory |
| Utilities (water, power, internet) | Shared hardware resources (CPU cycles, RAM, network) |
Each tenant has their own space and doesn't interfere with neighbours โ even though they share the same building's infrastructure.
There are two classes of hypervisor, differing in where they sit relative to the host OS:
- No host OS between hypervisor and hardware
- Lower overhead, better performance
- Used in production & cloud data centers
- Examples: VMware ESXi, Microsoft Hyper-V, AWS Nitro, Xen
- Host OS layer between hypervisor and hardware
- Easier to install; popular for dev & testing
- Higher overhead than Type 1
- Examples: VirtualBox, VMware Workstation, Parallels
| Aspect | Type 1 (Bare-metal) | Type 2 (Hosted) |
|---|---|---|
| Sits on | Hardware directly | Host operating system |
| Performance | High: minimal overhead | Lower: extra OS layer |
| Security isolation | Strong | Weaker (host OS is attack surface) |
| Primary use | Production clouds, data centers | Developer laptops, testing |
| Cloud relevance | This is what AWS uses | Not used in cloud providers |
Cloud Providers
- AWS, Azure, GCP run billions of VMs
- Multi-tenancy is only possible with hypervisor isolation
Enterprise Data Centers
- Server consolidation: 10 physical servers → 1 host with 10 VMs
- Live migration for zero-downtime maintenance
Developer Environments
- Run different OSes on one laptop (VirtualBox, Parallels)
- Reproducible testing across OS versions
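The consolidation claim above ("10 physical servers → 1 host") is simple arithmetic on utilization. A back-of-envelope sketch, where every figure (core counts, utilization targets) is an illustrative assumption:

```python
# Back-of-envelope server consolidation math.
# All figures (core counts, utilization levels) are illustrative assumptions.

def vms_per_host(host_cores=64, host_target_util=0.80,
                 old_cores=16, old_util=0.15):
    """How many lightly-used bare-metal workloads fit on one virtualized host."""
    usable = host_cores * host_target_util   # effective cores allowed on the host
    per_vm = old_cores * old_util            # cores each old server actually used
    return int(usable // per_vm)

capacity = vms_per_host()
print(f"One host absorbs up to {capacity} such workloads")
```

With these assumed numbers, ten one-app-per-server machines fit on a single host with headroom to spare, which is exactly why consolidation collapsed data-center footprints.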
Every Amazon EC2 instance is a virtual machine. When you click "Launch instance" in the AWS console, the hypervisor on a physical server in an AWS data center carves out a VM for you in seconds.
AWS Nitro Hypervisor
- AWS's custom Type-1 hypervisor (based on KVM)
- Offloads I/O to dedicated Nitro cards (NVMe, networking)
- Near bare-metal performance, with almost no overhead
- Released 2017; now powers all modern EC2 instances
Isolation Guarantee
- Each customer's VMs are isolated from others on the same host
- Memory is scrubbed between customers
- Nitro Controller enforces hardware-level security boundaries
- Basis of AWS's multi-tenant security model
When you launch an EC2 instance, AWS:
- Selects a physical host with spare capacity in the chosen AZ
- Nitro hypervisor allocates the requested vCPUs, RAM, and EBS/NVMe storage
- The instance boots your selected AMI (Amazon Machine Image โ OS snapshot)
- Your VM is fully isolated from every other customer on that same physical host
| Myth | Reality |
|---|---|
| "VMs are fake computers." | VMs behave like real machines. They have full OS control, networking, storage, and can run any software a physical machine can. |
| "Each VM gets its own dedicated hardware." | Resources are shared and scheduled by the hypervisor. CPU time is multiplexed; RAM is allocated but pooled across the host. |
| "Virtualization only exists in the cloud." | Virtualization existed in data centers and developer machines for decades before cloud. Cloud added APIs, billing, and scale on top. |
| "Containers are the same as VMs." | Containers share the host OS kernel; VMs include a full guest OS. VMs are stronger isolation; containers are lighter-weight. |
| "The hypervisor adds no overhead." | Modern hypervisors (especially AWS Nitro) are near-zero overhead โ but there's always a small cost for resource scheduling and isolation enforcement. |
- Virtualization runs multiple independent VMs on a single physical machine.
- The hypervisor manages resource allocation and enforces isolation between VMs.
- Type 1 (bare-metal) hypervisors run directly on hardware; used by all cloud providers.
- Type 2 (hosted) hypervisors run on a host OS; used for developer machines.
- Amazon EC2 instances are VMs powered by AWS's custom Nitro hypervisor.
- Virtualization enables the multi-tenancy, scalability, and isolation that make cloud economically viable.
Virtualization is what turns physical hardware into flexible, scalable cloud infrastructure; every EC2 instance you launch is a VM created by a hypervisor in seconds.
AWS Global Infrastructure: Regions & Availability Zones
Cloud computing is not just about what runs; it's also about where it runs. AWS's global infrastructure lets applications operate across multiple geographic locations, ensuring high availability, low latency, and fault isolation.
Before picking a single service in AWS you answer one question: which region? That choice determines latency for your users, data sovereignty compliance, and what disaster recovery options you have. This page explains the geography under every AWS workload.
Traditional Architecture
- Applications ran in a single data center
- One data center failure = total outage
- Global reach required building & operating DCs in each country
- Disaster recovery was expensive and rarely tested
The Problems
- High operational cost of each additional DC
- Complex network inter-connects between owned facilities
- Users far from the DC experienced high latency
- Regulatory/data-residency compliance was manual
Cloud providers built globally distributed infrastructure to solve these problems at scale โ letting any customer get multinational reach without owning a single building.
Single Point of Failure
- Multi-AZ and multi-region deployments mean no single location failure can take down a properly designed system
Global Latency
- Deploy to the region closest to your users and shave 100+ ms off response times for overseas traffic
Data Sovereignty
- Keep data inside a specific country or continent to comply with GDPR, PDPA, or domestic regulations
Disaster Recovery
- Replicate workloads across regions; if one is unavailable, traffic fails over automatically
AWS global infrastructure has three nested layers: Regions → Availability Zones → Edge Locations. Each layer adds a dimension of resilience and performance.
Region
- A named geographic area (e.g., us-east-1, ap-southeast-1)
- Completely independent: failures don't cross region boundaries
- Contains at least 3 Availability Zones
- Most AWS services are region-scoped
Availability Zone (AZ)
- One or more discrete data centers within a region
- Physically separate (kilometres apart), with different power, cooling, networking
- Connected by low-latency private fiber (<2ms between AZs)
- Named us-east-1a, us-east-1b, etc.
Edge Location
- Points of Presence (PoPs) distributed in 90+ cities globally
- Used by CloudFront CDN, Route 53, AWS Shield
- Caches content and performs DNS resolution close to end users
- Not for running EC2/RDS; for delivery & caching only
Think of AWS infrastructure as a global network of cities:
| Real World | AWS Infrastructure | Example |
|---|---|---|
| Country / Continent | Global AWS infrastructure | All of AWS worldwide |
| City | Region | Singapore (ap-southeast-1) |
| Building district in the city | Availability Zone | ap-southeast-1a, ap-southeast-1b |
| Local post office / delivery hub | Edge Location | CloudFront PoP in Mumbai |
| Fire in one building doesn't spread to others | AZ failure isolation | AZ-a down; AZ-b & AZ-c keep running |
You choose a region (us-east-1, ap-southeast-1, etc.) based on user proximity and compliance.
E-Commerce
- Multi-AZ RDS for zero-downtime DB failover
- Auto Scaling groups span 3 AZs
- CloudFront for product images & static assets
Streaming Platforms
- Origin in one region, CloudFront PoPs globally
- S3 as origin for the CDN: 99.999999999% durability
- Route 53 latency routing for API calls
Financial Systems
- Active-active multi-region for RPO/RTO near zero
- Data replicated synchronously within region, async across
- Data sovereignty enforced by region choice
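Latency-based routing, as mentioned for Route 53 above, boils down to picking the region with the smallest measured round-trip time for each client. A toy sketch; the latency figures are invented placeholders:

```python
# Pick the lowest-latency region for a client, in the spirit of
# latency-based routing. Latency figures are invented placeholders.

MEASURED_LATENCY_MS = {   # client -> region round-trip times (assumed)
    "us-east-1": 182,
    "eu-west-1": 141,
    "ap-southeast-1": 12,  # this example's client is near Singapore
}

def nearest_region(latencies):
    """Return the region with the smallest measured round-trip time."""
    return min(latencies, key=latencies.get)

print(nearest_region(MEASURED_LATENCY_MS))
```

Real DNS-level routing layers health checks on top of this minimum, so an unhealthy nearby region is skipped in favor of the next-closest healthy one.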
Every service you use in AWS has a geographic scope. Knowing the scope tells you what happens during a failure:
| Service | Scope | What this means |
|---|---|---|
| Amazon EC2 | AZ-level | An instance lives in one AZ. Deploy in multiple AZs for HA. |
| Amazon RDS Multi-AZ | Region (spans AZs) | Primary in one AZ, standby in another. Automatic failover <60s. |
| Amazon S3 | Region (stored across ≥ 3 AZs) | Eleven 9s durability; survives any single AZ failure. |
| Elastic Load Balancer | Region (nodes in each AZ) | Distributes traffic across AZs automatically. |
| Amazon CloudFront | Global (Edge Locations) | Caches at 600+ PoPs, as close as possible to the end user. |
| Amazon Route 53 | Global | DNS with health checks; routes around failures automatically. |
| IAM | Global | Not region-specific; one IAM policy applies everywhere. |
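The reason multi-AZ deployment matters is basic probability: independent replicas are all down at once far less often than any single one. A quick sketch, assuming independent failures and an illustrative 99.9% per-AZ availability (both are assumptions, not AWS SLA figures):

```python
# Combined availability of n replicas: the system is down only when every
# replica is down at once. Assumes independent failures, which physically
# separated AZs approximate; the 99.9% per-AZ figure is an assumption.

def combined_availability(per_az=0.999, n_azs=1):
    return 1 - (1 - per_az) ** n_azs

for n in (1, 2, 3):
    print(f"{n} AZ(s): {combined_availability(n_azs=n):.6%} available")
```

Each added AZ multiplies the unavailability by the single-AZ failure rate, which is why going from one AZ to two is such a large reliability jump.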
| Myth | Reality |
|---|---|
| "A Region is a single data center." | A region contains at least 3 physically separate Availability Zones, each of which can be multiple data centers. |
| "One AZ is enough for high availability." | Single-AZ is a Single Point of Failure. AWS's SLAs for multi-AZ services assume you're using multiple AZs. |
| "AZs are just different rooms in one building." | AZs are kilometres apart, on separate power grids with separate networking. A natural disaster or power outage affecting one AZ will not affect another. |
| "Multi-region is always required." | Most applications only need multi-AZ. Multi-region is for disaster recovery and global latency โ it adds real operational complexity and cost. |
| "Edge Locations are the same as AZs." | Edge Locations only run CloudFront, Route 53, and Shield. You cannot deploy EC2 or databases there. They are delivery nodes, not compute regions. |
- AWS infrastructure has three layers: Regions → AZs → Edge Locations.
- A Region is an independent geographic area; data stays there unless you replicate it out.
- An AZ is one or more separate data centers in a region, connected by low-latency fiber.
- Edge Locations are CloudFront PoPs, for delivery and caching, not compute.
- Always deploy across at least 2 AZs. Use 3 for production workloads.
- Multi-region is optional; use it for DR requirements or global latency-sensitive apps.
- Understanding geographic scope (AZ / Region / Global) is required to reason about any AWS service's failure modes.
High availability in the cloud comes from distributing systems across multiple Availability Zones, and optionally across Regions for disaster recovery. Geography is an architectural decision, not an afterthought.
Shared Responsibility Model
The Shared Responsibility Model answers one foundational question:
Who is responsible for what in the cloud?
It's the most important security concept to internalise before you use a single AWS service. Misunderstanding it is the source of the majority of real-world cloud security incidents: not because AWS failed, but because the customer didn't know what they needed to secure.
On-Premise: Full Ownership
- You own every layer: physical rack, OS, network, app, data
- Full control = full accountability
- Security team patches hardware, applies firmware, monitors everything
- Expensive, but the responsibility boundary is clear: it's all yours
Cloud: The New Question
- AWS manages the data center, hardware, and hypervisor
- But where does AWS's job end and yours begin?
- The answer differs by service type (IaaS vs PaaS vs SaaS)
- Without a model, gaps form, and attackers exploit them
Responsibility is not eliminated by moving to the cloud; it is shared and redistributed depending on which services you use.
Security Gaps
- When nobody knows who owns a layer, nobody secures it. The model eliminates ambiguity
Unclear Accountability
- After a breach: "Was it AWS or us?" The model gives a precise answer for any incident
Misconfiguration Risk
- Public S3 buckets, open security groups, unencrypted data: all user-layer problems the model flags as your responsibility
Compliance Clarity
- Auditors ask "who controls what?"; the model gives you the exact answer for your compliance documentation
AWS is responsible for security of the cloud: the physical and virtual infrastructure. You are responsible for security in the cloud: what you deploy and configure on top of it.
- Customer data (encryption at rest & in transit)
- Identity & Access Management (IAM users, roles, policies)
- Operating system on EC2 (patches, hardening)
- Application code & runtime configuration
- Network & firewall rules (Security Groups, NACLs)
- Client-side encryption & data integrity
- Physical data center security (guards, biometrics, CCTV)
- Hardware (servers, storage, networking equipment)
- Host operating system & virtualization layer (Nitro)
- Global network infrastructure (fibre, routers, DDoS)
- Managed service software (RDS DB engine, Lambda runtime)
- Availability Zone & region fault isolation design
- AWS hardware & global infrastructure compliance (SOC 2, ISO 27001)
Think of cloud infrastructure as a secure apartment building:
| Building (AWS) | Apartment (Your workload) |
|---|---|
| Guards at the front entrance | You lock your apartment door |
| Secured lifts and common areas | You close your windows |
| Electricity and utilities managed | You control who has your key |
| Building structure maintained | You keep your own space tidy |
| CCTV on the street outside | You configure your own alarm inside |
The majority of cloud security incidents fall on the customer side of the boundary. Common root causes:
Public S3 Buckets
- AWS provides the bucket; you set the ACL
- Misconfigured public access has exposed millions of records
- Fix: S3 Block Public Access + bucket policies
Exposed IAM Keys
- AWS secures the IAM service; you manage the keys
- Hardcoded credentials in GitHub repos are a user error
- Fix: IAM roles, Secrets Manager, no long-lived keys
Unpatched EC2
- AWS provides the hypervisor and hardware; you patch the OS
- EC2 instances running 6-month-old kernels are your problem
- Fix: Systems Manager Patch Manager, IMDSv2
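All three failure modes above live in configuration the customer controls, so they can be caught by automated checks. A toy audit sketch over a hand-rolled config dict; the schema and field names are invented for illustration, not a real AWS API or data format:

```python
# Toy customer-side misconfiguration checker. The resource schema below is
# invented for illustration; it is not a real AWS API or data format.

def audit(resource):
    """Return a list of findings for one resource description."""
    findings = []
    if resource.get("type") == "s3_bucket" and resource.get("public_access"):
        findings.append("Public S3 bucket (enable Block Public Access)")
    if resource.get("hardcoded_keys"):
        findings.append("Hardcoded IAM keys (use roles / Secrets Manager)")
    if resource.get("type") == "ec2_instance" and resource.get("os_patch_age_days", 0) > 30:
        findings.append("Stale OS patches (use Patch Manager)")
    return findings

resources = [
    {"type": "s3_bucket", "public_access": True},
    {"type": "ec2_instance", "os_patch_age_days": 180, "hardcoded_keys": False},
]

for r in resources:
    for finding in audit(r):
        print(f"[{r['type']}] {finding}")
```

Real-world equivalents of this loop (config scanners, policy-as-code checks) exist precisely because these checks are on the customer's side of the responsibility boundary.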
| Service | AWS Secures | You Secure |
|---|---|---|
| Amazon EC2 | Host OS, hypervisor, hardware, data center | Guest OS patches, Security Groups, IAM instance profile, app code |
| Amazon S3 | Storage infrastructure, 11-9s durability, hardware redundancy | Bucket policies, ACLs, Block Public Access, KMS encryption, versioning |
| Amazon RDS | DB engine installation, OS patching, hardware, Multi-AZ failover | DB users & passwords, security group rules, parameter groups, data encryption |
| AWS Lambda | Runtime, underlying infra, function isolation, scaling | Function code, execution role (IAM), environment variable secrets |
| Amazon VPC | Physical network, transit infrastructure | Subnets, route tables, Security Groups, NACLs, internet gateway configs |
| IAM | IAM service availability | Every policy, role, user, group, permission boundary โ entirely yours |
As you move from IaaS → PaaS → SaaS, your security surface shrinks, but it never disappears:
| Myth | Reality |
|---|---|
| "AWS handles all security." | AWS secures the infrastructure. Your applications, IAM policies, and data configurations are entirely your responsibility. |
| "If data is in the cloud, it's automatically safe." | Data safety depends on your encryption, access controls, and logging config. Misconfigured S3 buckets with sensitive data have caused massive real-world breaches. |
| "Using managed services removes responsibility." | PaaS and SaaS reduce your attack surface โ they don't eliminate it. You still own your data, IAM roles, and application logic. |
| "AWS compliance certifications cover my workload." | AWS's SOC 2, ISO 27001, etc. cover their infrastructure. For your workload to be compliant, you must implement the required controls on your side of the boundary. |
| "The boundary is always the same." | The boundary shifts with the service model. On EC2 (IaaS) you own the OS. On RDS (PaaS) you don't. The model must be evaluated per service. |
- Cloud security is shared: AWS protects the infrastructure, you protect what you build on top.
- AWS is responsible for security of the cloud: hardware, data centers, hypervisor, global network.
- You are responsible for security in the cloud: OS, data, IAM, application code, network configs.
- The boundary shifts by service model: on EC2 you own the OS; on RDS you don't.
- Most real-world cloud security incidents are customer-side failures: public S3 buckets, exposed keys, unpatched OS.
- AWS compliance certs cover AWS's side. Your workload compliance is your job.
- Higher abstraction (PaaS/SaaS) reduces your surface, but never to zero.
Cloud security is a shared effort: AWS secures the foundation, but you are fully responsible for what you build, configure, and deploy on top of it. Never assume the cloud provider handles it all.
Cloud Design Principles & Well-Architected Framework
Building in the cloud isn't just about picking services; it's about designing systems that are reliable, scalable, secure, and cost-efficient under real-world conditions.
Any engineer can launch an EC2 instance. Far fewer can design a system that handles 10× the expected traffic, survives an AZ failure, stays within budget, and keeps operations teams from being paged at 3 a.m. That's what cloud design principles enable.
Traditional System Design
- Tightly coupled monoliths: one failure, total outage
- Scaling meant buying bigger hardware
- Manual operations, slow deployments
- No standard vocabulary for "good design"
Cloud Without Principles
- Teams reinvent the wheel and make the same mistakes
- Architectures grow organically: brittle, expensive, hard to change
- Security bolted on after the fact
- Costs spiral because nobody owns them
In 2015, AWS published the Well-Architected Framework, a structured set of guidance for evaluating and improving cloud architectures across five dimensions. It's now the industry-standard vocabulary for cloud design.
Fragile Systems
- Without reliability principles, a single component failure cascades. Design-for-failure patterns break the cascade
Inefficient Scaling
- Vertical scaling hits walls. Horizontal scaling with decoupled components is the cloud-native approach
Unexpected Costs
- Over-provisioned resources, always-on dev environments, missing auto-scaling: cost optimization principles address all of them
Security Gaps
- Security as an afterthought leaves holes. Security pillar principles bake it into the design from day one
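The design-for-failure idea above (stopping one component's failure from cascading) is often implemented as a circuit breaker. Here is a minimal in-process sketch; the class name, thresholds, and error messages are invented for illustration:

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch: after `max_failures` consecutive
    errors the circuit opens and calls fail fast, breaking the cascade.
    After `reset_after` seconds it half-opens and allows one trial call."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

Failing fast while a dependency is down costs one quick exception instead of a pile-up of threads waiting on timeouts, which is exactly how cascades start.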
The AWS Well-Architected Framework evaluates architectures across five pillars: Reliability · Performance Efficiency · Security · Cost Optimization · Operational Excellence. A well-architected system balances all five.
No pillar dominates the others. A system that is perfectly reliable but astronomically expensive is not well-architected. The framework forces you to evaluate trade-offs explicitly rather than optimising one dimension in ignorance of the rest.
Think of cloud architecture like planning a modern city:
| City Planning | Cloud Architecture | Well-Architected Pillar |
|---|---|---|
| 🚦 Traffic management & road redundancy | Multi-AZ load balancing, circuit breakers | Reliability |
| ⚡ Power grid that scales with population | Auto Scaling, serverless compute | Performance Efficiency |
| 🔒 Locks, CCTV, access zones in buildings | IAM least privilege, encryption, VPC isolation | Security |
| 💡 Utilities metered: pay for what you use | Right-sizing, spot instances, savings plans | Cost Optimization |
| 🛠️ City maintenance crews & alert systems | CloudWatch, runbooks, automated remediation | Operational Excellence |
High-Traffic Web Apps
- Multi-AZ ALB + Auto Scaling (Reliability)
- CloudFront for global latency (Performance)
- WAF on the load balancer (Security)
Microservices
- Decoupled via SQS/SNS (Reliability)
- Independent scaling per service (Performance)
- Service-specific IAM roles (Security)
Data Pipelines
- S3 checkpointing for fault tolerance (Reliability)
- Spot instances for batch jobs (Cost)
- VPC endpoints: no public internet (Security)
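The decoupling called out for microservices can be illustrated with an in-process queue standing in for SQS: producer and consumer share only the queue and never call each other directly, so either side can fail, restart, or scale independently. Names and message shapes are invented.

```python
import queue
import threading

# In-process stand-in for SQS. Real code would call send_message /
# receive_message against a queue URL; the decoupling idea is the same.
buffer = queue.Queue(maxsize=100)
processed = []

def producer(n):
    for i in range(n):
        buffer.put({"order_id": i})  # analogous to sqs.send_message

def consumer():
    while True:
        msg = buffer.get()           # analogous to sqs.receive_message
        if msg is None:              # sentinel: shut down cleanly
            break
        processed.append(msg["order_id"])
        buffer.task_done()

worker = threading.Thread(target=consumer)
worker.start()
producer(5)
buffer.put(None)
worker.join()
print(processed)
```

If the consumer crashes mid-run, the messages simply wait in the queue; with SQS they would also survive the producer's host disappearing entirely.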
AWS provides first-party tooling to operationalise these principles:
AWS Well-Architected Tool
- Free from the AWS console
- Structured questionnaire per pillar
- Identifies High / Medium / Low risks
- Generates an improvement plan with AWS guidance links
- Can be run during design and post-deploy
AWS Well-Architected Partner Program
- AWS partners (consultants, SIs) can run formal reviews
- Architecture deep-dives per workload type
- Lenses available: SaaS, IoT, ML, Serverless, Analytics
- Result: signed-off architecture review document
A system's ability to recover from failures and continue to function correctly over time.
- Design with quotas and limits in mind
- Deploy across at least 2 AZs for every stateful component
- Use health checks + automatic failover (ELB, Route 53)
- Test recovery: chaos engineering, game days
- Backup data and test restores regularly
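Alongside health checks and failover, transient faults are usually absorbed with retries. A generic sketch of capped exponential backoff with full jitter, the same idea AWS SDKs apply to throttled or failed API calls (function and parameter names are invented):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, cap=2.0):
    """Retry a flaky call with capped exponential backoff plus jitter.
    Jitter spreads retries out so many clients don't hammer a
    recovering service in lockstep."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the real error
            delay = min(cap, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))  # full jitter
```

Without the cap and jitter, synchronized retries can themselves become the failure cascade the reliability pillar warns about.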
Selecting and using the right resources in the right amounts efficiently as requirements change.
- Use purpose-built compute (GPU for ML, memory-optimised for in-memory DBs)
- Cache aggressively at every layer (ElastiCache, CloudFront, DAX)
- Go serverless where you can: Lambda, Fargate, Aurora Serverless
- Re-evaluate instance types annually as AWS releases new generations
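"Cache aggressively" in miniature: a read-through cache with a time-to-live, the same idea ElastiCache, CloudFront, and DAX apply at scale. An in-process sketch with invented names:

```python
import time

class TTLCache:
    """Tiny read-through cache: serve repeated reads from memory and
    only hit the backend when the entry is missing or expired."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        hit = self.store.get(key)
        now = time.monotonic()
        if hit is not None and hit[1] > now:
            return hit[0]                       # cache hit: skip the backend
        value = loader(key)                     # cache miss: load and remember
        self.store[key] = (value, now + self.ttl)
        return value
```

The TTL is the performance/freshness trade-off in one number: longer TTLs mean fewer backend reads but staler data, which is exactly the dial you turn on CloudFront or ElastiCache.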
Protecting data, systems, and assets through risk assessments and mitigation strategies.
- Apply least privilege to every IAM entity; start with deny-all
- Enable CloudTrail, Config, GuardDuty in every account
- Encrypt everything: KMS for data at rest, TLS for data in transit
- Use VPC endpoints to keep traffic off the public internet
- Rotate credentials; eliminate long-lived access keys
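What least privilege looks like in practice: instead of `s3:*` on `*`, grant only the actions a job needs on the resources it touches. The bucket and prefix below are hypothetical; the sketch emits standard IAM policy JSON:

```python
import json

# Least-privilege sketch for a hypothetical read-only reporting job:
# two actions, one bucket, one prefix -- and nothing else.
job_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReportsReadOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-reports",         # ListBucket target
                "arn:aws:s3:::example-reports/daily/*", # GetObject target
            ],
        }
    ],
}

print(json.dumps(job_policy, indent=2))
```

Everything not explicitly allowed is implicitly denied, so a leaked credential for this role can read one prefix, not ransack the account.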
Running workloads at the lowest price point without sacrificing performance or reliability.
- Right-size instances with Compute Optimizer recommendations
- Use Spot for fault-tolerant batch workloads (up to 90% savings)
- Purchase Savings Plans or Reserved Instances for steady-state workloads
- Shut down non-production environments outside business hours
- Delete unattached EBS volumes, stale snapshots, idle load balancers
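Back-of-envelope arithmetic shows why these levers matter. The hourly rate and the 70% Spot discount below are illustrative assumptions, not current AWS prices:

```python
# Hypothetical always-on instance vs. two of the levers above.
HOURS_PER_MONTH = 730
on_demand_rate = 0.10          # $/hour, made-up rate for illustration

always_on = on_demand_rate * HOURS_PER_MONTH
spot = always_on * (1 - 0.70)  # Spot discounts commonly run 60-90%
business_hours = on_demand_rate * 10 * 22   # 10 h/day, 22 weekdays

print(f"always-on on-demand: ${always_on:.2f}/mo")
print(f"spot (70% off):      ${spot:.2f}/mo")
print(f"office-hours only:   ${business_hours:.2f}/mo")
```

Even with made-up numbers, the shape holds: simply not running a dev box nights and weekends cuts its bill by roughly 70%, before any Spot or Savings Plan discount.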
Running and monitoring systems, and continually improving processes and procedures.
- Define everything as code: infrastructure, pipelines, runbooks
- Make small, reversible changes rather than infrequent, risky big-bang deploys
- Define and measure business KPIs in CloudWatch dashboards
- Run blameless post-mortems; capture learnings as action items
- Anticipate failure modes with game days and chaos experiments
- Deploy critical services across at least 2 Availability Zones
- Use Auto Scaling groups for all stateless compute tiers
- Enable Multi-AZ for all production databases (RDS, ElastiCache)
- Implement automated backups and test restores quarterly
- Apply least-privilege IAM with SCPs at the AWS Organizations level
- Encrypt data at rest (KMS) and in transit (TLS 1.2+) everywhere
- Use managed services (RDS, Lambda, SQS) over self-managed equivalents where possible
- Tag every resource: Owner, Environment, CostCenter, Application
- Set billing alarms and AWS Budgets for every account
- Run infrastructure from code (CloudFormation, CDK, Terraform)
- Enable CloudTrail, AWS Config, and GuardDuty in every region and account
- Conduct a formal Well-Architected Review before every major launch
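The tagging item on the checklist is easy to enforce in code. A sketch of the kind of check an AWS Config custom rule or a CI policy gate might run, using the tag keys from the checklist above:

```python
# Required tag keys from the checklist; extend to suit your organization.
REQUIRED_TAGS = {"Owner", "Environment", "CostCenter", "Application"}

def missing_tags(resource_tags):
    """Return the required tag keys a resource is missing, sorted so the
    report is stable. `resource_tags` is a dict of tag key -> value."""
    return sorted(REQUIRED_TAGS - set(resource_tags))
```

Wired into a deployment pipeline, a non-empty result blocks the rollout, which is how "tag every resource" stops being a wish and becomes a guarantee.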
| Myth | Reality |
|---|---|
| "Cloud automatically makes systems scalable." | Cloud gives you scalable primitives. A monolith deployed on EC2 with no Auto Scaling is not scalable just because it's in AWS. |
| "High performance always means high cost." | Not with the right design. Caching, CDN, right-sizing, and serverless often deliver better performance and lower cost than brute-force compute. |
| "Best practices are fixed rules to follow blindly." | The framework explicitly says: every principle involves trade-offs. A startup's MVP has different reliability requirements than a banking core system. |
| "More services = better architecture." | Complexity is a cost. Every additional service adds operational burden and potential failure points. The simplest architecture that meets requirements is the best architecture. |
| "The Well-Architected Framework only applies to large systems." | Even a personal project benefits from the principles. Cost optimization and security are relevant at any scale. |
- The AWS Well-Architected Framework provides five pillars: Reliability, Performance, Security, Cost Optimization, Operational Excellence.
- Good architecture balances all five; optimising one at the expense of others is an anti-pattern.
- Core design patterns: design for failure, decouple, scale horizontally, automate, observe, security-by-default.
- Use the AWS Well-Architected Tool (free in console) to formally evaluate your workloads.
- Architecture is continuous: review after launch, after incidents, and as AWS releases new services.
- Simplicity beats complexity: the best architecture is the simplest one that meets requirements.
Good cloud architecture isn't about using more services; it's about applying the right principles to design resilient, efficient, and cost-effective systems. The five pillars are your compass.