Amazon Aurora —
Cloud-Native Relational Database

Aurora is not just a faster RDS — it is a ground-up redesign of the relational database for the cloud. Decoupled storage and compute, six-way replication across three AZs, MySQL and PostgreSQL compatible, and up to 5× MySQL performance. Aurora is what you choose when RDS is not enough.

⚡ Aurora in 30 Seconds

Cloud-native relational DB — MySQL & PostgreSQL compatible
Shared distributed storage — decoupled from compute, auto-scales to 128 TiB
6 copies of data across 3 AZs — HA is built-in, not an add-on
Up to 15 read replicas — all share the same underlying storage
5× MySQL and 3× PostgreSQL performance vs community editions
Aurora Serverless v2 — compute scales instantly from 0.5 to 128 ACUs

Chapter One

What is Amazon Aurora

The Problem with Traditional Cloud Databases Introductory

When AWS launched RDS, they took existing database engines (MySQL, PostgreSQL, Oracle) and ran them on managed EC2 infrastructure. That removed much of the ops burden (patching and backups are handled for you), but the database architecture itself was unchanged. It was still built on single-server, spinning-disk-era assumptions.

👉 The root problem: Traditional databases couple storage to compute. Each instance owns its disk. Replication means copying data over a network, constantly. Failover means waiting for a standby to catch up. Storage limits come from the instance. Cloud demands something better.

What is Amazon Aurora Introductory

Amazon Aurora is a cloud-native relational database built from scratch at AWS, designed to resolve the architectural limitations of traditional databases. It is fully MySQL and PostgreSQL compatible — your SQL works, your drivers work, your ORMs work. But underneath, everything is different.

🧬

Cloud-Native Design

Built for distributed cloud storage from scratch. Storage is a distributed, fault-tolerant service — not a single disk attached to a server.

🔄

MySQL & PG Compatible

Aurora MySQL 3.x is compatible with MySQL 8. Aurora PostgreSQL 15.x is compatible with PostgreSQL 15. No SQL rewrites needed.

⚡

5× / 3× Performance

5× throughput vs MySQL community edition. 3× vs PostgreSQL. Achieved through distributed storage, parallel writes, and log-based replication.

Aurora vs RDS — The Single Most Important Distinction Core

Most people think Aurora is just “RDS but faster”. That is wrong and will cost you exam marks. The difference is architectural:

Aspect	RDS (MySQL / PG)	Aurora
Storage model	Instance-local EBS volume	Shared distributed storage tier
Storage limit	64 TiB max (manual scaling)	128 TiB auto-scales
Replication	Async binlog (replica copies data)	Storage-level (no data moves)
HA copies	1 standby (Multi-AZ)	6 copies across 3 AZs (always)
Failover time	60–120 seconds	<30 seconds
Read replicas	Up to 5 (async, data copied)	Up to 15 (share same storage)
Replica lag	Milliseconds to seconds	Milliseconds (storage-level)

Concept Diagram — RDS vs Aurora Storage Model Core

The fundamental difference — RDS local storage vs Aurora shared distributed storage

Mental Model — The Right Way to Think About Aurora Introductory

🧠 Most People Think (Wrong):

“Aurora = RDS with better hardware” or “Aurora = fast MySQL”

✨ Better Mental Model (Correct):

Aurora = Compute nodes + a cloud-native shared storage service. The database engines (writer + readers) are just compute that plugs into one shared storage fabric. The storage itself is distributed, replicated, and elastic — independent of any single instance.

🏠

Old Model (RDS)

DB engine + local disk = one tightly coupled unit
Replicate = copy data to another server's disk
Add replica = duplicate storage cost
Failover = wait for standby to assume its local disk
Like a house: each house has its own pipes

🏗️

New Model (Aurora)

Storage is a separate elastic service
Replicate = storage handles it, engines don't know
Add replica = new compute node, no extra storage
Failover = another compute node grabs the same storage
Like a city water system: every tap connects to the same pipes

💡

Why It Matters

Faster failover (no data sync needed)
Instant read replicas (no copy of data)
Storage grows automatically (no pre-provisioning)
Lower replication lag (storage-level, not app-level)
Aurora Global Database becomes feasible

Aurora Compatibility — Two Flavours Core

Aurora comes in two variants — pick based on which engine your app uses

Aurora MySQL

Compatible: MySQL 5.7 / 8.0
Aurora MySQL 2.x / 3.x
5× MySQL community performance

Aurora PostgreSQL

Compatible: PG 13 / 14 / 15 / 16
Aurora PG 15.x
3× PostgreSQL community performance

⚠️ Aurora does NOT support Oracle or SQL Server — use RDS for those engines.

AWS Architecture Diagram — Aurora Cluster in a VPC Core

Aurora DB cluster inside VPC — writer + readers across AZs, shared storage underneath

VPC (10.0.0.0/16)

PUBLIC SUBNET: EC2 App (web server) → Aurora cluster

PRIVATE SUBNETS: Aurora Cluster
Aurora Writer: AZ-a • R/W • Cluster endpoint
Reader 1: AZ-b • R only • Reader endpoint
Reader 2: AZ-c • R only • Reader endpoint

🔒 SG: App SG → Aurora SG
🔑 KMS encrypted at rest
💾 Shared storage (6 copies / 3 AZs)
🔄 Auto-scales storage to 128 TiB

Aurora vs RDS — Cost Reality Check Core

💰

Compute Cost (Higher for Aurora)

db.r6g.large (2 vCPU, 16 GB RAM)
RDS MySQL: ~$0.18/hr
Aurora MySQL: ~$0.22/hr (~22% more)
Difference compounds at scale with many instances
Aurora is worth it when architecture features justify the delta

⚖️

Storage Cost (Cheaper at Scale)

Aurora: shared storage — pay once, used by all replicas
RDS: each read replica = full data copy = extra storage cost
15 Aurora replicas: 1× storage cost
15 RDS replicas: 16× storage cost
At 5+ replicas, Aurora total cost is often lower than RDS
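
The storage arithmetic in this card can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and it counts how many copies of the data you are billed for rather than dollars.

```python
def rds_storage_copies(replicas: int) -> int:
    """RDS: the primary plus one full data copy per read replica."""
    return 1 + replicas


def aurora_storage_copies(replicas: int) -> int:
    """Aurora: one shared cluster volume, billed once whatever the reader count."""
    return 1


for n in (1, 5, 15):
    print(f"{n:>2} replicas -> RDS {rds_storage_copies(n)}x, "
          f"Aurora {aurora_storage_copies(n)}x")
```

The gap widens linearly with the replica count, which is why the compute premium for Aurora can be offset once a cluster has several readers.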

When to Choose Aurora over RDS Core

✅

Choose Aurora When

Need higher throughput than RDS MySQL / PG
Need more than 5 read replicas (Aurora supports 15)
Need failover under 30 seconds
Need global distribution (Aurora Global Database)
Need serverless variable workloads (Serverless v2)
Storage >10 TiB or unpredictable growth
Production OLTP requiring highest availability

📌

Stick with RDS When

Need Oracle or SQL Server (Aurora doesn't support them)
Budget is tight (Aurora ~20% higher per instance)
Workload is light — RDS already meets SLAs
Need RDS Custom OS-level access
Legacy app tied to a specific engine minor version
MariaDB (not supported by Aurora)

🧠 Key Insight

Aurora is not RDS with faster hardware — it is a different storage architecture. The shared distributed storage is what enables everything: instant replicas, sub-30s failover, auto-scaling storage, and Global Database. Understand that one idea and the rest of Aurora snaps into place.

Chapter Summary Introductory

 Aurora = cloud-native relational DB — MySQL and PostgreSQL compatible; no Oracle/SQL Server
Decoupled storage: compute and storage are separate — the fundamental design difference from RDS
6 copies / 3 AZs: HA is architectural, not optional — no separate Multi-AZ to configure
5× MySQL / 3× PostgreSQL performance vs community editions
Aurora ≠ fast RDS: the storage architecture is completely different; that is the key exam insight
Mental model: writer + readers are compute nodes plugging into one shared storage system
 

Chapter Two

Aurora Architecture — Shared Storage & Compute Separation

The Aurora Cluster — Two Layers Introductory

Every Aurora deployment is called a DB cluster. A cluster has two completely independent layers: the compute layer (DB instances that run the MySQL or PostgreSQL engine) and the storage layer (the shared distributed volume that all compute instances read and write). These two layers scale independently of each other.

🖥️

Compute Layer (DB Instances)

Writer instance — one per cluster, handles all writes
Reader instances — up to 15, read-only, same data
Each instance is a specific instance class (db.r6g, db.t3, etc.)
Instances can be added or removed without touching storage
Serverless v2 = compute that auto-scales instead of fixed instances

💾

Storage Layer (Cluster Volume)

Single logical volume shared by all compute instances
Physically: 6 copies spread across 3 AZs (2 per AZ)
Stored in 10 GB segments, each replicated 6× to form a “protection group”
Auto-scales from 10 GB to 128 TiB with zero downtime
You pay only for storage actually used (not pre-provisioned)

How the Storage Volume Works — Segments & Self-Healing Core

The Aurora cluster volume is divided into 10 GB segments. Each segment is replicated six times across three AZs, and those six copies together form a protection group. This granularity means that if a disk fails, only the segments stored on that disk need to be repaired, not the entire database. Aurora performs this repair continuously in the background, peer-to-peer between storage nodes, without involving the compute layer at all.

👉 Why this matters for HA: Traditional databases repair by replacing the failed node and copying all data back. Aurora repairs individual 10 GB segments, in parallel, across many storage nodes simultaneously. A 1 TB database can repair a failed copy in minutes, not hours.
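
To make the granularity concrete, here is a rough sketch of the segment arithmetic. It assumes the 10 GB segment size and 6 copies described above, uses decimal GB for simplicity, and the function names are made up for this example.

```python
import math

SEGMENT_GB = 10   # segment size stated in the text
COPIES = 6        # copies of each segment, 2 in each of 3 AZs


def segment_count(volume_gb: float) -> int:
    """Number of 10 GB segments needed to hold a volume of this size."""
    return max(1, math.ceil(volume_gb / SEGMENT_GB))


def physical_copies(volume_gb: float) -> int:
    """Total segment copies spread across the storage fleet."""
    return segment_count(volume_gb) * COPIES


print(segment_count(1000))     # 1 TB volume -> 100 segments
print(physical_copies(1000))   # -> 600 segment copies
```

Because each of those copies is an independent 10 GB unit, a failed disk only triggers repair of the copies it held, and those repairs can proceed in parallel.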

Concept Diagram — Aurora Cluster Volume Internals Core

Aurora cluster volume — 10 GB segments replicated 6× across 3 AZs

Write Path — How Aurora Commits a Write Core

Aurora uses a quorum-based write model. When the writer commits a transaction, it does not write data pages to storage — it writes only redo log records to the 6 storage nodes. The write is acknowledged when 4 of 6 nodes confirm receipt. Storage nodes reconstruct data pages from log records locally. This is why Aurora writes are so fast — less data moves over the network.

✏️

Write Quorum: 4/6

Writer sends redo log to all 6 nodes
Waits for 4 acknowledgements
Commit confirmed — client gets response
Remaining 2 catch up asynchronously
Can tolerate 2 failed storage nodes without halting writes

📖

Read Quorum: 3/6

Reads need 3 of 6 nodes to agree
Can tolerate 3 failed nodes and still serve reads
Readers also use log records to materialize pages locally
Dramatically reduces replication lag vs binlog

📤

Only Logs, No Pages

Writer sends log records (small), not full data pages (large)
Network I/O reduced by up to 7× vs traditional replication
Storage nodes apply logs locally — no round-trips for page writes
This is the core reason for Aurora's write performance advantage
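
The quorum rule can be illustrated with a small model. This is a sketch of the general quorum idea, not Aurora's actual implementation; the latency values are invented for the example.

```python
N_COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3


def quorums_are_safe(n: int, w: int, r: int) -> bool:
    """True when every read quorum overlaps every write quorum (w + r > n)
    and two write quorums always share at least one copy (2w > n)."""
    return w + r > n and 2 * w > n


def commit_latency_ms(ack_latencies_ms: list[float], w: int = WRITE_QUORUM) -> float:
    """A commit completes when the w-th fastest acknowledgement arrives."""
    if len(ack_latencies_ms) < w:
        raise ValueError("too few storage nodes responded to reach write quorum")
    return sorted(ack_latencies_ms)[w - 1]


acks = [1.2, 0.9, 4.8, 1.1, 1.5, 30.0]   # hypothetical per-node ack times
print(quorums_are_safe(N_COPIES, WRITE_QUORUM, READ_QUORUM))   # True
print(commit_latency_ms(acks))                                  # 1.5
```

Note how the slow node (30 ms) and the second-slowest node do not affect commit latency: the writer only waits for the fourth acknowledgement.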

Cluster Endpoints — How Applications Connect Core

Aurora exposes several DNS endpoints. Knowing which to use for which workload is critical for exam questions:

Endpoint Type	Points To	Use For
Cluster endpoint	Current writer instance	All writes — always points to writer after failover
Reader endpoint	All reader instances (load-balanced)	All reads — distributes across readers automatically
Custom endpoint	A specific subset of instances	Analytics workloads on specific high-memory instances
Instance endpoint	One specific instance	Diagnostics, direct maintenance — not for production app
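
A minimal sketch of read/write splitting in application code, assuming the app holds both endpoint hostnames. The hostnames and the `endpoint_for` helper are hypothetical; real routing is usually done in the driver, ORM, or connection layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuroraEndpoints:
    cluster: str   # always resolves to the current writer
    reader: str    # spreads connections across the readers


def endpoint_for(sql: str, endpoints: AuroraEndpoints) -> str:
    """Send plain SELECT statements to the reader endpoint, all else to the writer."""
    words = sql.split()
    first_word = words[0].upper() if words else ""
    return endpoints.reader if first_word == "SELECT" else endpoints.cluster


# Hypothetical hostnames, shaped like Aurora endpoint names.
eps = AuroraEndpoints(
    cluster="mydb.cluster-abc123.us-east-1.rds.amazonaws.com",
    reader="mydb.cluster-ro-abc123.us-east-1.rds.amazonaws.com",
)
print(endpoint_for("SELECT * FROM orders", eps))
print(endpoint_for("UPDATE orders SET status = 'paid'", eps))
```

This first-word check is deliberately naive: statements such as `SELECT ... FOR UPDATE`, or reads inside a write transaction, still need the cluster endpoint.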

AWS Architecture Diagram — Endpoints in Practice Core

Aurora cluster endpoints — app uses cluster endpoint for writes, reader endpoint for reads

VPC

APP TIER: App Server
Writes → cluster EP
Reads → reader EP

CLUSTER ENDPOINT (writer)
Writer: AZ-a • R/W • Auto-updates on failover

READER ENDPOINT (load-balanced)
Reader 1: AZ-b
Reader 2: AZ-c

💾 All instances share the same cluster volume — no data duplication per reader
🔄 Cluster endpoint automatically flips to new writer after failover — no app change needed

Storage Auto-Scaling — No Pre-Provisioning Core

📈

How Storage Scales

Starts at 10 GB minimum
Grows in 10 GB increments automatically
Maximum: 128 TiB
No downtime, no instance restart
You pay per GB-month actually used — not for pre-provisioned capacity

💰

Storage Pricing Model

Pay for storage consumed (not allocated)
Storage never shrinks automatically (high-water mark model)
I/O requests billed separately in Aurora Standard
I/O requests free (included) in Aurora I/O-Optimized

Aurora Standard vs I/O-Optimized Core

Feature	Aurora Standard	Aurora I/O-Optimized
Storage rate	Lower (~$0.10/GB-month)	Higher (~$0.225/GB-month)
I/O billing	Per million requests (~$0.20)	Included — free
Best for	Low-to-moderate I/O workloads	High I/O workloads (>25% of bill is I/O)
Exam keyword	Default	“high I/O, predictable costs”

💡 Rule of thumb: if your I/O charges exceed 25% of your total Aurora bill, switch to I/O-Optimized and you will likely save money.
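
As a rough way to compare the two configurations, the sketch below uses the approximate rates from the table above. It only models storage and I/O charges and ignores any difference in instance pricing between the two configurations, so treat the result as a starting point rather than a pricing calculator.

```python
STANDARD_STORAGE_PER_GB = 0.10     # approx. $/GB-month, from the table above
STANDARD_IO_PER_MILLION = 0.20     # approx. $ per million requests
OPTIMIZED_STORAGE_PER_GB = 0.225   # approx. $/GB-month, I/O included


def standard_cost(storage_gb: float, io_millions: float) -> float:
    """Monthly storage + I/O charge under Aurora Standard."""
    return storage_gb * STANDARD_STORAGE_PER_GB + io_millions * STANDARD_IO_PER_MILLION


def optimized_cost(storage_gb: float) -> float:
    """Monthly storage charge under I/O-Optimized (no per-request charge)."""
    return storage_gb * OPTIMIZED_STORAGE_PER_GB


def break_even_io_millions(storage_gb: float) -> float:
    """Monthly I/O volume at which both storage + I/O bills are equal."""
    extra_storage = storage_gb * (OPTIMIZED_STORAGE_PER_GB - STANDARD_STORAGE_PER_GB)
    return extra_storage / STANDARD_IO_PER_MILLION


print(round(standard_cost(1000, 200), 2))
print(round(optimized_cost(1000), 2))
print(round(break_even_io_millions(1000), 2))
```

With these assumed rates, a 1,000 GB cluster doing 200 million requests a month is cheaper on Standard; the balance tips only once I/O volume passes the break-even figure.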

🧠 Key Insight

Aurora writes only redo log records to storage, not full data pages. The quorum model (4/6 for write, 3/6 for read) means the cluster continues operating even when multiple storage nodes fail. Storage grows automatically — you never pre-provision. Adding readers costs no extra storage.

Chapter Summary Introductory

 DB cluster = compute layer (writer + readers) + storage layer (cluster volume) — independent layers
Cluster volume = 6 copies across 3 AZs, 10 GB segments, self-healing, auto-scales to 128 TiB
Writes: redo logs sent to 6 storage nodes, commits on 4/6 ack — no full page writes
Reads: quorum 3/6 — readers materialize pages locally from logs, near-zero lag
Endpoints: cluster (writer), reader (load-balanced), custom (subset), instance (direct)
Storage cost: pay per GB used, not allocated; I/O-Optimized tier for high-I/O workloads
 

Chapter Three

High Availability & Replication

HA is Built Into Aurora — Not an Add-On Introductory

With RDS, you opt into high availability by enabling Multi-AZ, which provisions a separate standby instance. With Aurora, HA is the default state. The 6-copy storage model exists for every Aurora cluster regardless of whether you add readers or not. There is no “single-AZ Aurora” at the storage level.

👉 Critical exam point: You do NOT need to enable Multi-AZ in Aurora — the 6-copy storage replication across 3 AZs is always on. What you control is how many compute instances (readers) you add for faster compute-level failover.

The 6-Copy Replication Model Explained Core

🟢

AZ-a (2 copies)

2 independent copies of every storage segment
Even if both fail — 4 copies remain alive
Typically hosts the writer instance
AZ outage loses 2 copies — writes continue (4/6 quorum met)

🔵

AZ-b (2 copies)

2 independent copies
Hosts reader instances for spread reads
AZ failure here: 4 copies in AZ-a + AZ-c remain
Reads continue, writes continue

🟣

AZ-c (2 copies)

2 independent copies
Full geographic separation from AZ-a and AZ-b
Highest-tier DR coverage: single-AZ outage never threatens writes
Reads continue from AZ-a + AZ-b readers

Quorum Failure Tolerance — How Much Can You Lose Core

Scenario	Copies Lost	Writes	Reads
1 disk fails	1 of 6	✔ Continue (4/6 quorum)	✔ Continue (3/6 quorum)
1 full AZ outage	2 of 6	✔ Continue (4/6 quorum)	✔ Continue (3/6 quorum)
2 disks fail (diff AZs)	2 of 6	✔ Continue	✔ Continue
3 disks fail	3 of 6	❌ Halted (need 4)	✔ Continue (3/6 quorum)
4+ disks fail	4+ of 6	❌ Halted	❌ Halted
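
The table can be reproduced from the two quorum sizes alone. The helper below is a hypothetical illustration of that rule, not an AWS API.

```python
def cluster_state(copies_lost: int, total: int = 6,
                  write_quorum: int = 4, read_quorum: int = 3) -> dict:
    """Report whether writes and reads can proceed after losing some copies."""
    if not 0 <= copies_lost <= total:
        raise ValueError("copies_lost must be between 0 and total")
    alive = total - copies_lost
    return {"writes": alive >= write_quorum, "reads": alive >= read_quorum}


for lost in range(5):
    state = cluster_state(lost)
    print(f"{lost} lost -> writes={'ok' if state['writes'] else 'halted'}, "
          f"reads={'ok' if state['reads'] else 'halted'}")
```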

Concept Diagram — 6-Copy 3-AZ Distribution Core

Aurora HA — 6 copies of every storage segment across 3 AZs (AZ outage ≠ data loss)

Automatic Failover — Compute Level Core

When the writer instance fails, Aurora promotes one of the existing reader instances to become the new writer. Because storage is shared, the promoted reader already has the complete dataset — it just switches its mode. This is why Aurora failover is so much faster than RDS.

⏱️

Failover Timeline

Writer failure detected: ~10–20 seconds
Reader promoted to writer: immediate (no data copy)
DNS updated to point to new writer
Applications reconnect: total ~30 seconds
With Aurora readers present: typically <30 seconds
Without readers (single instance): ~60–120 seconds (new instance launched)

🏆

Failover Priority Tiers

Each reader has a priority tier: 0 (highest) – 15 (lowest)
Aurora promotes the reader at the highest priority tier
Tie in priority: promotes the largest instance first
Second tie (same tier and same size): Aurora picks an arbitrary replica within that tier
Set tier via console / CLI — use tier 0 for your primary DR reader
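
The selection order can be sketched as a sort key. This is an illustration of the rules listed in this card, not AWS code: `memory_gib` stands in for "instance size", and the final tie-break on instance ID is included only to make the example deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reader:
    instance_id: str
    tier: int         # 0 (highest priority) to 15 (lowest)
    memory_gib: int   # assumed proxy for instance size


def pick_new_writer(readers: list[Reader]) -> Reader:
    """Lowest tier number wins; within a tier, the largest instance wins."""
    if not readers:
        raise ValueError("no readers available for promotion")
    return min(readers, key=lambda r: (r.tier, -r.memory_gib, r.instance_id))


candidates = [
    Reader("reader-b", tier=1, memory_gib=64),
    Reader("reader-a", tier=0, memory_gib=16),
    Reader("reader-c", tier=0, memory_gib=32),
]
print(pick_new_writer(candidates).instance_id)   # reader-c
```

The largest instance overall (reader-b) is not chosen because tier takes precedence over size.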

Failover Flow Diagram Core

Aurora failover — reader promoted instantly (same storage, no data copy needed)

AZ-a: Writer ❌ FAILED (hardware / host failure)
AZ-b: Reader → Writer ✔ (priority tier 0, promoted instantly)
AZ-c: Reader ✔ Continues (still serving reads, no interruption)

① Writer failure detected (~10–20s)
② Highest-priority reader selected for promotion
③ Reader promotes — already has full data in shared storage (no copy!)
④ Cluster DNS endpoint flips → new writer
⏱️ Total: <30 seconds with readers present

Aurora vs RDS — Failover Comparison Core

⏳

RDS Multi-AZ Failover

Standby is in a separate AZ with its own EBS
Data was synchronously replicated, but the standby still needs to “take over” its volume
DNS updated, OS mounts change — process takes time
Failover: 60–120 seconds
Only 1 standby — one chance for failover

⚡

Aurora Failover

Reader already shares the cluster volume
Promotion = change compute role, no data handoff
Up to 15 readers — any can become writer
Priority tiers control which one is chosen
Failover: <30 seconds (with readers)

Aurora Multi-Master — Multiple Writers Advanced

✏️

What is Multi-Master

Up to 4 writer nodes in a single Aurora MySQL cluster
All nodes accept writes simultaneously
Conflict resolution handled at storage layer
No read replicas in multi-master mode
Single-master covers 99% of production use cases

📌

When to Consider It

Rarely needed — most HA/performance needs met by single-master + replicas
Use case: apps requiring write continuity during writer failover without a pause
Not a replacement for sharding or distributed databases
Supported: Aurora MySQL only (not PostgreSQL)
Exam: almost always refers to single-master; multi-master is niche

Self-Healing Storage — Continuous Repair Advanced

🔧

How Self-Healing Works

Storage nodes continuously monitor each other
When corruption or failure is detected on a node or disk, peer nodes supply healthy copies of the affected segments to repair it
Repair is parallel across many segment pairs simultaneously
10 GB per segment = fast repair (not terabytes at once)
Completely transparent to compute (writer + readers)

📊

Impact on Availability

Losing even 1 copy shrinks the safety margin above the 4/6 write quorum, so repair begins immediately
Mean time to repair (MTTR): minutes for typical segments
Dramatically lowers dual-failure probability
No human intervention required
Aurora tracks unhealthy segments and prioritises their repair

🧠 Key Insight

Aurora HA is architectural, not operational. 6-copy quorum storage means a full AZ outage never loses data. Compute failover is fast (<30s) because promoted readers already share the storage. Self-healing continuously restores the 6-copy redundancy without you doing anything.

Chapter Summary Introductory

 6 copies / 3 AZs: always on — not optional; lose a full AZ and writes continue (4/6 quorum)
Write quorum: 4/6 — can lose 2 storage nodes and still write
Read quorum: 3/6 — can lose 3 storage nodes and still read
Compute failover: <30s — reader promotes because it already has the data
Priority tiers 0–15: control which reader becomes writer on failover
Self-healing storage: peer-to-peer segment repair, continuous, transparent, minutes MTTR
 

Chapter Four

Scaling & Read Replicas

Read Scaling — Up to 15 Replicas, Zero Storage Cost Introductory

Aurora supports up to 15 read replicas per cluster, three times as many as RDS. Because all replicas share the same underlying cluster volume, adding a replica means provisioning new compute only. No data is copied. No extra storage cost per replica. The reader is live and serving traffic within minutes.

👉 Key exam insight: Aurora read replicas share the same storage as the writer. Adding a 15th replica costs the same as adding the 1st — just the compute instance. With RDS, every read replica is an independent database that holds a full copy of the data, which means you pay for storage per replica.

🔢

Replica Limits

Up to 15 read replicas per Aurora cluster
All replicas share the same cluster volume
Replicas can be spread across AZs (recommended) or share an AZ
Each has its own instance endpoint
All served via the single reader endpoint (load-balanced)

📉

Replication Lag

Storage-level replication — not binlog
Typically <100 ms behind writer
Much lower than RDS async replication
Replicas get the same log records as the storage layer
Exam: Aurora replica lag ≈ milliseconds; RDS lag ≈ seconds

💰

Cost Advantage

No extra storage per replica (shared volume)
Pay only for the compute instance class
Can use smaller instance for read-only workloads
Scale down replicas during off-peak (or use Serverless v2)
RDS: each replica = full data copy = double/triple storage cost

Concept Diagram — Write vs Read Distribution Core

Aurora read scaling — writes to writer, reads spread across up to 15 readers via reader endpoint

Reader Endpoint — Automatic Load Balancing Core

The Aurora reader endpoint is a single DNS address that automatically distributes incoming connections across all available reader instances using connection-level load balancing. You point your read traffic at one endpoint and Aurora handles the distribution — no application-side logic needed.

⚖️

How Reader Endpoint Works

Connection-level load balancing (not query-level)
Each new connection to the reader endpoint lands on a different reader (round-robin)
If a reader fails, the endpoint stops routing to it automatically
New readers added via Auto Scaling are picked up automatically
One endpoint to manage regardless of how many replicas you have

🎯

Custom Endpoints

Create a custom endpoint pointing to a specific subset of instances
Use case: analytics team uses large db.r6g.4xlarge readers; web tier uses small db.t3
Prevents analytics queries from consuming web app reader capacity
Multiple custom endpoints per cluster allowed
Reader endpoint + custom endpoints can coexist

Aurora Auto Scaling — Automatic Replica Management Advanced

Aurora Auto Scaling automatically adds or removes reader instances based on a CloudWatch metric, typically CPU utilization or connections per instance. You define minimum and maximum replica counts and a target metric value. Aurora adds replicas during traffic spikes and removes them during quiet periods.

Aurora Auto Scaling — readers scale out on load, scale in during quiet periods

Aurora Auto Scaling inputs: CloudWatch metric (CPU or connections), min/max replica count

SCALE OUT (high load): Writer + Reader 1 + Reader 2 + Reader 3+ (auto-added)
SCALE IN (quiet): Writer + Reader 1

Scale-out: new reader instance available in ~3–5 min • Scale-in: cooldown period prevents thrashing • Reader endpoint auto-includes new instances
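
The scaling decision can be approximated with a simple proportional rule. This is a simplified model for intuition only, not the exact algorithm AWS uses, and it ignores cooldown periods; the function name and parameters are hypothetical.

```python
import math


def desired_readers(current_readers: int, metric_value: float, target_value: float,
                    min_readers: int, max_readers: int) -> int:
    """Scale the reader count so the per-reader metric moves toward the target,
    then clamp the result to the configured min/max."""
    if current_readers < 1 or target_value <= 0:
        raise ValueError("need at least one reader and a positive target")
    proposed = math.ceil(current_readers * metric_value / target_value)
    return max(min_readers, min(max_readers, proposed))


# 2 readers averaging 90% CPU against a 60% target: scale out.
print(desired_readers(2, metric_value=90, target_value=60, min_readers=1, max_readers=15))
# 4 readers averaging 15% CPU against a 60% target: scale in.
print(desired_readers(4, metric_value=15, target_value=60, min_readers=1, max_readers=15))
```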

Aurora vs RDS Read Replica Comparison Core

Feature	RDS Read Replicas	Aurora Read Replicas
Max replicas	5	15
Storage per replica	Full copy of DB	Shared — no extra storage
Replication lag	Milliseconds – seconds (binlog)	<100 ms (storage-level)
Add replica time	Minutes to hours (data copy)	Minutes (compute only)
Auto Scaling	❌ Manual only	✅ Aurora Auto Scaling
Failover promotion	Manual	Automatic (<30s)
Reader endpoint	❌ Manual per-replica	✅ Single load-balanced endpoint

🧠 Key Insight

Aurora scales reads by adding compute, not storage. Up to 15 replicas, each sharing the cluster volume at near-zero extra cost. The reader endpoint abstracts the entire replica fleet behind one DNS address. Auto Scaling adds and removes replicas automatically without human intervention.

Aurora Parallel Query — Push Queries to Storage Nodes Advanced

Aurora Parallel Query pushes the computation of scans, joins, and aggregations down to the Aurora storage layer, running in parallel across thousands of storage nodes. Instead of pulling all data up to the compute instance to process it, the storage layer does the work where the data lives. This dramatically reduces the data transferred to the compute instance and speeds up analytical queries significantly.

⚡

How It Works

Full table scans, JOINs, GROUP BY, aggregates pushed to storage nodes
Up to thousands of parallel threads across storage layer
Only the final result set returned to compute instance
Transparent — same SQL, no schema changes needed
Supported: Aurora MySQL 8.0

🎯

When to Use

Analytics queries on large tables (>1 GB)
ELT transformations within the database
Reporting queries with COUNT(), SUM(), GROUP BY
Not for: short OLTP queries with LIMIT 10 — overhead not worth it
Exam: “run analytics on Aurora without extra infrastructure” → Parallel Query

Chapter Summary Introductory

 15 read replicas (vs 5 for RDS) — all share the cluster volume, no extra storage cost
Replication lag <100 ms — storage-level, much lower than RDS binlog async replication
Reader endpoint: single DNS, connection-level load-balanced across all readers
Custom endpoints: route specific workloads (analytics) to specific instances
Aurora Auto Scaling: adds/removes readers based on CPU / connections — fully automatic
Parallel Query: pushes scans/aggregates to storage layer — analytics speedup without extra infra
Exam: “need more than 5 read replicas” → Aurora; “read replica auto-scaling” → Aurora Auto Scaling
 

Chapter Five

Aurora Serverless v2

What is Aurora Serverless v2 Introductory

Aurora Serverless v2 is a configuration for Aurora DB instances where compute capacity scales automatically based on actual workload demand — in fractions of a second. Instead of choosing a fixed instance class (db.r6g.large), you define a minimum and maximum ACU range. Aurora scales within that range continuously without any downtime.

👉 Key distinction: Serverless v2 is NOT a separate product — it is a capacity type for an Aurora DB instance. The same Aurora cluster can mix provisioned instances (fixed size) and Serverless v2 instances. The storage layer is the same shared cluster volume either way.

ACU — Aurora Capacity Unit Core

📏

What is an ACU

1 ACU ≈ 2 GiB RAM + proportional CPU + network
Minimum: 0.5 ACU
Maximum: 128 ACU
Scales in increments as small as 0.5 ACU
You set min and max — Aurora manages the rest

⚡

Scaling Speed

Scales up in fractions of a second
No cold start (unlike Serverless v1)
Scales down gradually to avoid thrashing
Responds to CPU, connections, and memory pressure
Transparent to the application — no connection drop

💳

Cost Model

Pay per ACU-second consumed
You are always billed for at least the min ACU, even when the database is idle
Min ACU is always running (keeps the instance warm)
Better for variable workloads vs paying for peak 24/7
Storage billed same way as provisioned Aurora
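
A small sketch of the ACU arithmetic, using the "1 ACU ≈ 2 GiB" figure and the 0.5 to 128 range from this section. The price per ACU-hour is a placeholder passed in by the caller, not a quoted AWS rate, and both function names are hypothetical.

```python
import math

GIB_PER_ACU = 2            # "1 ACU ≈ 2 GiB RAM" from the text
MIN_ACU, MAX_ACU = 0.5, 128


def acu_for_memory(memory_gib: float) -> float:
    """Smallest ACU value (in 0.5 steps) that covers the given memory need."""
    acu = math.ceil(memory_gib / GIB_PER_ACU * 2) / 2
    return min(MAX_ACU, max(MIN_ACU, acu))


def compute_cost(usage: list[tuple[float, float]], price_per_acu_hour: float,
                 min_acu: float = MIN_ACU) -> float:
    """usage is a list of (acu_in_use, seconds). Every interval is billed at
    no less than min_acu, since the minimum capacity is always running."""
    acu_seconds = sum(max(min_acu, acu) * seconds for acu, seconds in usage)
    return acu_seconds / 3600 * price_per_acu_hour


day = [(0.5, 20 * 3600), (8, 4 * 3600)]   # 20 quiet hours, 4 busy hours
print(acu_for_memory(5))                                   # 2.5
print(round(compute_cost(day, price_per_acu_hour=0.12), 2))
```

The example day consumes 42 ACU-hours; a provisioned instance sized for the 8 ACU peak would be paying for that capacity during the 20 quiet hours as well.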

Concept Diagram — ACU Scaling Profile Core

Serverless v2 — compute scales instantly within your ACU range as traffic rises and falls

Serverless v2 vs Provisioned — Comparison Core

Feature	Provisioned Aurora	Serverless v2
Compute sizing	Fixed instance class (db.r6g.4xlarge)	ACU range (min 0.5 – max 128)
Scaling	Manual resize (brief downtime)	Automatic, zero downtime, subsecond
Cost model	Per hour for instance size (always-on)	Per ACU-second consumed
Best for	Predictable, steady traffic	Variable, spiky, or unpredictable traffic
Cold start	N/A — always running	None (min ACU keeps instance warm)
Mixed cluster	—	✅ Can mix with provisioned in same cluster

AWS Architecture Diagram — Lambda + Serverless v2 Advanced

Serverless v2 is ideal for Lambda-based architectures where traffic is bursty and unpredictable. Pairing with RDS Proxy gives you connection pooling on top of auto-scaling compute — neither Lambda connection exhaustion nor wasted idle compute.

Serverless v2 + Lambda — compute scales with traffic, RDS Proxy handles connection pooling

Lambda ×N (bursty invocations, variable concurrency)
→ RDS Proxy (connection pooling, IAM auth, reduces connections)
→ Aurora Serverless v2 (0.5 – 128 ACU, scales with Lambda, no idle waste)

💾 Shared Aurora cluster volume • Storage billed per GB • Compute billed per ACU-second • Zero provisioning decisions

When to Use Serverless v2 Core

✅

Best Use Cases

Variable / spiky workloads — e-commerce, news spikes, event-driven apps
Dev / test environments — scales to near-zero at night
Multi-tenant SaaS — each tenant's DB right-sizes itself
Lambda / API Gateway backends — matches serverless compute pattern
New apps — unknown traffic profile, no over-provisioning
Mixed clusters: Serverless v2 readers + provisioned writer

❌

When Provisioned is Better

Steady, predictable traffic (provisioned is cheaper at constant load)
Workloads needing specific instance family guarantees
Need for the very highest consistent performance (db.r6g.16xlarge)
Cost predictability required (Serverless v2 can spike with traffic)

Serverless v1 vs Serverless v2 — Know the Difference Advanced

Feature	Serverless v1 (Legacy)	Serverless v2 (Current)
Scales to zero	✅ Yes (DB pauses when idle)	❌ No (min 0.5 ACU stays warm)
Cold start	25–30 seconds	None — always responsive
Scaling speed	Minutes	Fractions of a second
Engine support	Limited	Full Aurora MySQL 8, PostgreSQL 13+
Mixed cluster	❌ Not supported	✅ Mix with provisioned instances
Production-ready	❌ Not recommended	✅ Yes — recommended choice

RDS Proxy + Serverless v2 — The Complete Pattern Advanced

🧩

Why Combine Them

RDS Proxy keeps its connection pool to Aurora always warm
When Serverless v2 is at min ACU (0.5), Proxy holds its connections open
Sudden Lambda burst → Proxy absorbs the connection spike without forcing Serverless v2 to scale up prematurely
Serverless v2 scales down gradually after a peak; Proxy keeps client connections stable while capacity drops

📊

What Each Solves

RDS Proxy: connection exhaustion from Lambda — pools & reuses connections
Serverless v2: compute waste — scales ACU to match actual workload
Together: neither the DB nor the connection layer is over-provisioned
Exam: “Lambda + Aurora, minimize cost + connections” → RDS Proxy + Serverless v2
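
The pooling idea can be shown with a toy model. This is not how RDS Proxy is implemented; it is a minimal sketch, with a hypothetical `ProxyPool` class, of many callers sharing a small, fixed set of already-open database connections.

```python
from collections import deque


class ProxyPool:
    """Toy model of a connection pool: many callers, few database connections."""

    def __init__(self, max_db_connections):
        # Connections are opened once and then reused.
        self.idle = deque(f"db-conn-{i}" for i in range(max_db_connections))
        self.opened = max_db_connections
        self.served = 0

    def run(self, query):
        conn = self.idle.popleft()      # borrow a warm connection
        try:
            self.served += 1
            return f"{conn} ran {query}"
        finally:
            self.idle.append(conn)      # hand it back for the next caller


pool = ProxyPool(max_db_connections=5)
results = [pool.run(f"SELECT {n}") for n in range(100)]
```

One hundred requests are served while the database only ever sees five connections, which is the effect the proxy has on a burst of Lambda invocations.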

🧠 Key Insight

Serverless v2 solves the provisioning problem: you stop guessing at peak capacity and let Aurora scale compute within the ACU range you define. There are no cold starts and no downtime during scaling, and capacity adjusts in fractions of a second. Pair it with RDS Proxy for Lambda workloads to get both connection efficiency and elastic compute.

Chapter Summary Introductory

 Serverless v2 = Aurora DB instance capacity type; compute auto-scales within ACU min/max range
ACU: 1 ACU ≈ 2 GiB RAM; range 0.5–128; scales in fractions of a second
No cold start (unlike v1): min ACU keeps instance warm; scales up instantly on demand
Pay per ACU-second — cheaper than provisioned for variable/spiky workloads
Same cluster volume: mix Serverless v2 and provisioned instances in one cluster
Exam: “variable / unpredictable DB workload” or “serverless compute + DB” → Aurora Serverless v2
 

Chapter Six

Security & Backups

Network Security — VPC, Subnets, Security Groups Introductory

Aurora always runs inside a VPC. A DB subnet group spanning at least 2 AZs is required — Aurora uses all three AZs for its storage regardless of where the compute instances sit. Best practice: place compute in private subnets, with Security Groups allowing only your app tier to reach the Aurora port.

Aurora security layers — VPC + private subnets + Security Group + KMS + SSL

VPC
AZ-a (private subnet): Writer, KMS encrypted, port 3306/5432
AZ-b (private subnet): Reader 1, KMS encrypted, read-only
AZ-c (private subnet): Reader 2, KMS encrypted, read-only

🔒 Security Group: inbound port 3306 (MySQL) / 5432 (PG) from App SG only — never 0.0.0.0/0
🔑 KMS encryption at rest — cluster volume, snapshots, backups all encrypted
🌐 SSL/TLS in transit — download Aurora CA bundle, enforce in connection string
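
The Security Group rule above is easy to check mechanically. The sketch below assumes ingress rules shaped like the `IpPermissions` entries that the EC2 API returns (`FromPort`, `ToPort`, `IpRanges`, `UserIdGroupPairs`); the function name and the group IDs are hypothetical, and rules without explicit ports are not handled.

```python
AURORA_PORTS = {3306, 5432}


def risky_rules(ip_permissions, app_sg_id):
    """Return rules that open an Aurora port to anything except the app SG."""
    findings = []
    for rule in ip_permissions:
        ports = set(range(rule["FromPort"], rule["ToPort"] + 1))
        if not ports & AURORA_PORTS:
            continue                                  # rule does not touch Aurora
        cidrs = [r["CidrIp"] for r in rule.get("IpRanges", [])]
        other_sgs = [
            g["GroupId"]
            for g in rule.get("UserIdGroupPairs", [])
            if g["GroupId"] != app_sg_id
        ]
        if cidrs or other_sgs:
            findings.append((rule["FromPort"], cidrs + other_sgs))
    return findings


good = [{"FromPort": 5432, "ToPort": 5432,
         "UserIdGroupPairs": [{"GroupId": "sg-app"}]}]
bad = [{"FromPort": 3306, "ToPort": 3306,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]

good_findings = risky_rules(good, "sg-app")
bad_findings = risky_rules(bad, "sg-app")
```

Any CIDR source is flagged here, not only 0.0.0.0/0, because the stated goal is that the app tier's Security Group is the sole allowed source.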

Encryption at Rest Core

🔑

KMS Encryption

Must enable at cluster creation time — cannot add later
Uses AWS KMS (AES-256)
Encrypts: cluster volume, automated backups, snapshots, read replicas
Shared storage = one KMS key encrypts everything
Read replicas inherit encryption from the cluster — no separate key needed
To encrypt unencrypted cluster: snapshot → copy with encryption → restore

🌐

TLS in Transit

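Download the Amazon RDS CA certificate bundle (global-bundle.pem) from AWS
MySQL: --ssl-ca=global-bundle.pem --ssl-mode=VERIFY_IDENTITY
PostgreSQL: sslmode=verify-full sslrootcert=global-bundle.pem
Enforce server-side in the DB cluster parameter group: require_secure_transport = ON (MySQL) or rds.force_ssl = 1 (PG)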
Encrypts all data between app and Aurora endpoint
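
On the client side, the PostgreSQL settings above end up in the connection string. Here is a minimal sketch that builds a libpq-style keyword/value string; the host name, database, user and certificate path are all hypothetical placeholders.

```python
def pg_dsn(host, dbname, user, ca_bundle_path, port=5432):
    """Build a libpq keyword/value connection string that verifies the server."""
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",       # check the certificate and the host name
        "sslrootcert": ca_bundle_path,  # CA bundle downloaded from AWS
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


dsn = pg_dsn(
    host="mycluster.cluster-abc123.us-east-1.rds.amazonaws.com",  # hypothetical
    dbname="appdb",                                               # hypothetical
    user="app_user",                                              # hypothetical
    ca_bundle_path="/opt/certs/rds-ca-bundle.pem",                # hypothetical
)
```

Keeping `sslmode=verify-full` in one helper makes it harder for a single service to connect with a weaker mode by accident.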

IAM Authentication & Secrets Manager Core

👤

IAM DB Authentication

Authenticate using an IAM token instead of a password
Token generated via generate-db-auth-token API, valid 15 minutes
Supported: Aurora MySQL 5.7/8.0 and Aurora PostgreSQL 10+
No credentials stored in application code
Attach IAM role to EC2 / Lambda — they get DB access automatically
Exam: “no passwords in code, EC2 to Aurora” → IAM DB auth
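
Because a token is only valid for 15 minutes, applications usually cache it and refresh shortly before expiry. The sketch below shows that logic with a hypothetical `AuthTokenCache` class and a fake token source. In a real application the `fetch_token` callable would wrap the SDK call that generates the token; that call is deliberately not made here.

```python
import time


class AuthTokenCache:
    """Reuse an IAM auth token until shortly before its 15-minute expiry."""

    TOKEN_LIFETIME_S = 15 * 60

    def __init__(self, fetch_token, refresh_margin_s=60, clock=time.monotonic):
        self._fetch_token = fetch_token      # callable returning a fresh token
        self._margin = refresh_margin_s
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self.TOKEN_LIFETIME_S
        return self._token


# Demonstration with a fake clock and a fake token source.
fake_now = [0.0]
issued = []


def fake_fetch():
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1]


cache = AuthTokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()
fake_now[0] = 600       # 10 minutes later: token still valid
second = cache.get()
fake_now[0] = 850       # inside the 60-second refresh margin
third = cache.get()
```

The token is needed only when a new connection is opened, so an already established connection keeps working after the token it was opened with has expired.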

🗝️

Secrets Manager (Recommended)

Store Aurora master password in Secrets Manager
Native Aurora integration — automatic rotation without downtime
Rotation schedule: 30 / 60 / 90 days or custom
App reads secret at runtime; never hardcoded
Works for every engine, including the RDS engines that IAM auth does not cover (for example Oracle and SQL Server)
Exam: “rotate DB credentials automatically” → Secrets Manager

Aurora Backups — Continuous by Design Core

Aurora backups work differently from RDS. Because the storage layer continuously logs all changes to S3 in the background, Aurora does not have a traditional backup window. The backup process never interrupts the cluster and causes zero performance impact — on any instance, in any configuration.

🔄

Automated Backups (Always On)

Continuous backup to S3 — cannot be disabled
Retention: 1–35 days (default 1 day, set to at least 7)
Enables Point-in-Time Recovery to any second within retention
No backup window — zero performance impact always
Stored in S3 (AWS-managed, not visible in your S3 console)
Backup data spans all AZs — regionally durable
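
The retention setting translates directly into how far back a point-in-time restore can reach. A minimal sketch of that arithmetic follows; the function names are illustrative, and it simplifies by treating the current moment as restorable, whereas the latest restorable time on a real cluster trails the present slightly.

```python
from datetime import datetime, timedelta, timezone


def earliest_restorable(now, retention_days):
    """Oldest point in time still covered by automated backups."""
    if not 1 <= retention_days <= 35:
        raise ValueError("retention must be between 1 and 35 days")
    return now - timedelta(days=retention_days)


def can_restore_to(target, now, retention_days):
    """True if the target timestamp falls inside the retention window."""
    return earliest_restorable(now, retention_days) <= target <= now


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
three_days_ago = now - timedelta(days=3)

with_default = can_restore_to(three_days_ago, now, retention_days=1)
with_week = can_restore_to(three_days_ago, now, retention_days=7)
```

A mistake discovered three days late is out of reach with the 1-day default but recoverable with 7 days of retention, which is why the list suggests raising it.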

📸

Manual Snapshots

User-initiated at any time
Retained indefinitely until you delete them
Stored in S3 and listed in the RDS console under Snapshots (not in your own S3 buckets)
Survive cluster deletion
Copy across regions for cross-region DR
Share with other AWS accounts

Aurora Backtrack — Rewind Without Restoring Advanced

Aurora Backtrack is an Aurora-exclusive feature that lets you rewind your running database to a previous point in time in place — without creating a new cluster. Instead of restoring a snapshot (which creates a new endpoint), Backtrack reverses the cluster volume itself within seconds. This is powerful for accidental schema drops or data corruption.

⏪

How Backtrack Works

Pre-define a backtrack window (up to 72 hours) when the cluster is created or restored; it cannot be switched on later for an existing cluster
Aurora retains change records for that window
To backtrack: specify a target timestamp
Cluster pauses, reverses changes — back online in seconds
Same cluster, same endpoint — no DNS change
Supported: Aurora MySQL only (not PostgreSQL)

🔧

Backtrack vs PITR

Backtrack: rewinds the existing cluster in-place — same endpoint, seconds
PITR: creates a new cluster from backup — new endpoint, minutes
Use Backtrack for: accidental DROP TABLE, recent data corruption
Use PITR for: longer range recovery, keeping original cluster intact
⚠️ Overhead: Backtrack change records consume additional storage; plan capacity for 72h window
Exam: “rewind Aurora quickly without new cluster” → Backtrack
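
The comparison above can be read as a small decision rule. The sketch below encodes it under stated assumptions: the function name and return labels are illustrative, and the "manual-snapshot" fallback for targets older than the retention window is an addition based on the fact that manual snapshots are kept until deleted.

```python
def choose_recovery(engine, hours_back, backtrack_window_hours,
                    retention_days, keep_original=False):
    """Pick Backtrack or PITR from the rules of thumb in this section."""
    backtrack_ok = (
        engine == "aurora-mysql"                       # Backtrack is MySQL only
        and 0 < backtrack_window_hours <= 72           # window must be enabled
        and hours_back <= backtrack_window_hours       # target inside the window
        and not keep_original                          # Backtrack rewinds in place
    )
    if backtrack_ok:
        return "backtrack"
    if hours_back <= retention_days * 24:
        return "pitr"
    return "manual-snapshot"


recent_drop = choose_recovery("aurora-mysql", 2, 24, 7)
postgres_drop = choose_recovery("aurora-postgresql", 2, 0, 7)
old_corruption = choose_recovery("aurora-mysql", 100, 72, 7)
forensics = choose_recovery("aurora-mysql", 2, 24, 7, keep_original=True)
```

The `keep_original` flag covers the case where the damaged cluster should stay untouched for investigation, which rules out an in-place rewind.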

Aurora vs RDS — Backup Differences Core

Feature	RDS	Aurora
Backup method	Daily snapshot + transaction logs	Continuous log streaming to S3
Backup window	Required (brief I/O pause single-AZ)	None — continuous, zero impact
Can disable backups	Yes (set retention = 0)	No — always on
Backtrack	❌ Not supported	✅ Aurora MySQL (up to 72h)
Restore creates	New DB instance (new endpoint)	New cluster (new endpoint) or Backtrack in-place
Performance impact	Brief I/O pause (single-AZ)	Zero — storage-level continuous

🧠 Key Insight

Aurora never has a backup window and you cannot disable backups — continuous backup to S3 is architectural. Backtrack is Aurora's unique power move: rewind the live cluster in seconds rather than restoring a new one. Encryption must be set at creation — and because storage is shared, one KMS key covers the entire cluster including all replicas and snapshots.

Chapter Summary Introductory

Private subnet + Security Group: keep Aurora unreachable from the internet; SG allows app SG only
KMS at rest: must enable at creation; one key covers cluster volume, snapshots, replicas
IAM auth: MySQL/PG only; token-based, no passwords; Secrets Manager works for all engines
Continuous backup: always on, no backup window, zero performance impact — cannot be disabled
Backtrack: Aurora MySQL only — rewind live cluster in-place (up to 72h) without new endpoint
Exam: “rewind Aurora without new cluster” → Backtrack; “Aurora backup window” → none needed
 

Chapter Seven

Architecture Patterns

Pattern 1 — High-Performance Web Application Core

Pattern 1 — Classic 3-tier: ALB + EC2 Auto Scaling + Aurora Multi-Reader Cluster

VPC
ALB: public subnet, HTTPS
→ App Tier (private): EC2 in AZ-a, EC2 in AZ-b
→ Aurora Cluster (private): Writer in AZ-a (R/W), Reader 1 in AZ-b, Reader 2 in AZ-c

Writes → cluster endpoint • Reads → reader endpoint • Secrets Manager for credentials • KMS at rest
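
In application code, the read/write split usually lives in one small routing helper. The following is a deliberately naive sketch: the endpoint names are hypothetical, and it only inspects the first keyword of the statement. It also ignores replica lag, so a read that must see a write made moments earlier may still need the cluster endpoint.

```python
# Hypothetical endpoint names; real ones are shown in the cluster's details.
CLUSTER_ENDPOINT = "shop.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "shop.cluster-ro-abc123.us-east-1.rds.amazonaws.com"


def endpoint_for(sql, in_transaction=False):
    """Route plain SELECTs to the reader endpoint, everything else to the writer."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    locks_rows = "FOR UPDATE" in sql.upper()
    if verb == "SELECT" and not locks_rows and not in_transaction:
        return READER_ENDPOINT
    return CLUSTER_ENDPOINT


read_target = endpoint_for("SELECT * FROM orders WHERE id = 7")
write_target = endpoint_for("  insert into orders values (7)")
locking_target = endpoint_for("SELECT * FROM orders WHERE id = 7 FOR UPDATE")
```

Statements inside a transaction stay on the cluster endpoint so that every statement in the transaction reaches the same instance.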

Pattern 2 — Read-Heavy App with Caching Layer Core

Pattern 2 — Cache-aside: ElastiCache absorbs hot reads, Aurora handles cache misses + all writes

App Server: checks the cache first
→ ElastiCache: Redis, hot reads, sub-ms latency
Cache miss ↓ Reader EP: Aurora readers, load-balanced
Writes ↓ Writer: Aurora writer, cluster EP

ElastiCache serves hot data • Aurora readers handle warm reads • Aurora writer handles all mutations • Aurora Auto Scaling adds readers under load
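
The cache-aside flow in this pattern can be sketched with two plain dictionaries standing in for Redis and Aurora. The `CacheAside` class is hypothetical and omits expiry (TTL), which a real cache layer would normally add.

```python
class CacheAside:
    """Cache-aside over two dict-like stores standing in for Redis and Aurora."""

    def __init__(self, cache, database):
        self.cache = cache
        self.database = database
        self.db_reads = 0

    def read(self, key):
        if key in self.cache:               # hit: served by the cache
            return self.cache[key]
        self.db_reads += 1                  # miss: fall through to a reader
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value         # populate for the next caller
        return value

    def write(self, key, value):
        self.database[key] = value          # all mutations go to the writer
        self.cache.pop(key, None)           # invalidate so readers refetch


store = CacheAside(cache={}, database={"sku-1": "blue mug"})
a = store.read("sku-1")     # miss, loaded from the database
b = store.read("sku-1")     # hit
store.write("sku-1", "red mug")
c = store.read("sku-1")     # miss again after invalidation
```

Invalidating on write instead of updating the cache keeps the database as the single source of truth, at the cost of one extra database read after each write.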

Pattern 3 — Serverless API (Lambda + RDS Proxy + Aurora Serverless v2) Advanced

Pattern 3 — Fully serverless: API GW + Lambda + RDS Proxy + Aurora Serverless v2

API Gateway: REST / HTTP API, auth + routing
→ Lambda: business logic, attached to the VPC (RDS Proxy is reachable only from inside it)
→ RDS Proxy: connection pool, IAM auth, failover buffer
→ Aurora SLv2: 0.5–128 ACU, scales with Lambda, no idle waste

Lambda bursty invocations • RDS Proxy prevents connection exhaustion • Serverless v2 compute matches traffic • Pay only for ACU-seconds used
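
On the Lambda side, a common companion to this pattern is to open the connection once per execution environment and reuse it across invocations. Here is a minimal sketch with a stand-in `open_connection` function; a real handler would call a database driver pointed at the proxy endpoint instead.

```python
_connection = None
connects = 0


def open_connection():
    """Stand-in for a real driver call pointed at the RDS Proxy endpoint."""
    global connects
    connects += 1
    return {"id": connects}


def get_connection():
    """Create the connection on first use, then reuse it on warm invocations."""
    global _connection
    if _connection is None:
        _connection = open_connection()
    return _connection


def handler(event, context):
    conn = get_connection()
    return {
        "statusCode": 200,
        "body": f"handled {event['path']} on connection {conn['id']}",
    }


responses = [handler({"path": p}, None) for p in ("/a", "/b", "/c")]
```

Three invocations in the same environment share one connection. Many concurrent environments still mean many client connections, which is the part the proxy absorbs.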

Pattern 4 — Aurora Global Database (Multi-Region) Advanced

Aurora Global Database replicates an entire Aurora cluster to up to 5 secondary read-only regions using dedicated replication infrastructure at the storage layer — not over the public internet. Replication lag is typically under 1 second. In a disaster, any secondary can be promoted to a full primary in under 1 minute.

🌐

Replication

Storage-level replication — not binlog
<1 second typical lag across regions
Dedicated AWS replication network (not internet)
Up to 5 secondary regions
Each secondary has its own cluster of readers

🛡️

Disaster Recovery

RPO: <1 second (storage-level, near-zero data loss)
RTO: <1 minute (promote secondary to primary)
Promotion is a manual or automated operation
Promoted secondary becomes a full independent primary
Best cross-region DR for relational databases on AWS
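
When several secondaries exist, a runbook typically promotes the one that is furthest caught up. The sketch below picks that region and checks it against an RPO budget. The lag figures are invented for the example and the function name is illustrative; in practice the numbers would come from a replication-lag metric.

```python
def pick_promotion_target(lag_ms_by_region, rpo_budget_ms=1000):
    """Choose the secondary with the least replication lag for promotion."""
    if not lag_ms_by_region:
        raise ValueError("no secondary regions available")
    region = min(lag_ms_by_region, key=lag_ms_by_region.get)
    lag = lag_ms_by_region[region]
    return region, lag, lag <= rpo_budget_ms


# Invented lag readings, in milliseconds.
observed = {"eu-west-1": 180, "ap-southeast-1": 420}
region, lag_ms, within_rpo = pick_promotion_target(observed)
```

The lag at the moment of failure approximates how much recent data could be lost, so it is worth recording alongside the promotion decision.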

🗺️

Global Read Latency

App in eu-west-1 reads from secondary in eu-west-1
No transcontinental round-trip for reads
Secondaries are read-only (writes must go to primary)
Global write endpoint available — routes to primary
Use case: global SaaS, multinational compliance

Pattern 4 — Aurora Global Database: primary in us-east-1, replicated to eu-west-1 + ap-southeast-1

PRIMARY (us-east-1): Aurora Primary, writer + 2 readers, all writes here
<1s lag ⇒ storage replication
SECONDARY (eu-west-1): Aurora Secondary, read-only readers, EU user reads
<1s lag ⇒ storage replication
SECONDARY (ap-southeast-1): Aurora Secondary, read-only readers, APAC user reads

🆘 Primary region failure → promote eu-west-1 to primary in <1 min • RPO <1s • RTO <1 min

Global Database — Failover Types & Upgrade Order Advanced

🚨

Failover Options

Managed failover: promote a secondary to primary via console/CLI — Aurora coordinates the transition, replication paused, ~1 min
Detach & promote: detach secondary region from the global cluster, promote it to an independent standalone cluster — manual, for edge-case DR
Original primary becomes a secondary cluster after managed failover (it rejoins)
Exam: “fastest cross-region DR” → Aurora Global Database managed failover

🔄

Engine Version Upgrade Order

Minor version upgrades: upgrade every secondary cluster first, then the primary
Major version upgrades: applied to the global cluster as a whole, not region by region
Plan maintenance windows per region so each cluster is upgraded during its own low-traffic period
During a cluster's upgrade: reads from that region are temporarily unavailable
Finish with all clusters on the same engine version; do not leave the primary ahead of its secondaries

Pattern 5 — Aurora Machine Learning (SQL → SageMaker) Advanced

Aurora Machine Learning lets you call Amazon SageMaker and Amazon Comprehend directly from SQL statements. You do not build a separate data pipeline: the database engine sends the relevant rows to the ML service in the background and returns the prediction as part of the query result.

🧠

How Aurora ML Works

Create an ML function that maps to a SageMaker endpoint or Comprehend API
Call it from SQL, e.g. SELECT aurora_ml_predict(col1, col2) FROM table (the function name is whatever you defined)
Aurora batches rows, calls SageMaker, merges results back into your result set
Supports: SageMaker (custom models) and Amazon Comprehend (sentiment, language, NER)
Supported: Aurora MySQL 8.0 and Aurora PostgreSQL 13+

📊

Use Cases

Real-time product recommendations from SQL query
Sentiment scoring customer reviews inline with application queries
Fraud detection integrated into transaction processing
No data pipeline, no extra infrastructure, no data movement
Exam: “call ML model from SQL in Aurora” → Aurora Machine Learning
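
The batch-call-merge step described above can be illustrated outside the database. This is a conceptual sketch only, not Aurora's implementation: `predict_for_rows` is a hypothetical helper, and the "endpoint" is a fake function that labels reviews by keyword.

```python
def batched(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def predict_for_rows(rows, invoke_endpoint, batch_size=3):
    """Send rows to the model in batches and attach each prediction to its row."""
    results = []
    for batch in batched(rows, batch_size):
        predictions = invoke_endpoint(batch)          # one call per batch
        results.extend(
            {**row, "prediction": p} for row, p in zip(batch, predictions)
        )
    return results


calls = []


def fake_sentiment_endpoint(batch):
    """Stand-in for a model endpoint: returns one label per input row."""
    calls.append(len(batch))
    return ["positive" if "great" in row["review"] else "negative"
            for row in batch]


reviews = [{"id": n, "review": text} for n, text in enumerate(
    ["great mug", "broke fast", "great gift", "too small",
     "great value", "late delivery", "great colour"])]
scored = predict_for_rows(reviews, fake_sentiment_endpoint)
```

Seven rows are scored with three endpoint calls instead of seven, which is why batching matters when a query touches many rows.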

You Need...	Use	Why
MySQL / PG — highest perf + HA	Aurora	Shared storage, 15 replicas, <30s failover
MySQL / PG — standard managed	RDS	Simpler, cheaper, sufficient for most workloads
Oracle / SQL Server	RDS (not Aurora)	Aurora doesn't support these engines
Variable workload, serverless compute	Aurora Serverless v2	ACU auto-scales, zero cold start, pay per use
Global reads + sub-1s cross-region DR	Aurora Global Database	RPO <1s, RTO <1 min, 5 secondary regions
Key-value / document / serverless scale	DynamoDB	NoSQL, unlimited scale, single-digit ms
Sub-ms caching, reduce DB load	ElastiCache	Redis / Memcached in front of Aurora

Exam Cheatsheet Core

🎯 Exam Keywords → Aurora Answer

“Aurora ≠ fast RDS” → decoupled shared storage architecture
“more than 5 read replicas” → Aurora (up to 15)
“replica lag <100ms” → Aurora storage-level replication
“failover <30 seconds” → Aurora (shared storage = promoted reader already has data)
“auto-scale read replicas” → Aurora Auto Scaling
“analytics queries on Aurora, no extra infra” → Aurora Parallel Query
“variable / spiky DB workload” → Aurora Serverless v2
“serverless compute + relational DB” → Lambda + RDS Proxy + Aurora Serverless v2
“Lambda + Aurora, minimize cost + connections” → RDS Proxy + Serverless v2
“rewind Aurora without new cluster” → Aurora Backtrack (MySQL only, up to 72h)
“cross-region DR RPO <1s, RTO <1min” → Aurora Global Database
“global low-latency reads + single write region” → Aurora Global Database
“fastest cross-region failover” → Aurora Global Database managed failover
“call ML model from SQL” → Aurora Machine Learning (SageMaker / Comprehend)
“high I/O workload, predictable DB cost” → Aurora I/O-Optimized
“Aurora backup window” → there is none; continuous backup, zero impact
“Oracle / SQL Server on Aurora” → NOT possible; use RDS for those engines
“6 copies, 3 AZs” → Aurora storage always; not configurable
“storage auto-scales” → Aurora cluster volume, 10 GB → 128 TiB, no downtime

🧠 Final Insight

Aurora is a cloud-native redesign — shared distributed storage is the foundation that enables everything else: instant replicas, sub-30s failover, continuous no-impact backups, Backtrack, and Global Database. Pick Aurora when RDS hits its limits: more replicas needed, faster failover required, global distribution demanded, or compute needs to auto-scale with Serverless v2.