Amazon Aurora

Amazon Aurora: Cloud-Native Relational Database

Aurora is not just a faster RDS — it is a ground-up redesign of the relational database for the cloud. Decoupled storage and compute, six-way replication across three AZs, MySQL and PostgreSQL compatible, and up to 5× MySQL performance. Aurora is what you choose when RDS is not enough.

⚡ Aurora in 30 Seconds

  • Cloud-native relational DB — MySQL & PostgreSQL compatible
  • Shared distributed storage — decoupled from compute, auto-scales to 128 TiB
  • 6 copies of data across 3 AZs — HA is built-in, not an add-on
  • Up to 15 read replicas — all share the same underlying storage
  • 5× MySQL and 3× PostgreSQL performance vs community editions
  • Aurora Serverless v2 — compute scales instantly from 0.5 to 128 ACUs
01
Chapter One

What is Amazon Aurora

The Problem with Traditional Cloud Databases Introductory

When AWS launched RDS, they took existing database engines (MySQL, PostgreSQL, Oracle) and ran them on managed EC2 infrastructure. That solved ops burden — no more patching, backups are handled — but the database architecture itself was unchanged. It was still designed for single-server, spinning-disk-era assumptions.

👉 The root problem: Traditional databases couple storage to compute. Each instance owns its disk. Replication means copying data over a network, constantly. Failover means waiting for a standby to catch up. Storage limits come from the instance. Cloud demands something better.

What is Amazon Aurora Introductory

Amazon Aurora is a cloud-native relational database built from scratch at AWS, designed to resolve the architectural limitations of traditional databases. It is fully MySQL and PostgreSQL compatible — your SQL works, your drivers work, your ORMs work. But underneath, everything is different.

🧬

Cloud-Native Design

Built for distributed cloud storage from scratch. Storage is a distributed, fault-tolerant service — not a single disk attached to a server.

🔄

MySQL & PG Compatible

Aurora MySQL 3.x is compatible with MySQL 8. Aurora PostgreSQL 15.x is compatible with PostgreSQL 15. No SQL rewrites needed.

5× / 3× Performance

Up to 5× the throughput of MySQL community edition and up to 3× that of PostgreSQL. Achieved through distributed storage, parallel writes, and log-based replication.

Aurora vs RDS — The Single Most Important Distinction Core

Most people think Aurora is just “RDS but faster”. That is wrong and will cost you exam marks. The difference is architectural:

Aspect | RDS (MySQL / PG) | Aurora
Storage model | Instance-local EBS volume | Shared distributed storage tier
Storage limit | 64 TiB max (manual scaling) | 128 TiB (auto-scales)
Replication | Async binlog (replica copies data) | Storage-level (no data moves)
HA copies | 1 standby (Multi-AZ) | 6 copies across 3 AZs (always)
Failover time | 60–120 seconds | <30 seconds
Read replicas | Up to 5 (async, data copied) | Up to 15 (share same storage)
Replica lag | Milliseconds to seconds | Milliseconds (storage-level)
Concept Diagram — RDS vs Aurora Storage Model Core
The fundamental difference — RDS local storage vs Aurora shared distributed storage
RDS (coupled storage + compute)
  • Primary compute with its own local EBS volume
  • Standby compute with its own local EBS copy
  • Each node owns its own disk; replication = copying data across the network
  • ⚠️ Failover 60–120s; each replica adds disk cost
Aurora (decoupled storage)
  • Writer and Reader ×N are compute only
  • ✨ Shared distributed storage: 6 copies, 3 AZs, auto-scales to 128 TiB
  • All nodes attach to the same storage; no data copying between nodes
  • ✔ Failover <30s; replicas share storage instantly
Mental Model — The Right Way to Think About Aurora Introductory

🧠 Most People Think (Wrong):

“Aurora = RDS with better hardware” or “Aurora = fast MySQL”

✨ Better Mental Model (Correct):

Aurora = Compute nodes + a cloud-native shared storage service. The database engines (writer + readers) are just compute that plugs into one shared storage fabric. The storage itself is distributed, replicated, and elastic — independent of any single instance.

🏠

Old Model (RDS)

  • DB engine + local disk = one tightly coupled unit
  • Replicate = copy data to another server's disk
  • Add replica = duplicate storage cost
  • Failover = wait for standby to assume its local disk
  • Like a house: each house has its own pipes
🏗️

New Model (Aurora)

  • Storage is a separate elastic service
  • Replicate = storage handles it, engines don't know
  • Add replica = new compute node, no extra storage
  • Failover = another compute node grabs the same storage
  • Like a city water system: each tap connects same pipes
💡

Why It Matters

  • Faster failover (no data sync needed)
  • Instant read replicas (no copy of data)
  • Storage grows automatically (no pre-provisioning)
  • Lower replication lag (storage-level, not app-level)
  • Aurora Global Database becomes feasible
Aurora Compatibility — Two Flavours Core
Aurora comes in two variants — pick based on which engine your app uses
Aurora MySQL
Compatible: MySQL 5.7 / 8.0
Aurora MySQL 2.x / 3.x
5× MySQL community performance
Aurora PostgreSQL
Compatible: PG 13 / 14 / 15 / 16
Aurora PG 15.x
3× PostgreSQL community performance

⚠️ Aurora does NOT support Oracle or SQL Server — use RDS for those engines.

AWS Architecture Diagram — Aurora Cluster in a VPC Core
Aurora DB cluster inside VPC — writer + readers across AZs, shared storage underneath
VPC (10.0.0.0/16)
Public subnet
  • EC2 app (web server)
Private subnets (Aurora cluster)
  • Aurora writer: AZ-a, read/write, reached via the cluster endpoint
  • Reader 1: AZ-b, read-only, reached via the reader endpoint
  • Reader 2: AZ-c, read-only, reached via the reader endpoint
🔒 SG: App SG → Aurora SG
🔑 KMS encrypted at rest
💾 Shared storage (6 copies / 3 AZs)
🔄 Auto-scales storage to 128 TiB
Aurora vs RDS — Cost Reality Check Core
💰

Compute Cost (Higher for Aurora)

  • db.r6g.large (2 vCPU, 16 GB RAM)
  • RDS MySQL: ~$0.18/hr
  • Aurora MySQL: ~$0.22/hr (~22% more)
  • Difference compounds at scale with many instances
  • Aurora is worth it when architecture features justify the delta
⚖️

Storage Cost (Cheaper at Scale)

  • Aurora: shared storage — pay once, used by all replicas
  • RDS: each read replica = full data copy = extra storage cost
  • 15 Aurora replicas: 1× storage cost
  • N RDS replicas: (N + 1)× storage cost (the primary plus one full copy per replica)
  • At 5+ replicas, Aurora total cost is often lower than RDS
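The storage multiplier behind this comparison can be sketched in a few lines. This is a minimal illustration, assuming each RDS read replica holds one full copy of the data and Aurora bills one shared volume; `price_per_gb_month` is a caller-supplied placeholder, not a quoted AWS price.

```python
def storage_copies(replicas: int, shared_storage: bool) -> int:
    """Number of full data copies billed for one primary plus `replicas` readers."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    # Shared volume (Aurora model): one billed copy regardless of reader count.
    # Per-instance storage (RDS model): the primary plus one copy per replica.
    return 1 if shared_storage else 1 + replicas


def monthly_storage_cost(data_gb: float, replicas: int,
                         price_per_gb_month: float, shared_storage: bool) -> float:
    """Storage-only bill; compute and I/O charges are deliberately ignored."""
    return data_gb * price_per_gb_month * storage_copies(replicas, shared_storage)


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(n, "replicas ->",
              storage_copies(n, shared_storage=True), "copy (shared) vs",
              storage_copies(n, shared_storage=False), "copies (per-instance)")
```

The function only counts storage copies, so it says nothing about the higher per-instance compute price noted above; both sides have to be added up before comparing total cost.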
When to Choose Aurora over RDS Core

Choose Aurora When

  • Need higher throughput than RDS MySQL / PG
  • Need more than 5 read replicas (Aurora supports 15)
  • Need failover under 30 seconds
  • Need global distribution (Aurora Global Database)
  • Need serverless variable workloads (Serverless v2)
  • Storage >10 TiB or unpredictable growth
  • Production OLTP requiring highest availability
📌

Stick with RDS When

  • Need Oracle or SQL Server (Aurora doesn't support them)
  • Budget is tight (Aurora ~20% higher per instance)
  • Workload is light — RDS already meets SLAs
  • Need RDS Custom OS-level access
  • Legacy app tied to a specific engine minor version
  • MariaDB (not supported by Aurora)
🧠 Key Insight

Aurora is not RDS with faster hardware — it is a different storage architecture. The shared distributed storage is what enables everything: instant replicas, sub-30s failover, auto-scaling storage, and Global Database. Understand that one idea and the rest of Aurora snaps into place.

Chapter Summary Introductory
  • Aurora = cloud-native relational DB — MySQL and PostgreSQL compatible; no Oracle/SQL Server
  • Decoupled storage: compute and storage are separate — the fundamental design difference from RDS
  • 6 copies / 3 AZs: HA is architectural, not optional — no separate Multi-AZ to configure
  • 5× MySQL / 3× PostgreSQL performance vs community editions
  • Aurora ≠ fast RDS: the storage architecture is completely different; that is the key exam insight
  • Mental model: writer + readers are compute nodes plugging into one shared storage system
02
Chapter Two

Aurora Architecture — Shared Storage & Compute Separation

The Aurora Cluster — Two Layers Introductory

Every Aurora deployment is called a DB cluster. A cluster has two completely independent layers: the compute layer (DB instances that run the MySQL or PostgreSQL engine) and the storage layer (the shared distributed volume that all compute instances read and write). These two layers scale independently of each other.

🖥️

Compute Layer (DB Instances)

  • Writer instance — one per cluster, handles all writes
  • Reader instances — up to 15, read-only, same data
  • Each instance is a specific instance class (db.r6g, db.t3, etc.)
  • Instances can be added or removed without touching storage
  • Serverless v2 = compute that auto-scales instead of fixed instances
💾

Storage Layer (Cluster Volume)

  • Single logical volume shared by all compute instances
  • Physically: 6 copies spread across 3 AZs (2 per AZ)
  • Stored in 10 GB segments; the six copies of a segment form a “protection group”
  • Auto-scales from 10 GB to 128 TiB with zero downtime
  • You pay only for storage actually used (not pre-provisioned)
How the Storage Volume Works — Segments & Self-Healing Core

The Aurora cluster volume is divided into 10 GB segments. Each segment is replicated six times across three AZs, and the six copies together form a protection group. This granularity means that if a disk fails, only the segments that lived on it need to be repaired, not the entire database. Aurora performs this repair continuously in the background, peer-to-peer between storage nodes, without involving the compute layer at all.

👉 Why this matters for HA: Traditional databases repair by replacing the failed node and copying all data back. Aurora repairs individual 10 GB segments, in parallel, across many storage nodes simultaneously. A 1 TB database can repair a failed copy in minutes, not hours.
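To make the segment arithmetic concrete, here is a small sketch using only the figures stated in the text (10 GB segments, 6 copies, 3 AZs). The function names are illustrative and not part of any AWS API.

```python
import math

SEGMENT_GB = 10  # segment size stated in the text
COPIES = 6       # copies kept of every segment
AZS = 3          # Availability Zones the copies are spread across


def volume_layout(data_gb: float) -> dict:
    """Rough layout of a cluster volume holding `data_gb` of data."""
    segments = max(1, math.ceil(data_gb / SEGMENT_GB))
    return {
        "logical_segments": segments,
        "physical_segment_copies": segments * COPIES,
        "copies_per_az": COPIES // AZS,
    }


def repair_gb_after_disk_loss(segments_on_failed_disk: int) -> int:
    """Only the segments that lived on the failed disk are re-copied."""
    return segments_on_failed_disk * SEGMENT_GB


if __name__ == "__main__":
    print(volume_layout(1000))           # a 1 TB (1000 GB) database
    print(repair_gb_after_disk_loss(3))  # GB to re-copy if 3 segments were lost
```

For a 1000 GB database this gives 100 logical segments and 600 physical copies, and losing a disk that held three of them means re-copying 30 GB rather than the whole terabyte.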

Concept Diagram — Aurora Cluster Volume Internals Core
Aurora cluster volume — 10 GB segments replicated 6× across 3 AZs
Compute layer (DB instances; add or remove without touching storage)
  • Writer (AZ-a): read/write
  • Reader 1 (AZ-b), Reader 2 (AZ-c), plus further readers up to the 15-reader limit: read-only
Storage layer (cluster volume; shared, auto-scales to 128 TiB)
  • AZ-a (2 copies): seg 1A, seg 1B, seg 2A, seg 2B
  • AZ-b (2 copies): seg 1C, seg 1D, seg 2C, seg 2D
  • AZ-c (2 copies): seg 1E, seg 1F, seg 2E, seg 2F
  • Each segment copy is 10 GB and is repaired continuously
Write Path — How Aurora Commits a Write Core

Aurora uses a quorum-based write model. When the writer commits a transaction, it does not write data pages to storage — it writes only redo log records to the 6 storage nodes. The write is acknowledged when 4 of 6 nodes confirm receipt. Storage nodes reconstruct data pages from log records locally. This is why Aurora writes are so fast — less data moves over the network.

✏️

Write Quorum: 4/6

  • Writer sends redo log to all 6 nodes
  • Waits for 4 acknowledgements
  • Commit confirmed — client gets response
  • Remaining 2 catch up asynchronously
  • Can tolerate 2 failed storage nodes without halting writes
📖

Read Quorum: 3/6

  • Reads need 3 of 6 nodes to agree
  • Can tolerate 3 failed nodes and still serve reads
  • Readers also use log records to materialize pages locally
  • Dramatically reduces replication lag vs binlog
📤

Only Logs, No Pages

  • Writer sends log records (small), not full data pages (large)
  • Network I/O reduced by up to 7× vs traditional replication
  • Storage nodes apply logs locally — no round-trips for page writes
  • This is the core reason for Aurora's write performance advantage
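The quorum rule described above can be modelled in a few lines. This is a toy sketch, not Aurora's implementation: each storage node is reduced to a boolean acknowledgement, and the consistency check uses the standard quorum conditions (read and write sets must overlap, and any two write sets must overlap).

```python
V, V_WRITE, V_READ = 6, 4, 3  # copies, write quorum, read quorum (from the text)


def quorums_are_consistent(v: int, v_write: int, v_read: int) -> bool:
    """True if every read set intersects every write set and writes cannot diverge."""
    return v_read + v_write > v and 2 * v_write > v


def commit(acks) -> bool:
    """A write commits once at least V_WRITE of the V storage nodes acknowledge it."""
    acks = list(acks)
    if len(acks) != V:
        raise ValueError(f"expected {V} acknowledgement flags, got {len(acks)}")
    return sum(bool(a) for a in acks) >= V_WRITE


if __name__ == "__main__":
    print(quorums_are_consistent(V, V_WRITE, V_READ))
    print(commit([True, True, True, True, False, False]))   # 4 acks
    print(commit([True, True, True, False, False, False]))  # 3 acks
```

With 4/6 writes and 3/6 reads, any read quorum is guaranteed to include at least one node that saw the latest committed write, which is why the slower two nodes can catch up asynchronously.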
Cluster Endpoints — How Applications Connect Core

Aurora exposes several DNS endpoints. Knowing which to use for which workload is critical for exam questions:

Endpoint Type | Points To | Use For
Cluster endpoint | Current writer instance | All writes; always points to the writer after failover
Reader endpoint | All reader instances (load-balanced) | All reads; distributes across readers automatically
Custom endpoint | A specific subset of instances | Analytics workloads on specific high-memory instances
Instance endpoint | One specific instance | Diagnostics, direct maintenance; not for production app traffic
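One way an application might apply the endpoint split is a small routing helper. The sketch below is an assumption-laden illustration: the two hostnames are placeholders in the usual `cluster-` / `cluster-ro-` shape, and the keyword check is a deliberately naive classifier rather than a real SQL parser.

```python
# Placeholder hostnames; substitute the endpoints shown for your own cluster.
WRITER_ENDPOINT = "example-cluster.cluster-abc123xyz.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "example-cluster.cluster-ro-abc123xyz.us-east-1.rds.amazonaws.com"

READ_ONLY_KEYWORDS = ("select", "show", "explain")


def endpoint_for(sql: str, in_transaction: bool = False) -> str:
    """Pick the reader endpoint for plain reads and the cluster endpoint otherwise."""
    words = sql.split()
    first = words[0].lower() if words else ""
    locks_rows = "for update" in sql.lower()
    # Statements inside a transaction, or reads that take row locks, go to the writer.
    if first in READ_ONLY_KEYWORDS and not in_transaction and not locks_rows:
        return READER_ENDPOINT
    return WRITER_ENDPOINT


if __name__ == "__main__":
    print(endpoint_for("SELECT * FROM orders WHERE id = 1"))
    print(endpoint_for("UPDATE orders SET status = 'paid' WHERE id = 1"))
```

Anything the helper cannot classify falls through to the cluster endpoint, which is the safe default because the writer can serve reads but readers cannot serve writes.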
AWS Architecture Diagram — Endpoints in Practice Core
Aurora cluster endpoints — app uses cluster endpoint for writes, reader endpoint for reads
VPC
App tier
  • App server: writes → cluster endpoint, reads → reader endpoint
Cluster endpoint (writer)
  • Writer: AZ-a, read/write; endpoint auto-updates on failover
Reader endpoint (load-balanced)
  • Reader 1: AZ-b
  • Reader 2: AZ-c
💾 All instances share the same cluster volume, so no data is duplicated per reader
🔄 Cluster endpoint automatically flips to the new writer after failover; no app change needed
Storage Auto-Scaling — No Pre-Provisioning Core
📈

How Storage Scales

  • Starts at 10 GB minimum
  • Grows in 10 GB increments automatically
  • Maximum: 128 TiB
  • No downtime, no instance restart
  • You pay per GB-month actually used — not for pre-provisioned capacity
💰

Storage Pricing Model

  • Pay for storage consumed (not allocated)
  • Allocated storage does not shrink on older engine versions (high-water mark model); newer versions reclaim space after data is deleted
  • I/O requests billed separately in Aurora Standard
  • I/O requests free (included) in Aurora I/O-Optimized
Aurora Standard vs I/O-Optimized Core
Feature | Aurora Standard | Aurora I/O-Optimized
Storage rate | Lower (~$0.10/GB-month) | Higher (~$0.225/GB-month)
I/O billing | ~$0.20 per million requests | Included (no per-request charge)
Best for | Low-to-moderate I/O workloads | High I/O workloads (>25% of bill is I/O)
Exam keyword | Default | “high I/O, predictable costs”

💡 Rule of thumb: if your I/O charges exceed 25% of your total Aurora bill, switch to I/O-Optimized and you will likely save money.
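A rough break-even calculation shows where the two configurations cross over. This sketch uses the approximate rates from the comparison above and compares storage and I/O line items only; it deliberately ignores instance pricing, which may also differ between the two configurations, so treat the result as a first estimate.

```python
# Approximate rates taken from the comparison above; check current pricing before relying on them.
STANDARD_GB_MONTH = 0.10
STANDARD_PER_MILLION_IO = 0.20
IO_OPTIMIZED_GB_MONTH = 0.225


def standard_cost(storage_gb: float, io_millions: float) -> float:
    """Monthly storage + I/O charge under Aurora Standard."""
    return storage_gb * STANDARD_GB_MONTH + io_millions * STANDARD_PER_MILLION_IO


def io_optimized_cost(storage_gb: float) -> float:
    """Monthly storage charge under I/O-Optimized (I/O requests included)."""
    return storage_gb * IO_OPTIMIZED_GB_MONTH


def break_even_io_millions(storage_gb: float) -> float:
    """Monthly I/O volume (in millions of requests) at which both bills are equal."""
    return storage_gb * (IO_OPTIMIZED_GB_MONTH - STANDARD_GB_MONTH) / STANDARD_PER_MILLION_IO


if __name__ == "__main__":
    gb = 1000
    print("break-even (million I/Os per month):", break_even_io_millions(gb))
    print("Standard at 1000M I/Os:", standard_cost(gb, 1000))
    print("I/O-Optimized:", io_optimized_cost(gb))
```

Under these assumed rates, a 1000 GB volume breaks even at roughly 625 million requests per month; below that, Standard's storage-plus-I/O bill is lower.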

🧠 Key Insight

Aurora writes only redo log records to storage, not full data pages. The quorum model (4/6 for write, 3/6 for read) means the cluster continues operating even when multiple storage nodes fail. Storage grows automatically — you never pre-provision. Adding readers costs no extra storage.

Chapter Summary Introductory
  • DB cluster = compute layer (writer + readers) + storage layer (cluster volume) — independent layers
  • Cluster volume = 6 copies across 3 AZs, 10 GB segments, self-healing, auto-scales to 128 TiB
  • Writes: redo logs sent to 6 storage nodes, commits on 4/6 ack — no full page writes
  • Reads: quorum 3/6 — readers materialize pages locally from logs, near-zero lag
  • Endpoints: cluster (writer), reader (load-balanced), custom (subset), instance (direct)
  • Storage cost: pay per GB used, not allocated; I/O-Optimized tier for high-I/O workloads
03
Chapter Three

High Availability & Replication

HA is Built Into Aurora — Not an Add-On Introductory

With RDS, you opt into high availability by enabling Multi-AZ, which provisions a separate standby instance. With Aurora, HA is the default state. The 6-copy storage model exists for every Aurora cluster regardless of whether you add readers or not. There is no “single-AZ Aurora” at the storage level.

👉 Critical exam point: You do NOT need to enable Multi-AZ in Aurora — the 6-copy storage replication across 3 AZs is always on. What you control is how many compute instances (readers) you add for faster compute-level failover.

The 6-Copy Replication Model Explained Core
🟢

AZ-a (2 copies)

  • 2 independent copies of every storage segment
  • Even if both fail — 4 copies remain alive
  • Typically hosts the writer instance
  • AZ outage loses 2 copies — writes continue (4/6 quorum met)
🔵

AZ-b (2 copies)

  • 2 independent copies
  • Hosts reader instances for spread reads
  • AZ failure here: 4 copies in AZ-a + AZ-c remain
  • Reads continue, writes continue
🟣

AZ-c (2 copies)

  • 2 independent copies
  • Full geographic separation from AZ-a and AZ-b
  • Highest-tier DR coverage: single-AZ outage never threatens writes
  • Reads continue from AZ-a + AZ-b readers
Quorum Failure Tolerance — How Much Can You Lose Core
Scenario | Copies Lost | Writes | Reads
1 disk fails | 1 of 6 | ✔ Continue (4/6 quorum) | ✔ Continue (3/6 quorum)
1 full AZ outage | 2 of 6 | ✔ Continue (4/6 quorum) | ✔ Continue (3/6 quorum)
2 disks fail (diff AZs) | 2 of 6 | ✔ Continue | ✔ Continue
3 disks fail | 3 of 6 | ❌ Halted (need 4) | ✔ Continue (3/6 quorum)
4+ disks fail | 4+ of 6 | ❌ Halted | ❌ Halted
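The rows of this table follow mechanically from the two quorum thresholds. The sketch below recomputes them for arbitrary failure patterns, assuming 2 copies per AZ as described above; the AZ names are illustrative labels.

```python
COPIES_PER_AZ = {"az-a": 2, "az-b": 2, "az-c": 2}
WRITE_QUORUM, READ_QUORUM = 4, 3


def availability(failed_per_az: dict) -> dict:
    """Given failed copies per AZ, report surviving copies and what still works."""
    surviving = sum(
        max(0, total - failed_per_az.get(az, 0))
        for az, total in COPIES_PER_AZ.items()
    )
    return {
        "surviving": surviving,
        "writes": surviving >= WRITE_QUORUM,
        "reads": surviving >= READ_QUORUM,
    }


if __name__ == "__main__":
    print(availability({"az-a": 2}))             # one full AZ lost
    print(availability({"az-a": 2, "az-b": 1}))  # an AZ plus one more copy
    print(availability({"az-a": 2, "az-b": 2}))  # two full AZs lost
```

The second case is the interesting one: after losing an AZ plus one more copy, reads still succeed even though writes pause until repair restores a fourth copy.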
Concept Diagram — 6-Copy 3-AZ Distribution Core
Aurora HA — 6 copies of every storage segment across 3 AZs (AZ outage ≠ data loss)
Segment #1 (10 GB), replicated 6×
  • AZ-a: copy 1 and copy 2, each on its own storage node (✔ lose both = 4 copies remain)
  • AZ-b: copy 3 and copy 4, each on its own storage node (✔ lose both = 4 copies remain)
  • AZ-c: copy 5 and copy 6, each on its own storage node (✔ lose both = 4 copies remain)
Automatic Failover — Compute Level Core

When the writer instance fails, Aurora promotes one of the existing reader instances to become the new writer. Because storage is shared, the promoted reader already has the complete dataset — it just switches its mode. This is why Aurora failover is so much faster than RDS.

⏱️

Failover Timeline

  • Writer failure detected: ~10–20 seconds
  • Reader promoted to writer: immediate (no data copy)
  • DNS updated to point to new writer
  • Applications reconnect: total ~30 seconds
  • With Aurora readers present: typically <30 seconds
  • Without readers (single instance): much longer, because a replacement instance must be created first (AWS documents this as typically under 10 minutes)
🏆

Failover Priority Tiers

  • Each reader has a priority tier: 0 (highest) – 15 (lowest)
  • Aurora promotes the reader at the highest priority tier
  • Tie in priority: promotes the largest instance first
  • Second tie (same tier and size): Aurora promotes an arbitrary replica within that tier
  • Set tier via console / CLI — use tier 0 for your primary DR reader
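The selection order can be sketched as a sort key. This is an illustration rather than Aurora's actual code: `memory_gib` is an assumed stand-in for "instance size", and any remaining tie is simply resolved by list order here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Reader:
    instance_id: str
    tier: int        # 0 (highest priority) to 15 (lowest)
    memory_gib: int  # assumed proxy for instance size


def choose_new_writer(readers: Sequence[Reader]) -> Optional[Reader]:
    """Lowest tier number wins; within a tier, the largest instance wins."""
    if not readers:
        return None
    # min() keeps the first of equal keys, so leftover ties fall back to list order.
    return min(readers, key=lambda r: (r.tier, -r.memory_gib))


if __name__ == "__main__":
    fleet = [
        Reader("reader-a", tier=1, memory_gib=128),
        Reader("reader-b", tier=0, memory_gib=16),
        Reader("reader-c", tier=0, memory_gib=64),
    ]
    print(choose_new_writer(fleet))
```

In this example `reader-c` is chosen: it shares tier 0 with `reader-b` but is larger, while the bigger `reader-a` loses out because its tier is lower priority.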
Failover Flow Diagram Core
Aurora failover — reader promoted instantly (same storage, no data copy needed)
  • AZ-a: Writer ❌ failed (hardware / host failure)
  • AZ-b: Reader → Writer ✔ (priority tier 0, promoted instantly)
  • AZ-c: Reader ✔ continues, still serving reads with no interruption
① Writer failure detected (~10–20s)
② Highest-priority reader selected for promotion
③ Reader promotes — already has full data in shared storage (no copy!)
④ Cluster DNS endpoint flips → new writer
⏱️ Total: <30 seconds with readers present
Aurora vs RDS — Failover Comparison Core

RDS Multi-AZ Failover

  • Standby is in a separate AZ with its own EBS
  • Data was synchronously replicated, but the standby still needs to “take over” its volume
  • DNS updated, OS mounts change — process takes time
  • Failover: 60–120 seconds
  • Only 1 standby — one chance for failover

Aurora Failover

  • Reader already shares the cluster volume
  • Promotion = change compute role, no data handoff
  • Up to 15 readers — any can become writer
  • Priority tiers control which one is chosen
  • Failover: <30 seconds (with readers)
Aurora Multi-Master — Multiple Writers Advanced
✏️

What is Multi-Master

  • Up to 4 writer nodes in a single Aurora MySQL cluster
  • All nodes accept writes simultaneously
  • Conflict resolution handled at storage layer
  • No read replicas in multi-master mode
  • Single-master covers 99% of production use cases
📌

When to Consider It

  • Rarely needed — most HA/performance needs met by single-master + replicas
  • Use case: apps requiring write continuity during writer failover without a pause
  • Not a replacement for sharding or distributed databases
  • Supported: Aurora MySQL only (not PostgreSQL), and only on Aurora MySQL version 1, which has reached end of life
  • Exam: almost always refers to single-master; multi-master is niche
Self-Healing Storage — Continuous Repair Advanced
🔧

How Self-Healing Works

  • Storage nodes continuously monitor each other
  • When corruption or failure is detected on a node or disk, peer nodes supply healthy segment copies to repair it
  • Repair is parallel across many segment pairs simultaneously
  • 10 GB per segment = fast repair (not terabytes at once)
  • Completely transparent to compute (writer + readers)
📊

Impact on Availability

  • Losing even 1 copy reduces the redundancy margin, so repair begins immediately
  • Mean time to repair (MTTR): minutes for typical segments
  • Dramatically lowers dual-failure probability
  • No human intervention required
  • Aurora tracks unhealthy segments and prioritises their repair
🧠 Key Insight

Aurora HA is architectural, not operational. 6-copy quorum storage means a full AZ outage never loses data. Compute failover is fast (<30s) because promoted readers already share the storage. Self-healing continuously restores the 6-copy redundancy without you doing anything.

Chapter Summary Introductory
  • 6 copies / 3 AZs: always on — not optional; lose a full AZ and writes continue (4/6 quorum)
  • Write quorum: 4/6 — can lose 2 storage nodes and still write
  • Read quorum: 3/6 — can lose 3 storage nodes and still read
  • Compute failover: <30s — reader promotes because it already has the data
  • Priority tiers 0–15: control which reader becomes writer on failover
  • Self-healing storage: peer-to-peer segment repair, continuous, transparent, minutes MTTR
04
Chapter Four

Scaling & Read Replicas

Read Scaling — Up to 15 Replicas, Zero Storage Cost Introductory

Aurora supports up to 15 read replicas per cluster, three times as many as RDS. Because all replicas share the same underlying cluster volume, adding a replica means provisioning new compute only. No data is copied. No extra storage cost per replica. The reader is live and serving traffic within minutes.

👉 Key exam insight: Aurora read replicas share the same storage as the writer. Adding a 15th replica costs the same as adding the 1st — just the compute instance. With RDS, every read replica is an independent database that holds a full copy of the data, which means you pay for storage per replica.

🔢

Replica Limits

  • Up to 15 read replicas per Aurora cluster
  • All replicas share the same cluster volume
  • Replicas can be spread across AZs (recommended) or share an AZ
  • Each has its own instance endpoint
  • All served via the single reader endpoint (load-balanced)
📉

Replication Lag

  • Storage-level replication — not binlog
  • Typically <100 ms behind writer
  • Much lower than RDS async replication
  • Replicas get the same log records as the storage layer
  • Exam: Aurora replica lag ≈ milliseconds; RDS lag ≈ seconds
💰

Cost Advantage

  • No extra storage per replica (shared volume)
  • Pay only for the compute instance class
  • Can use smaller instance for read-only workloads
  • Scale down replicas during off-peak (or use Serverless v2)
  • RDS: each replica = full data copy = double/triple storage cost
Concept Diagram — Write vs Read Distribution Core
Aurora read scaling — writes to writer, reads spread across up to 15 readers via reader endpoint
Application: writes → cluster endpoint, reads → reader endpoint
  • Writer (cluster endpoint): AZ-a, read/write; all writes land here
  • Reader endpoint: Replica 1 (AZ-b), Replica 2 (AZ-c), Replicas 3–15 (any AZ)
✨ Shared cluster volume: the writer and all replicas are backed by the same storage
Reader Endpoint — Automatic Load Balancing Core

The Aurora reader endpoint is a single DNS address that automatically distributes incoming connections across all available reader instances using connection-level load balancing. You point your read traffic at one endpoint and Aurora handles the distribution — no application-side logic needed.

⚖️

How Reader Endpoint Works

  • Connection-level load balancing (not query-level)
  • Each new connection to the reader endpoint lands on a different reader (round-robin)
  • If a reader fails, the endpoint stops routing to it automatically
  • New readers added via Auto Scaling are picked up automatically
  • One endpoint to manage regardless of how many replicas you have
🎯

Custom Endpoints

  • Create a custom endpoint pointing to a specific subset of instances
  • Use case: analytics team uses large db.r6g.4xlarge readers; web tier uses small db.t3
  • Prevents analytics queries from consuming web app reader capacity
  • Multiple custom endpoints per cluster allowed
  • Reader endpoint + custom endpoints can coexist
Aurora Auto Scaling — Automatic Replica Management Advanced

Aurora Auto Scaling automatically adds or removes reader instances based on a CloudWatch metric, typically CPU utilization or connections per instance. You define minimum and maximum replica counts and a target metric value. Aurora scales replicas up during traffic spikes and removes them during quiet periods.

Aurora Auto Scaling: readers scale out on load and scale in during quiet periods
  • Inputs: a CloudWatch metric (CPU or connections) and a min/max replica count
  • Scale out (high load): Writer + Reader 1, Reader 2, Reader 3+ (auto-added)
  • Scale in (quiet): Writer + Reader 1
  • Scale-out: new reader instance available in ~3–5 min
  • Scale-in: cooldown period prevents thrashing
  • Reader endpoint auto-includes new instances
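The target-tracking idea can be approximated with a proportional rule: scale the reader count by the ratio of the observed metric to the target, then clamp to the configured bounds. This is a simplified sketch under that assumption; the real policy also applies cooldowns and its own smoothing, which are not modelled here.

```python
import math


def desired_reader_count(current_readers: int, metric_value: float,
                         target_value: float, min_readers: int,
                         max_readers: int) -> int:
    """Proportional estimate of how many readers keep the metric near its target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    if min_readers > max_readers:
        raise ValueError("min_readers must not exceed max_readers")
    current = max(current_readers, 1)  # avoid multiplying by zero readers
    proposed = math.ceil(current * metric_value / target_value)
    return max(min_readers, min(max_readers, proposed))


if __name__ == "__main__":
    # 2 readers averaging 90% CPU against a 60% target
    print(desired_reader_count(2, 90, 60, min_readers=1, max_readers=15))
    # 4 readers averaging 15% CPU against a 60% target
    print(desired_reader_count(4, 15, 60, min_readers=1, max_readers=15))
```

The clamp is what the min/max replica settings provide in practice: no matter how far the metric drifts, the fleet never drops below the floor or exceeds the ceiling.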
Aurora vs RDS Read Replica Comparison Core
Feature | RDS Read Replicas | Aurora Read Replicas
Max replicas | 5 | 15
Storage per replica | Full copy of DB | Shared; no extra storage
Replication lag | Milliseconds – seconds (binlog) | <100 ms (storage-level)
Add replica time | Minutes to hours (data copy) | Minutes (compute only)
Auto Scaling | ❌ Manual only | ✅ Aurora Auto Scaling
Failover promotion | Manual | Automatic (<30s)
Reader endpoint | ❌ Manual per-replica | ✅ Single load-balanced endpoint
🧠 Key Insight

Aurora scales reads by adding compute, not storage. Up to 15 replicas, each sharing the cluster volume at near-zero extra cost. The reader endpoint abstracts the entire replica fleet behind one DNS address. Auto Scaling adds and removes replicas automatically without human intervention.

Aurora Parallel Query — Push Queries to Storage Nodes Advanced

Aurora Parallel Query pushes the computation of scans, joins, and aggregations down to the Aurora storage layer, running in parallel across thousands of storage nodes. Instead of pulling all data up to the compute instance to process it, the storage layer does the work where the data lives. This dramatically reduces the data transferred to the compute instance and speeds up analytical queries significantly.

How It Works

  • Full table scans, JOINs, GROUP BY, aggregates pushed to storage nodes
  • Up to thousands of parallel threads across storage layer
  • Only the final result set returned to compute instance
  • Transparent — same SQL, no schema changes needed
  • Supported: Aurora MySQL 8.0
🎯

When to Use

  • Analytics queries on large tables (>1 GB)
  • ELT transformations within the database
  • Reporting queries with COUNT(), SUM(), GROUP BY
  • Not for: short OLTP queries with LIMIT 10 — overhead not worth it
  • Exam: “run analytics on Aurora without extra infrastructure” → Parallel Query
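The pushdown idea can be illustrated with a toy model: each "storage node" filters and aggregates its own rows, and only small partial results travel back to be merged. This is a conceptual sketch of aggregation pushdown in general, not Aurora's implementation; the row shape `(region, amount)` and the thread pool are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(rows, predicate):
    """Runs 'at the storage node': filter locally, return only a partial aggregate."""
    matched = [amount for (region, amount) in rows if predicate(region)]
    return len(matched), sum(matched)


def parallel_count_and_sum(segments, predicate):
    """Head node: fan out the scan, then merge the per-segment partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda seg: scan_segment(seg, predicate), segments))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_count, total_sum


if __name__ == "__main__":
    segments = [
        [("eu", 10), ("us", 5)],
        [("eu", 7)],
        [],
    ]
    # Equivalent in spirit to: SELECT COUNT(*), SUM(amount) ... WHERE region = 'eu'
    print(parallel_count_and_sum(segments, lambda region: region == "eu"))
```

Only one `(count, sum)` pair per segment crosses back to the head node, which is the property that reduces data transfer compared with pulling every row up to the compute instance.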
Chapter Summary Introductory
  • 15 read replicas (vs 5 for RDS) — all share the cluster volume, no extra storage cost
  • Replication lag <100 ms — storage-level, much lower than RDS binlog async replication
  • Reader endpoint: single DNS, connection-level load-balanced across all readers
  • Custom endpoints: route specific workloads (analytics) to specific instances
  • Aurora Auto Scaling: adds/removes readers based on CPU / connections — fully automatic
  • Parallel Query: pushes scans/aggregates to storage layer — analytics speedup without extra infra
  • Exam: “need more than 5 read replicas” → Aurora; “read replica auto-scaling” → Aurora Auto Scaling
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ• -->
05
Chapter Five

Aurora Serverless v2

What is Aurora Serverless v2 Introductory

Aurora Serverless v2 is a configuration for Aurora DB instances where compute capacity scales automatically based on actual workload demand — in fractions of a second. Instead of choosing a fixed instance class (db.r6g.large), you define a minimum and maximum ACU range. Aurora scales within that range continuously without any downtime.

👉 Key distinction: Serverless v2 is NOT a separate product — it is a capacity type for an Aurora DB instance. The same Aurora cluster can mix provisioned instances (fixed size) and Serverless v2 instances. The storage layer is the same shared cluster volume either way.

ACU — Aurora Capacity Unit Core
📏

What is an ACU

  • 1 ACU ≈ 2 GiB RAM + proportional CPU + network
  • Minimum: 0.5 ACU
  • Maximum: 128 ACU
  • Scales in increments as small as 0.5 ACU
  • You set min and max — Aurora manages the rest

Scaling Speed

  • Scales up in fractions of a second
  • No cold start (unlike Serverless v1)
  • Scales down gradually to avoid thrashing
  • Responds to CPU, connections, and memory pressure
  • Transparent to the application — no connection drop
💳

Cost Model

  • Pay per ACU-second consumed
  • You pay for the capacity currently in use, not for the configured max ACU
  • The minimum ACU is always running (and billed), which keeps the instance warm
  • Better for variable workloads vs paying for peak 24/7
  • Storage billed same way as provisioned Aurora
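A simple estimate of ACU consumption can be built from a demand profile. This sketch makes two loud assumptions: demand is sampled hourly (actual billing is per ACU-second), and `price_per_acu_hour` is a caller-supplied placeholder rather than a quoted AWS price.

```python
GIB_PER_ACU = 2  # approximate memory per ACU, as stated above


def clamp_acu(demand_acu: float, min_acu: float = 0.5, max_acu: float = 128.0) -> float:
    """Capacity actually provisioned: demand held within the configured range."""
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return max(min_acu, min(max_acu, demand_acu))


def acu_hours(hourly_demand_acu, min_acu: float = 0.5, max_acu: float = 128.0) -> float:
    """Total ACU-hours for a list of hourly demand samples."""
    return sum(clamp_acu(d, min_acu, max_acu) for d in hourly_demand_acu)


def compute_cost(hourly_demand_acu, price_per_acu_hour: float,
                 min_acu: float = 0.5, max_acu: float = 128.0) -> float:
    """Compute-only estimate; storage and I/O are billed separately."""
    return acu_hours(hourly_demand_acu, min_acu, max_acu) * price_per_acu_hour


def approx_memory_gib(acu: float) -> float:
    return acu * GIB_PER_ACU


if __name__ == "__main__":
    profile = [0, 4, 200]  # idle hour, moderate hour, demand above the max
    print(acu_hours(profile))
    print(approx_memory_gib(8))
```

The idle hour still contributes 0.5 ACU and the overloaded hour is capped at 128 ACU, which shows both ends of the range: the minimum is always billed, and the maximum bounds the worst-case spend.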
Concept Diagram — ACU Scaling Profile Core
Serverless v2 — compute scales instantly within your ACU range as traffic rises and falls
Chart axes: ACU (0.5, 16, 64, 128) against time of day (00:00 to 24:00, peak at 12:00). Plotted line: actual compute used, scaling to ~120 ACU within seconds at peak. Configured range: min 0.5 ACU, max 128 ACU.
Serverless v2 vs Provisioned — Comparison Core
Feature | Provisioned Aurora | Serverless v2
Compute sizing | Fixed instance class (db.r6g.4xlarge) | ACU range (min 0.5 – max 128)
Scaling | Manual resize (brief downtime) | Automatic, zero downtime, subsecond
Cost model | Per hour for instance size (always-on) | Per ACU-second consumed
Best for | Predictable, steady traffic | Variable, spiky, or unpredictable traffic
Cold start | N/A (always running) | None (min ACU keeps instance warm)
Mixed cluster | ✅ Provisioned and Serverless v2 instances can be mixed in the same cluster
AWS Architecture Diagram — Lambda + Serverless v2 Advanced

Serverless v2 is ideal for Lambda-based architectures where traffic is bursty and unpredictable. Pairing with RDS Proxy gives you connection pooling on top of auto-scaling compute — neither Lambda connection exhaustion nor wasted idle compute.

Serverless v2 + Lambda: compute scales with traffic, RDS Proxy handles connection pooling
  • Lambda ×N: bursty invocations, variable concurrency
  • RDS Proxy: connection pooling, IAM auth, reduces connections
  • Aurora Serverless v2: 0.5 – 128 ACU, scales with Lambda, no idle waste
💾 Shared Aurora cluster volume • Storage billed per GB • Compute billed per ACU-second • Zero provisioning decisions
When to Use Serverless v2 Core

Best Use Cases

  • Variable / spiky workloads — e-commerce, news spikes, event-driven apps
  • Dev / test environments — scales to near-zero at night
  • Multi-tenant SaaS — each tenant's DB right-sizes itself
  • Lambda / API Gateway backends — matches serverless compute pattern
  • New apps — unknown traffic profile, no over-provisioning
  • Mixed clusters: Serverless v2 readers + provisioned writer

When Provisioned is Better

  • Steady, predictable traffic (provisioned is cheaper at constant load)
  • Workloads needing specific instance family guarantees
  • Need for the very highest consistent performance (db.r6g.16xlarge)
  • Cost predictability required (Serverless v2 can spike with traffic)
Serverless v1 vs Serverless v2 — Know the Difference Advanced
Feature | Serverless v1 (Legacy) | Serverless v2 (Current)
Scales to zero | ✅ Yes (DB pauses when idle) | ❌ No (min 0.5 ACU stays warm)
Cold start | 25–30 seconds | None; always responsive
Scaling speed | Minutes | Fractions of a second
Engine support | Limited | Full Aurora MySQL 8, PostgreSQL 13+
Mixed cluster | ❌ Not supported | ✅ Mix with provisioned instances
Production-ready | ❌ Not recommended | ✅ Yes; recommended choice
RDS Proxy + Serverless v2 — The Complete Pattern Advanced
🧩

Why Combine Them

  • RDS Proxy keeps its connection pool to Aurora always warm
  • When Serverless v2 is at min ACU (0.5), Proxy holds its connections open
  • Sudden Lambda burst → Proxy absorbs the connection spike without forcing Serverless v2 to scale up prematurely
  • Serverless v2 scales down gradually after a peak; Proxy keeps client connections stable during scale-down
📊

What Each Solves

  • RDS Proxy: connection exhaustion from Lambda — pools & reuses connections
  • Serverless v2: compute waste — scales ACU to match actual workload
  • Together: neither the DB nor the connection layer is over-provisioned
  • Exam: “Lambda + Aurora, minimize cost + connections” → RDS Proxy + Serverless v2
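
To make the pooling idea concrete, here is a toy model (not RDS Proxy itself) in which 100 concurrent callers, standing in for Lambda invocations, share 5 backend connections. The class name `TinyPool` and the pretend query are hypothetical.

```python
import queue
import threading


class TinyPool:
    """Toy stand-in for a connection pool: many callers share a few connections."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for conn_id in range(size):
            self._idle.put(conn_id)
        self.used = set()
        self._lock = threading.Lock()

    def run(self, work):
        conn_id = self._idle.get()          # block until a pooled connection is free
        try:
            with self._lock:
                self.used.add(conn_id)      # record which backend connection served us
            return work(conn_id)
        finally:
            self._idle.put(conn_id)         # hand the connection back for reuse


pool = TinyPool(size=5)
results = []
results_lock = threading.Lock()


def invocation(n):
    value = pool.run(lambda conn_id: n * 2)  # pretend query
    with results_lock:
        results.append(value)


threads = [threading.Thread(target=invocation, args=(n,)) for n in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All 100 callers complete, yet the database side never sees more than 5 connections. That is the effect the bullets above attribute to RDS Proxy.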
🧠 Key Insight

Serverless v2 solves the provisioning problem: you stop guessing at peak capacity and instead let Aurora scale compute within your defined range. There are no cold starts and no downtime during scaling, and capacity adjusts in fractions of a second. Pair it with RDS Proxy for Lambda workloads to get both connection efficiency and elastic compute.

Chapter Summary Introductory
  • Serverless v2 = Aurora DB instance capacity type; compute auto-scales within ACU min/max range
  • ACU: 1 ACU ≈ 2 GiB RAM; range 0.5–128; scales in fractions of a second
  • No cold start (unlike v1): min ACU keeps instance warm; scales up instantly on demand
  • Pay per ACU-second — cheaper than provisioned for variable/spiky workloads
  • Same cluster volume: mix Serverless v2 and provisioned instances in one cluster
  • Exam: “variable / unpredictable DB workload” or “serverless compute + DB” → Aurora Serverless v2
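
As a small illustration of the ACU range, the sketch below validates a min/max pair and returns it in the shape that boto3's `create_db_cluster` and `modify_db_cluster` accept (`ServerlessV2ScalingConfiguration`). The 0.5–128 bounds come from this chapter. The half-ACU step check is an assumption, so confirm it against the current service limits.

```python
ACU_MIN, ACU_MAX = 0.5, 128.0   # range quoted in this chapter
GIB_PER_ACU = 2                 # 1 ACU is roughly 2 GiB of memory


def scaling_config(min_acu, max_acu):
    """Validate an ACU range and return it as boto3-style keyword arguments."""
    for value in (min_acu, max_acu):
        if not ACU_MIN <= value <= ACU_MAX:
            raise ValueError(f"{value} ACU is outside {ACU_MIN}-{ACU_MAX}")
        if (value * 2) % 1 != 0:            # assumed: capacity moves in 0.5 ACU steps
            raise ValueError(f"{value} ACU is not a multiple of 0.5")
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return {"ServerlessV2ScalingConfiguration":
            {"MinCapacity": min_acu, "MaxCapacity": max_acu}}


def approx_memory_gib(acu):
    """Rough memory available at a given capacity."""
    return acu * GIB_PER_ACU
```

For example, `scaling_config(0.5, 16)` describes a cluster that idles at about 1 GiB of memory and can grow to about 32 GiB.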
06
Chapter Six

Security & Backups

Network Security — VPC, Subnets, Security Groups Introductory

Aurora always runs inside a VPC. A DB subnet group spanning at least 2 AZs is required — Aurora uses all three AZs for its storage regardless of where the compute instances sit. Best practice: place compute in private subnets, with Security Groups allowing only your app tier to reach the Aurora port.

[Figure: Aurora security layers (VPC + private subnets + Security Group + KMS + SSL). Inside the VPC, three private subnets: AZ-a hosts the Writer (KMS encrypted, port 3306/5432); AZ-b hosts Reader 1 (KMS encrypted, read-only); AZ-c hosts Reader 2 (KMS encrypted, read-only).]
🔒 Security Group: inbound port 3306 (MySQL) / 5432 (PG) from App SG only — never 0.0.0.0/0
🔑 KMS encryption at rest — cluster volume, snapshots, backups all encrypted
🌐 SSL/TLS in transit — download Aurora CA bundle, enforce in connection string
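
The security group rule can be expressed as data. This sketch builds one `IpPermissions` entry of the kind the EC2 `authorize_security_group_ingress` call expects, with the app tier's security group as the only source. The group ID shown is a placeholder.

```python
ENGINE_PORTS = {"mysql": 3306, "postgresql": 5432}


def aurora_ingress_rule(engine, app_sg_id):
    """Build one IpPermissions entry: Aurora port, reachable from the app SG only."""
    port = ENGINE_PORTS[engine]
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # The source is a security group, not a CIDR range, so 0.0.0.0/0 never appears.
        "UserIdGroupPairs": [{"GroupId": app_sg_id}],
    }


rule = aurora_ingress_rule("postgresql", "sg-0123456789abcdef0")  # placeholder ID
# With boto3 this would be passed as:
#   ec2.authorize_security_group_ingress(GroupId=db_sg_id, IpPermissions=[rule])
```

Referencing the app security group rather than IP ranges means new app instances gain access as soon as they join that group, with no rule changes.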
Encryption at Rest Core
🔑

KMS Encryption

  • Must enable at cluster creation time — cannot add later
  • Uses AWS KMS (AES-256)
  • Encrypts: cluster volume, automated backups, snapshots, read replicas
  • Shared storage = one KMS key encrypts everything
  • Read replicas inherit encryption from the cluster — no separate key needed
  • To encrypt unencrypted cluster: snapshot → copy with encryption → restore
🌐

TLS in Transit

  • Download Aurora CA certificate bundle from AWS
  • MySQL: --ssl-ca=global-bundle.pem (the RDS CA bundle; AmazonRootCA1.pem is the file used when connecting through RDS Proxy)
  • PostgreSQL: sslmode=verify-full
  • Enforce server-side: set require_secure_transport = ON (MySQL) or rds.force_ssl = 1 (PG) in the cluster parameter group
  • Encrypts all data between app and Aurora endpoint
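
A minimal sketch of the client-side settings follows. The bundle file name and the host names are assumptions, so use whichever CA file you actually downloaded. `sslmode`, `sslrootcert`, `--ssl-ca` and `--ssl-mode` are standard libpq and mysql client options.

```python
CA_BUNDLE = "global-bundle.pem"  # assumed local path of the downloaded CA bundle


def postgres_dsn(host, dbname, user, port=5432):
    """libpq-style connection string that verifies the server certificate and hostname."""
    return (f"host={host} port={port} dbname={dbname} user={user} "
            f"sslmode=verify-full sslrootcert={CA_BUNDLE}")


def mysql_cli_args(host, user, port=3306):
    """Arguments for the mysql client with certificate and hostname verification."""
    return ["mysql", f"--host={host}", f"--port={port}", f"--user={user}",
            f"--ssl-ca={CA_BUNDLE}", "--ssl-mode=VERIFY_IDENTITY"]
```

Both forms verify the certificate chain and the hostname. Weaker modes encrypt traffic but do not protect against a server impersonating the endpoint.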
IAM Authentication & Secrets Manager Core
👤

IAM DB Authentication

  • Authenticate using an IAM token instead of a password
  • Token generated via generate-db-auth-token API, valid 15 minutes
  • Supported: Aurora MySQL 5.7/8.0 and Aurora PostgreSQL 10+
  • No credentials stored in application code
  • Attach an IAM role that allows rds-db:connect to EC2 / Lambda; the matching database user must also be set up for IAM authentication
  • Exam: “no passwords in code, EC2 to Aurora” → IAM DB auth
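
Because a token is only valid for 15 minutes, applications usually regenerate it before opening new connections. The sketch below wraps any client that offers `generate_db_auth_token` (boto3's RDS client does) in a small cache. The class name and the 60-second refresh margin are assumptions made for illustration.

```python
import time

TOKEN_LIFETIME_SECONDS = 15 * 60
REFRESH_MARGIN_SECONDS = 60      # assumed safety margin: refresh a little early


class IamTokenCache:
    """Caches an IAM auth token and regenerates it shortly before the 15-minute expiry."""

    def __init__(self, rds_client, host, port, user, region, clock=time.monotonic):
        self._client = rds_client
        self._args = dict(DBHostname=host, Port=port, DBUsername=user, Region=region)
        self._clock = clock
        self._token = None
        self._issued_at = None

    def token(self):
        now = self._clock()
        stale = (self._token is None or
                 now - self._issued_at >= TOKEN_LIFETIME_SECONDS - REFRESH_MARGIN_SECONDS)
        if stale:
            # boto3's RDS client exposes generate_db_auth_token with these keyword arguments.
            self._token = self._client.generate_db_auth_token(**self._args)
            self._issued_at = now
        return self._token
```

The token is passed as the password when connecting over TLS. In real use `rds_client` would be `boto3.client("rds")`; injecting it keeps the sketch runnable without AWS credentials.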
🗝️

Secrets Manager (Recommended)

  • Store Aurora master password in Secrets Manager
  • Native Aurora integration — automatic rotation without downtime
  • Rotation schedule: 30 / 60 / 90 days or custom
  • App reads secret at runtime; never hardcoded
  • Works for both Aurora engines and for every RDS engine, including Oracle and SQL Server, which IAM auth does not cover
  • Exam: “rotate DB credentials automatically” → Secrets Manager
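
Reading the secret at runtime might look like the sketch below. It assumes the secret's JSON contains `host`, `port`, `username` and `password` keys, as secrets created with the RDS credential template do. A secret that Aurora manages for the master user may hold only the username and password, so adjust accordingly.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch and parse a database secret at runtime instead of hardcoding credentials."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    # Key names are an assumption about how this particular secret was created.
    return {
        "host": secret["host"],
        "port": int(secret["port"]),
        "user": secret["username"],
        "password": secret["password"],
    }
```

In real use `secrets_client` would be `boto3.client("secretsmanager")`. Because the secret is fetched on each cold start rather than baked into the code, a rotation takes effect without a redeploy.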
Aurora Backups — Continuous by Design Core

Aurora backups work differently from RDS. Because the storage layer continuously logs all changes to S3 in the background, Aurora does not have a traditional backup window. The backup process never interrupts the cluster and causes zero performance impact — on any instance, in any configuration.

🔄

Automated Backups (Always On)

  • Continuous backup to S3 — cannot be disabled
  • Retention: 1–35 days (default 1 day, set to at least 7)
  • Enables Point-in-Time Recovery to any second within retention
  • No backup window — zero performance impact always
  • Stored in S3 (AWS-managed, not visible in your S3 console)
  • Backup data spans all AZs — regionally durable
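
Point-in-Time Recovery can be sketched as a request builder that first checks the target time against the retention window. The parameter names match boto3's `restore_db_cluster_to_point_in_time`; the window check is a simplification, since the real earliest restorable time is reported by the cluster itself.

```python
from datetime import datetime, timedelta, timezone


def pitr_request(source_cluster, new_cluster, restore_to, retention_days, now=None):
    """Build restore_db_cluster_to_point_in_time arguments after checking the window."""
    now = now or datetime.now(timezone.utc)
    if not 1 <= retention_days <= 35:
        raise ValueError("retention must be 1-35 days")
    earliest = now - timedelta(days=retention_days)
    if not earliest <= restore_to <= now:
        raise ValueError("restore time is outside the backup retention window")
    return {
        "SourceDBClusterIdentifier": source_cluster,
        "DBClusterIdentifier": new_cluster,   # PITR always creates a new cluster
        "RestoreToTime": restore_to,
    }
```

The restored cluster starts without instances and has its own endpoint, so the application has to be pointed at it after the restore.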
📸

Manual Snapshots

  • User-initiated at any time
  • Retained indefinitely until you delete them
  • Stored in S3 (AWS-managed) and listed under Snapshots in the RDS console
  • Survive cluster deletion
  • Copy across regions for cross-region DR
  • Share with other AWS accounts
Aurora Backtrack — Rewind Without Restoring Advanced

Aurora Backtrack is an Aurora-exclusive feature that lets you rewind your running database to a previous point in time in place — without creating a new cluster. Instead of restoring a snapshot (which creates a new endpoint), Backtrack reverses the cluster volume itself within seconds. This is powerful for accidental schema drops or data corruption.

How Backtrack Works

  • Pre-define a backtrack window (up to 72 hours)
  • Aurora retains change records for that window
  • To backtrack: specify a target timestamp
  • Cluster pauses, reverses changes — back online in seconds
  • Same cluster, same endpoint — no DNS change
  • Supported: Aurora MySQL only (not PostgreSQL)
🔧

Backtrack vs PITR

  • Backtrack: rewinds the existing cluster in-place — same endpoint, seconds
  • PITR: creates a new cluster from backup — new endpoint, minutes
  • Use Backtrack for: accidental DROP TABLE, recent data corruption
  • Use PITR for: longer range recovery, keeping original cluster intact
  • ⚠️ Overhead: Backtrack change records consume additional storage; plan capacity for 72h window
  • Exam: “rewind Aurora quickly without new cluster” → Backtrack
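
The rules of thumb above can be written as a small decision helper. This is a sketch of the reasoning, not an AWS API: the function names are hypothetical, and only the dictionary returned by `backtrack_request` mirrors real parameters (boto3's `backtrack_db_cluster`).

```python
from datetime import datetime, timedelta, timezone

MAX_BACKTRACK = timedelta(hours=72)


def choose_recovery(engine, target, now, backtrack_window, keep_original=False):
    """Return 'backtrack' or 'pitr' following the rules of thumb listed above."""
    if backtrack_window > MAX_BACKTRACK:
        raise ValueError("backtrack window cannot exceed 72 hours")
    in_window = timedelta(0) <= now - target <= backtrack_window
    if engine == "aurora-mysql" and in_window and not keep_original:
        return "backtrack"      # same cluster, same endpoint
    return "pitr"               # new cluster, new endpoint


def backtrack_request(cluster_id, target):
    """Arguments for the BacktrackDBCluster call (boto3: backtrack_db_cluster)."""
    return {"DBClusterIdentifier": cluster_id, "BacktrackTo": target}


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
two_hours_ago = now - timedelta(hours=2)
decision = choose_recovery("aurora-mysql", two_hours_ago, now, timedelta(hours=24))
```

Here `decision` is `"backtrack"`. The same call with an Aurora PostgreSQL engine, a target outside the window, or `keep_original=True` falls back to PITR.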
Aurora vs RDS — Backup Differences Core
Feature | RDS | Aurora
Backup method | Daily snapshot + transaction logs | Continuous log streaming to S3
Backup window | Required (brief I/O pause single-AZ) | None; continuous, zero impact
Can disable backups | Yes (set retention = 0) | No, always on
Backtrack | ❌ Not supported | ✅ Aurora MySQL (up to 72h)
Restore creates | New DB instance (new endpoint) | New cluster (new endpoint) or Backtrack in-place
Performance impact | Brief I/O pause (single-AZ) | Zero; storage-level continuous
🧠 Key Insight

Aurora never has a backup window and you cannot disable backups — continuous backup to S3 is architectural. Backtrack is Aurora's unique power move: rewind the live cluster in seconds rather than restoring a new one. Encryption must be set at creation — and because storage is shared, one KMS key covers the entire cluster including all replicas and snapshots.

Chapter Summary Introductory
  • Private subnet + Security Group: Aurora never accessible from internet; SG allows app SG only
  • KMS at rest: must enable at creation; one key covers cluster volume, snapshots, replicas
  • IAM auth: MySQL/PG only; token-based, no passwords; Secrets Manager works for all engines
  • Continuous backup: always on, no backup window, zero performance impact — cannot be disabled
  • Backtrack: Aurora MySQL only — rewind live cluster in-place (up to 72h) without new endpoint
  • Exam: “rewind Aurora without new cluster” → Backtrack; “Aurora backup window” → none needed
07
Chapter Seven

Architecture Patterns

Pattern 1 — High-Performance Web Application Core
[Figure: Pattern 1, classic 3-tier: ALB + EC2 Auto Scaling + Aurora multi-reader cluster. Inside the VPC: ALB (public subnet, HTTPS) → App Tier in private subnets (EC2 in AZ-a, EC2 in AZ-b) → Aurora Cluster in private subnets (Writer in AZ-a, R/W; Reader 1 in AZ-b; Reader 2 in AZ-c).]
Writes → cluster endpoint • Reads → reader endpoint • Secrets Manager for credentials • KMS at rest
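
The read/write split in Pattern 1 is an application-side decision. A minimal sketch is shown below. Both endpoint names are hypothetical, and the verb check is deliberately naive.

```python
# Hypothetical endpoint names; real ones come from the cluster's console or API output.
WRITER_ENDPOINT = "mycluster.cluster-example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-example.us-east-1.rds.amazonaws.com"

READ_ONLY_VERBS = {"SELECT", "SHOW", "EXPLAIN"}


def endpoint_for(sql, in_transaction=False):
    """Send plain reads to the reader endpoint and everything else to the writer."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in READ_ONLY_VERBS and not in_transaction:
        return READER_ENDPOINT
    return WRITER_ENDPOINT
```

Statements such as `SELECT ... FOR UPDATE`, and reads that must see a write made moments earlier, still belong on the writer. In practice most applications make this choice per code path rather than by parsing SQL.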
Pattern 2 — Read-Heavy App with Caching Layer Core
[Figure: Pattern 2, cache-aside: ElastiCache absorbs hot reads, Aurora handles cache misses + all writes. App Server (check cache first) → ElastiCache (Redis, hot reads, sub-ms latency); on cache miss → Reader endpoint (Aurora readers, load-balanced); writes → Writer (Aurora writer, cluster endpoint).]
ElastiCache serves hot data • Aurora readers handle warm reads • Aurora writer handles all mutations • Aurora Auto Scaling adds readers under load
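
The cache-aside flow can be sketched with plain dictionaries standing in for ElastiCache and Aurora. Invalidating the cache entry on write is one common choice, assumed here for illustration; the pattern itself does not mandate it.

```python
class CacheAside:
    """Cache-aside with plain dicts standing in for ElastiCache and Aurora."""

    def __init__(self):
        self.cache = {}      # stand-in for Redis
        self.database = {}   # stand-in for Aurora
        self.db_reads = 0

    def read(self, key):
        if key in self.cache:             # cache hit: no database work
            return self.cache[key]
        self.db_reads += 1                # cache miss: go to an Aurora reader
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value       # populate the cache for next time
        return value

    def write(self, key, value):
        self.database[key] = value        # all writes go to the Aurora writer
        self.cache.pop(key, None)         # invalidate so the next read is fresh


store = CacheAside()
store.write("product:42", "blue widget")
first = store.read("product:42")    # miss, loads from the database
second = store.read("product:42")   # hit, served from the cache
```

After the two reads, the database has been queried once. In a real deployment the cache entries would also carry a TTL so that stale data ages out.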
Pattern 3 — Serverless API (Lambda + RDS Proxy + Aurora Serverless v2) Advanced
[Figure: Pattern 3, fully serverless: API GW + Lambda + RDS Proxy + Aurora Serverless v2. API Gateway (REST / HTTP API, auth + routing) → Lambda (business logic, VPC or non-VPC) → RDS Proxy (connection pool, IAM auth, failover buffer) → Aurora Serverless v2 (0.5–128 ACU, scales with Lambda, no idle waste).]
Lambda bursty invocations • RDS Proxy prevents connection exhaustion • Serverless v2 compute matches traffic • Pay only for ACU-seconds used
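
On the Lambda side, the usual habit is to open the connection once per execution environment and reuse it across warm invocations. The sketch below shows that shape. `connect` is a hypothetical factory that would dial the RDS Proxy endpoint, and `conn.query` is a stand-in for whatever your database driver provides.

```python
_connection = None   # module-level, so it survives across warm invocations


def get_connection(connect):
    """Open one connection per execution environment and reuse it afterwards."""
    global _connection
    if _connection is None:
        _connection = connect()
    return _connection


def make_handler(connect):
    """Build a Lambda-style handler around a hypothetical connection factory."""
    def handler(event, context):
        conn = get_connection(connect)
        return {"statusCode": 200, "body": conn.query(event["sql"])}
    return handler
```

Each concurrent execution environment still holds its own client connection, which is why the proxy in the middle matters: it multiplexes those onto a smaller set of database connections.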
Pattern 4 — Aurora Global Database (Multi-Region) Advanced

Aurora Global Database replicates an entire Aurora cluster to up to 5 secondary read-only regions using dedicated replication infrastructure at the storage layer — not over the public internet. Replication lag is typically under 1 second. In a disaster, any secondary can be promoted to a full primary in under 1 minute.

🌐

Replication

  • Storage-level replication — not binlog
  • <1 second typical lag across regions
  • Dedicated AWS replication network (not internet)
  • Up to 5 secondary regions
  • Each secondary has its own cluster of readers
🛡️

Disaster Recovery

  • RPO: <1 second (storage-level, near-zero data loss)
  • RTO: <1 minute (promote secondary to primary)
  • Promotion is a manual or automated operation
  • Promoted secondary becomes a full independent primary
  • Best cross-region DR for relational databases on AWS
🗺️

Global Read Latency

  • App in eu-west-1 reads from secondary in eu-west-1
  • No transcontinental round-trip for reads
  • Secondaries are read-only (writes must go to primary)
  • Global write endpoint available — routes to primary
  • Use case: global SaaS, multinational compliance
[Figure: Pattern 4, Aurora Global Database: primary in us-east-1, replicated to eu-west-1 + ap-southeast-1. PRIMARY (us-east-1): Aurora Primary, writer + 2 readers, all writes here. Storage replication with <1s lag to SECONDARY (eu-west-1): Aurora Secondary, read-only readers, EU user reads. Storage replication with <1s lag to SECONDARY (ap-southeast-1): Aurora Secondary, read-only readers, APAC user reads.]
🆘 Primary region failure → promote eu-west-1 to primary in <1 min • RPO <1s • RTO <1 min
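
For the topology described above (primary in us-east-1, secondaries in eu-west-1 and ap-southeast-1), application-side routing can be sketched as follows. Every endpoint name is hypothetical.

```python
PRIMARY_REGION = "us-east-1"
# Hypothetical per-region reader endpoints.
READER_ENDPOINTS = {
    "us-east-1": "reader.us-east-1.example.internal",
    "eu-west-1": "reader.eu-west-1.example.internal",
    "ap-southeast-1": "reader.ap-southeast-1.example.internal",
}
WRITER_ENDPOINT = "writer.us-east-1.example.internal"


def route(app_region, is_write):
    """Writes always go to the primary region; reads stay local when a secondary exists."""
    if is_write:
        return WRITER_ENDPOINT
    return READER_ENDPOINTS.get(app_region, READER_ENDPOINTS[PRIMARY_REGION])
```

After a failover, only `PRIMARY_REGION` and `WRITER_ENDPOINT` would need to change in this sketch. A read issued right after a write may still see slightly older data on a secondary because of the replication lag.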
Global Database — Failover Types & Upgrade Order Advanced
🚨

Failover Options

  • Managed failover: promote a secondary to primary via console/CLI — Aurora coordinates the transition, replication paused, ~1 min
  • Detach & promote: detach secondary region from the global cluster, promote it to an independent standalone cluster — manual, for edge-case DR
  • Original primary becomes a secondary cluster after managed failover (it rejoins)
  • Exam: “fastest cross-region DR” → Aurora Global Database managed failover
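
The options above map onto distinct RDS API operations. The sketch below only builds the request arguments and names the boto3 method each would go to (`switchover_global_cluster`, `failover_global_cluster`, `remove_from_global_cluster`). The helper functions themselves are hypothetical, and the parameters should be checked against the current boto3 reference before use.

```python
def managed_failover_request(global_cluster_id, target_cluster_arn, planned):
    """Arguments for a planned switchover or an unplanned managed failover."""
    request = {
        "GlobalClusterIdentifier": global_cluster_id,
        "TargetDbClusterIdentifier": target_cluster_arn,
    }
    if not planned:
        # An unplanned failover may lose the last unreplicated writes, so the
        # caller acknowledges that explicitly.
        request["AllowDataLoss"] = True
    operation = "switchover_global_cluster" if planned else "failover_global_cluster"
    return operation, request


def detach_request(global_cluster_id, secondary_cluster_arn):
    """Arguments for the 'detach & promote' path."""
    return "remove_from_global_cluster", {
        "GlobalClusterIdentifier": global_cluster_id,
        "DbClusterIdentifier": secondary_cluster_arn,
    }
```

Keeping the planned and unplanned paths separate in code makes it harder to trigger a data-losing failover during a routine region rotation.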
🔄

Engine Version Upgrade Order

  • Minor version upgrades: upgrade every secondary cluster first, then the primary
  • Major version upgrades: upgrade the global cluster as a whole; Aurora upgrades the primary and all secondaries together
  • Plan global maintenance windows around each region's low-traffic period
  • During a secondary's upgrade: reads from that region are temporarily unavailable
  • Do not move the primary to a newer minor version while a secondary is still on the older one
Pattern 5 — Aurora Machine Learning (SQL → SageMaker) Advanced

Aurora Machine Learning lets you call Amazon SageMaker and Amazon Comprehend endpoints directly from SQL statements. You do not build a separate pipeline to export data for inference: Aurora sends the relevant column values to the service on your behalf and returns the prediction as part of the query result.

🧠

How Aurora ML Works

  • Create an ML function that maps to a SageMaker endpoint or Comprehend API
  • Call it from SQL: SELECT aurora_ml_predict(col1, col2) FROM table
  • Aurora batches rows, calls SageMaker, merges results back into your result set
  • Supports: SageMaker (custom models) and Amazon Comprehend (sentiment, language, NER)
  • Supported: Aurora MySQL 8.0 and Aurora PostgreSQL 13+
📊

Use Cases

  • Real-time product recommendations from SQL query
  • Sentiment scoring customer reviews inline with application queries
  • Fraud detection integrated into transaction processing
  • No data pipeline, no extra infrastructure, no data movement
  • Exam: “call ML model from SQL in Aurora” → Aurora Machine Learning
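
The batching step described above (group rows, call the model once per batch, merge predictions back) can be imitated in a few lines. This is a conceptual sketch, not how Aurora implements it; `invoke_endpoint` is a hypothetical callable standing in for the remote model.

```python
def predict_in_batches(rows, invoke_endpoint, batch_size=3):
    """Group rows into batches, call the model once per batch, and merge results per row."""
    merged = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        predictions = invoke_endpoint(batch)     # one remote call per batch, not per row
        merged.extend(zip(batch, predictions))
    return merged


calls = []


def fake_sentiment(batch):
    """Stand-in model: labels a review 'positive' when it contains the word 'great'."""
    calls.append(len(batch))
    return ["positive" if "great" in text else "neutral" for text in batch]


reviews = ["great phone", "arrived late", "great value", "ok", "works"]
scored = predict_in_batches(reviews, fake_sentiment)
```

Five rows with a batch size of three produce two remote calls instead of five, which is where the throughput benefit of batching comes from.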
You Need... | Use | Why
MySQL / PG: highest perf + HA | Aurora | Shared storage, 15 replicas, <30s failover
MySQL / PG: standard managed | RDS | Simpler, cheaper, sufficient for most workloads
Oracle / SQL Server | RDS (not Aurora) | Aurora doesn't support these engines
Variable workload, serverless compute | Aurora Serverless v2 | ACU auto-scales, zero cold start, pay per use
Global reads + sub-1s cross-region DR | Aurora Global Database | RPO <1s, RTO <1 min, 5 secondary regions
Key-value / document / serverless scale | DynamoDB | NoSQL, unlimited scale, single-digit ms
Sub-ms caching, reduce DB load | ElastiCache | Redis / Memcached in front of Aurora
Exam Cheatsheet Core

🎯 Exam Keywords → Aurora Answer

  • “Aurora ≠ fast RDS” → decoupled shared storage architecture
  • “more than 5 read replicas” → Aurora (up to 15)
  • “replica lag <100ms” → Aurora storage-level replication
  • “failover <30 seconds” → Aurora (shared storage = promoted reader already has data)
  • “auto-scale read replicas” → Aurora Auto Scaling
  • “analytics queries on Aurora, no extra infra” → Aurora Parallel Query
  • “variable / spiky DB workload” → Aurora Serverless v2
  • “serverless compute + relational DB” → Lambda + RDS Proxy + Aurora Serverless v2
  • “Lambda + Aurora, minimize cost + connections” → RDS Proxy + Serverless v2
  • “rewind Aurora without new cluster” → Aurora Backtrack (MySQL only, up to 72h)
  • “cross-region DR RPO <1s, RTO <1min” → Aurora Global Database
  • “global low-latency reads + single write region” → Aurora Global Database
  • “fastest cross-region failover” → Aurora Global Database managed failover
  • “call ML model from SQL” → Aurora Machine Learning (SageMaker / Comprehend)
  • “high I/O workload, predictable DB cost” → Aurora I/O-Optimized
  • “Aurora backup window” → there is none; continuous backup, zero impact
  • “Oracle / SQL Server on Aurora” → NOT possible; use RDS for those engines
  • “6 copies, 3 AZs” → Aurora storage always; not configurable
  • “storage auto-scales” → Aurora cluster volume, 10 GB → 128 TiB, no downtime
🧠 Final Insight

Aurora is a cloud-native redesign — shared distributed storage is the foundation that enables everything else: instant replicas, sub-30s failover, continuous no-impact backups, Backtrack, and Global Database. Pick Aurora when RDS hits its limits: more replicas needed, faster failover required, global distribution demanded, or compute needs to auto-scale with Serverless v2.