Amazon Aurora: Cloud-Native Relational Database
Aurora is not just a faster RDS — it is a ground-up redesign of the relational database for the cloud. Decoupled storage and compute, six-way replication across three AZs, MySQL and PostgreSQL compatible, and up to 5× MySQL performance. Aurora is what you choose when RDS is not enough.
⚡ Aurora in 30 Seconds
- Cloud-native relational DB — MySQL & PostgreSQL compatible
- Shared distributed storage — decoupled from compute, auto-scales to 128 TiB
- 6 copies of data across 3 AZs — HA is built-in, not an add-on
- Up to 15 read replicas — all share the same underlying storage
- 5× MySQL and 3× PostgreSQL performance vs community editions
- Aurora Serverless v2 — compute scales instantly from 0.5 to 128 ACUs
What is Amazon Aurora
When AWS launched RDS, it took existing database engines (MySQL, PostgreSQL, Oracle) and ran them on managed EC2 infrastructure. That removed the operational burden (AWS handles patching and backups), but the database architecture itself was unchanged: it was still designed around single-server, spinning-disk-era assumptions.
👉 The root problem: Traditional databases couple storage to compute. Each instance owns its disk. Replication means copying data over a network, constantly. Failover means waiting for a standby to catch up. Storage limits come from the instance. Cloud demands something better.
Amazon Aurora is a cloud-native relational database built from scratch at AWS, designed to resolve the architectural limitations of traditional databases. It is MySQL- and PostgreSQL-compatible: your SQL, your drivers, and your ORMs work. But underneath, everything is different.
Cloud-Native Design
Built for distributed cloud storage from scratch. Storage is a distributed, fault-tolerant service — not a single disk attached to a server.
MySQL & PG Compatible
Aurora MySQL 3.x is compatible with MySQL 8. Aurora PostgreSQL 15.x is compatible with PostgreSQL 15. No SQL rewrites needed.
5× / 3× Performance
5× throughput vs MySQL community edition. 3× vs PostgreSQL. Achieved through distributed storage, parallel writes, and log-based replication.
Most people think Aurora is just “RDS but faster”. That is wrong and will cost you exam marks. The difference is architectural:
| Aspect | RDS (MySQL / PG) | Aurora |
|---|---|---|
| Storage model | Instance-local EBS volume | Shared distributed storage tier |
| Storage limit | 64 TiB max (pre-provisioned; optional storage autoscaling) | 128 TiB, auto-scales |
| Replication | Async engine-level (binlog / WAL; replica copies data) | Storage-level (no data moves) |
| HA copies | 1 standby (Multi-AZ) | 6 copies across 3 AZs (always) |
| Failover time | 60–120 seconds | <30 seconds |
| Read replicas | Up to 5 (async, data copied) | Up to 15 (share same storage) |
| Replica lag | Milliseconds to seconds | Milliseconds (storage-level) |
🧠 Most People Think (Wrong):
“Aurora = RDS with better hardware” or “Aurora = fast MySQL”
✨ Better Mental Model (Correct):
Aurora = Compute nodes + a cloud-native shared storage service. The database engines (writer + readers) are just compute that plugs into one shared storage fabric. The storage itself is distributed, replicated, and elastic — independent of any single instance.
Old Model (RDS)
- DB engine + local disk = one tightly coupled unit
- Replicate = copy data to another server's disk
- Add replica = duplicate storage cost
- Failover = wait for the standby, with its own separate disk, to be promoted
- Like a house: each house has its own pipes
New Model (Aurora)
- Storage is a separate elastic service
- Replicate = storage handles it, engines don't know
- Add replica = new compute node, no extra storage
- Failover = another compute node grabs the same storage
- Like a city water system: each tap connects same pipes
Why It Matters
- Faster failover (no data sync needed)
- Fast read replicas (no data copy needed)
- Storage grows automatically (no pre-provisioning)
- Lower replication lag (storage-level, not engine-level log shipping)
- Aurora Global Database becomes feasible
⚠️ Aurora does NOT support Oracle or SQL Server — use RDS for those engines.
Compute Cost (Higher for Aurora)
- db.r6g.large (2 vCPU, 16 GB RAM)
- RDS MySQL: ~$0.18/hr
- Aurora MySQL: ~$0.22/hr (~22% more)
- Difference compounds at scale with many instances
- Aurora is worth it when architecture features justify the delta
Storage Cost (Cheaper at Scale)
- Aurora: shared storage — pay once, used by all replicas
- RDS: each read replica = full data copy = extra storage cost
- 5 Aurora replicas: 1× storage cost (one shared volume)
- 5 RDS replicas: 6× storage cost (primary + 5 full copies)
- At 5+ replicas, Aurora total cost is often lower than RDS
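The compute-versus-storage trade-off can be made concrete with a back-of-envelope model. This is a minimal sketch under stated assumptions: it uses the ~$0.18/hr and ~$0.22/hr db.r6g.large rates quoted above and the ~$0.10/GB-month Aurora Standard storage rate from the storage-pricing table, and it assumes ~$0.115/GB-month for RDS gp2 storage. It ignores Aurora I/O charges, Multi-AZ standbys, and backups, so treat the output as directional only.

```python
HOURS_PER_MONTH = 730

# Illustrative on-demand rates; check current pricing for your region.
RDS_INSTANCE_PER_HR = 0.18          # db.r6g.large, RDS MySQL (from the text)
AURORA_INSTANCE_PER_HR = 0.22       # db.r6g.large, Aurora MySQL (from the text)
RDS_STORAGE_PER_GB_MONTH = 0.115    # assumption: RDS gp2 storage
AURORA_STORAGE_PER_GB_MONTH = 0.10  # Aurora Standard storage


def monthly_cost(engine: str, replicas: int, data_gb: float) -> float:
    """Compute + storage for one primary/writer plus `replicas` read replicas."""
    instances = 1 + replicas
    if engine == "aurora":
        compute = instances * AURORA_INSTANCE_PER_HR * HOURS_PER_MONTH
        storage = data_gb * AURORA_STORAGE_PER_GB_MONTH  # one shared cluster volume
    elif engine == "rds":
        compute = instances * RDS_INSTANCE_PER_HR * HOURS_PER_MONTH
        storage = instances * data_gb * RDS_STORAGE_PER_GB_MONTH  # every replica holds a full copy
    else:
        raise ValueError(f"unknown engine: {engine}")
    return round(compute + storage, 2)


for data_gb in (50, 1000):
    rds = monthly_cost("rds", 5, data_gb)
    aurora = monthly_cost("aurora", 5, data_gb)
    print(f"{data_gb:>5} GB, 5 replicas: RDS ${rds:,.2f} vs Aurora ${aurora:,.2f}")
```

With these assumed rates, data volume matters as much as replica count: a small database stays cheaper on RDS even with five replicas because the per-instance premium dominates, while a 1 TB database tips toward Aurora once replica storage copies add up.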
Choose Aurora When
- Need higher throughput than RDS MySQL / PG
- Need more than 5 read replicas (Aurora supports 15)
- Need failover under 30 seconds
- Need global distribution (Aurora Global Database)
- Need serverless variable workloads (Serverless v2)
- Storage >10 TiB or unpredictable growth
- Production OLTP requiring highest availability
Stick with RDS When
- Need Oracle or SQL Server (Aurora doesn't support them)
- Budget is tight (Aurora ~20% higher per instance)
- Workload is light — RDS already meets SLAs
- Need RDS Custom OS-level access
- Legacy app tied specifically to engine minor version
- MariaDB (not supported by Aurora)
Aurora is not RDS with faster hardware — it is a different storage architecture. The shared distributed storage is what enables everything: instant replicas, sub-30s failover, auto-scaling storage, and Global Database. Understand that one idea and the rest of Aurora snaps into place.
- Aurora = cloud-native relational DB — MySQL and PostgreSQL compatible; no Oracle/SQL Server
- Decoupled storage: compute and storage are separate — the fundamental design difference from RDS
- 6 copies / 3 AZs: HA is architectural, not optional — no separate Multi-AZ to configure
- 5× MySQL / 3× PostgreSQL performance vs community editions
- Aurora ≠ fast RDS: the storage architecture is completely different; that is the key exam insight
- Mental model: writer + readers are compute nodes plugging into one shared storage system
Aurora Architecture — Shared Storage & Compute Separation
Every Aurora deployment is called a DB cluster. A cluster has two completely independent layers: the compute layer (DB instances that run the MySQL or PostgreSQL engine) and the storage layer (the shared distributed volume that all compute instances read and write). These two layers scale independently of each other.
Compute Layer (DB Instances)
- Writer instance — one per cluster, handles all writes
- Reader instances — up to 15, read-only, same data
- Each instance is a specific instance class (db.r6g, db.t3, etc.)
- Instances can be added or removed without touching storage
- Serverless v2 = compute that auto-scales instead of fixed instances
Storage Layer (Cluster Volume)
- Single logical volume shared by all compute instances
- Physically: 6 copies spread across 3 AZs (2 per AZ)
- Stored in 10 GB segments (“protection groups”)
- Auto-scales from 10 GB to 128 TiB with zero downtime
- You pay only for storage actually used (not pre-provisioned)
The Aurora cluster volume is divided into 10 GB segments called protection groups. Each protection group is replicated six times across three AZs. This granularity means that if a disk fails, only the corresponding segments need to be repaired — not the entire database. Aurora performs this repair continuously in the background, peer-to-peer between storage nodes, without involving the compute layer at all.
👉 Why this matters for HA: Traditional databases repair by replacing the failed node and copying all data back. Aurora repairs individual 10 GB segments, in parallel, across many storage nodes simultaneously. A 1 TB database can repair a failed copy in minutes, not hours.
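The "minutes, not hours" point can be sanity-checked with simple arithmetic. A minimal sketch, assuming (hypothetically) about 10 Gbit/s of repair bandwidth per storage node and that a failed node's segments are re-replicated by many distinct peers to many distinct targets in parallel; Aurora's real repair throughput and placement are internal and not documented here.

```python
import math

SEGMENT_GB = 10          # protection-group size from the text
LINK_GBIT_PER_S = 10.0   # assumption: per-node repair bandwidth


def protection_groups(volume_gb: float) -> int:
    """Number of 10 GB protection groups needed for a volume of this size."""
    return math.ceil(volume_gb / SEGMENT_GB)


def transfer_seconds(gigabytes: float, gbit_per_s: float) -> float:
    """Time to move `gigabytes` over a link of `gbit_per_s` (8 bits per byte)."""
    return gigabytes * 8 / gbit_per_s


volume_gb = 1000  # a 1 TB database
# Whole-copy rebuild: one replacement node pulls the full volume over one link.
whole_copy = transfer_seconds(volume_gb, LINK_GBIT_PER_S)
# Segment repair: each 10 GB segment is copied by a different peer at the same time.
per_segment = transfer_seconds(SEGMENT_GB, LINK_GBIT_PER_S)

print(f"{protection_groups(volume_gb)} protection groups")
print(f"whole-copy rebuild: ~{whole_copy / 60:.1f} min over one link")
print(f"parallel segment repair: ~{per_segment:.0f} s per segment")
```

The useful takeaway is the ratio rather than the absolute numbers: segment repair time is bounded by the 10 GB segment size, so it stays roughly constant as the database grows, while a whole-copy rebuild grows linearly with volume size.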
Aurora uses a quorum-based write model. When the writer commits a transaction, it does not write data pages to storage — it writes only redo log records to the 6 storage nodes. The write is acknowledged when 4 of 6 nodes confirm receipt. Storage nodes reconstruct data pages from log records locally. This is why Aurora writes are so fast — less data moves over the network.
Write Quorum: 4/6
- Writer sends redo log to all 6 nodes
- Waits for 4 acknowledgements
- Commit confirmed — client gets response
- Remaining 2 catch up asynchronously
- Can tolerate 2 failed storage nodes without halting writes
Read Quorum: 3/6
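- Read quorum is 3 of 6; in normal operation Aurora reads from a single storage node it knows is current and relies on the 3/6 quorum mainly during recovery
- Can tolerate 3 failed nodes and still serve reads
- Readers receive the writer's redo log stream and apply it to pages already in their buffer cache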
- Dramatically reduces replication lag vs binlog
Only Logs, No Pages
- Writer sends log records (small), not full data pages (large)
- Network I/O reduced by up to 7× vs traditional replication
- Storage nodes apply logs locally — no round-trips for page writes
- This is the core reason for Aurora's write performance advantage
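The 4/6 write rule has a useful side effect: a commit waits only for the fourth-fastest acknowledgement, so up to two slow or failed storage nodes add no latency. Below is a minimal sketch of that rule, a simplified model rather than Aurora's actual protocol, where each entry is a hypothetical per-node ack latency in milliseconds and `None` marks a node that never answers.

```python
from typing import Optional, Sequence

COPIES = 6
WRITE_QUORUM = 4


def commit_latency_ms(ack_latencies_ms: Sequence[Optional[float]]) -> Optional[float]:
    """Return when the writer can acknowledge the commit, or None if 4/6 is unreachable.

    ack_latencies_ms[i] is how long storage node i takes to ack the redo log record;
    None means the node is down and never acks.
    """
    if len(ack_latencies_ms) != COPIES:
        raise ValueError("expected one entry per storage copy")
    acks = sorted(lat for lat in ack_latencies_ms if lat is not None)
    if len(acks) < WRITE_QUORUM:
        return None  # fewer than 4 healthy copies: writes halt
    return acks[WRITE_QUORUM - 1]  # the 4th-fastest ack completes the quorum


# One node is very slow (250 ms) and one is down: the commit still lands at 1.3 ms.
print(commit_latency_ms([1.2, 1.0, 1.1, 1.3, 250.0, None]))
```

Waiting for all six copies would make every commit as slow as the slowest node; the quorum turns a tail-latency problem into a non-event as long as four nodes respond.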
Aurora exposes several DNS endpoints. Knowing which to use for which workload is critical for exam questions:
| Endpoint Type | Points To | Use For |
|---|---|---|
| Cluster endpoint | Current writer instance | All writes — always points to writer after failover |
| Reader endpoint | All reader instances (load-balanced) | All reads — distributes across readers automatically |
| Custom endpoint | A specific subset of instances | Analytics workloads on specific high-memory instances |
| Instance endpoint | One specific instance | Diagnostics, direct maintenance — not for production app |
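A common way to apply this table is to route each workload type to a fixed endpoint in application config. A minimal sketch, assuming the usual Aurora hostname shapes (`cluster-`, `cluster-ro-`, `cluster-custom-` prefixes before an account/region-specific identifier); the cluster name, custom endpoint name, identifier, and region below are hypothetical, and in practice you would copy the real hostnames from the console or the DescribeDBClusters output rather than build them by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuroraEndpoints:
    cluster_name: str     # hypothetical, e.g. "mydb"
    custom_endpoint: str  # hypothetical custom endpoint name for analytics readers
    suffix: str           # account/region-specific identifier AWS assigns (hypothetical value below)
    region: str

    def _host(self, prefix: str, kind: str) -> str:
        return f"{prefix}.{kind}-{self.suffix}.{self.region}.rds.amazonaws.com"

    @property
    def writer(self) -> str:      # cluster endpoint: follows the writer across failovers
        return self._host(self.cluster_name, "cluster")

    @property
    def reader(self) -> str:      # reader endpoint: spread across all readers
        return self._host(self.cluster_name, "cluster-ro")

    @property
    def analytics(self) -> str:   # custom endpoint: a chosen subset of instances
        return self._host(self.custom_endpoint, "cluster-custom")

    def instance(self, instance_name: str) -> str:  # diagnostics only
        return f"{instance_name}.{self.suffix}.{self.region}.rds.amazonaws.com"


def route(endpoints: AuroraEndpoints, workload: str) -> str:
    """Map a workload type to the endpoint it should connect to."""
    routes = {"write": endpoints.writer, "read": endpoints.reader, "analytics": endpoints.analytics}
    try:
        return routes[workload]
    except KeyError:
        raise ValueError(f"no endpoint for workload {workload!r}") from None


ep = AuroraEndpoints("mydb", "analytics", "abc123example", "us-east-1")
for workload in ("write", "read", "analytics"):
    print(f"{workload:<10} -> {route(ep, workload)}")
```

Keeping the instance endpoint out of `route` mirrors the table's advice: production traffic should never pin itself to one instance.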
How Storage Scales
- Starts at 10 GB minimum
- Grows in 10 GB increments automatically
- Maximum: 128 TiB
- No downtime, no instance restart
- You pay per GB-month actually used — not for pre-provisioned capacity
Storage Pricing Model
- Pay for storage consumed (not allocated)
- Billed storage shrinks when data is dropped on current engine versions (dynamic resizing); older engine versions kept a high-water mark
- I/O requests billed separately in Aurora Standard
- I/O requests free (included) in Aurora I/O-Optimized
| Feature | Aurora Standard | Aurora I/O-Optimized |
|---|---|---|
| Storage rate | Lower (~$0.10/GB-month) | Higher (~$0.225/GB-month) |
| I/O billing | Per million requests (~$0.20) | Included — free |
| Best for | Low-to-moderate I/O workloads | High I/O workloads (>25% of bill is I/O) |
| Exam keyword | Default | “high I/O, predictable costs” |
💡 Rule of thumb: if your I/O charges exceed 25% of your total Aurora bill, switch to I/O-Optimized and you will likely save money.
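The rule of thumb falls out of comparing the two bills directly. A minimal sketch using the storage and I/O rates from the table above, plus one loudly labeled assumption: I/O-Optimized instance-hours cost about 30% more than Aurora Standard. The dollar inputs in the example are hypothetical.

```python
STD_STORAGE_PER_GB = 0.10       # $/GB-month, Aurora Standard (from the table)
OPT_STORAGE_PER_GB = 0.225      # $/GB-month, Aurora I/O-Optimized (from the table)
IO_PER_MILLION = 0.20           # $ per million I/O requests, Aurora Standard only
OPT_INSTANCE_MULTIPLIER = 1.30  # assumption: I/O-Optimized instances cost ~30% more


def standard_bill(instance_cost: float, storage_gb: float, io_millions: float) -> float:
    return instance_cost + storage_gb * STD_STORAGE_PER_GB + io_millions * IO_PER_MILLION


def optimized_bill(instance_cost: float, storage_gb: float) -> float:
    return instance_cost * OPT_INSTANCE_MULTIPLIER + storage_gb * OPT_STORAGE_PER_GB


def io_share(instance_cost: float, storage_gb: float, io_millions: float) -> float:
    """Fraction of the Aurora Standard bill that is I/O charges."""
    return io_millions * IO_PER_MILLION / standard_bill(instance_cost, storage_gb, io_millions)


def breakeven_io_share(instance_cost: float, storage_gb: float) -> float:
    """I/O share at which Standard and I/O-Optimized cost the same."""
    extra_opt_cost = ((OPT_INSTANCE_MULTIPLIER - 1) * instance_cost
                      + (OPT_STORAGE_PER_GB - STD_STORAGE_PER_GB) * storage_gb)
    return io_share(instance_cost, storage_gb, extra_opt_cost / IO_PER_MILLION)


# Hypothetical cluster: $500/month of instances, 500 GB stored, 2,000 million I/Os per month.
print(f"I/O share: {io_share(500, 500, 2000):.0%}")
print(f"Standard ${standard_bill(500, 500, 2000):.2f} vs I/O-Optimized ${optimized_bill(500, 500):.2f}")
print(f"break-even I/O share for this cluster: {breakeven_io_share(500, 500):.0%}")
```

Under these assumptions a compute-dominated cluster breaks even at about 23% I/O share (0.3 / 1.3), close to the 25% rule; the more storage a cluster holds relative to compute, the higher the I/O share must be before switching pays off.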
Aurora writes only redo log records to storage, not full data pages. The quorum model (4/6 for write, 3/6 for read) means the cluster continues operating even when multiple storage nodes fail. Storage grows automatically — you never pre-provision. Adding readers costs no extra storage.
- DB cluster = compute layer (writer + readers) + storage layer (cluster volume) — independent layers
- Cluster volume = 6 copies across 3 AZs, 10 GB segments, self-healing, auto-scales to 128 TiB
- Writes: redo logs sent to 6 storage nodes, commits on 4/6 ack — no full page writes
- Reads: quorum 3/6 — readers materialize pages locally from logs, near-zero lag
- Endpoints: cluster (writer), reader (load-balanced), custom (subset), instance (direct)
- Storage cost: pay per GB used, not allocated; I/O-Optimized tier for high-I/O workloads
High Availability & Replication
With RDS, you opt into high availability by enabling Multi-AZ, which provisions a separate standby instance. With Aurora, HA is the default state. The 6-copy storage model exists for every Aurora cluster regardless of whether you add readers or not. There is no “single-AZ Aurora” at the storage level.
👉 Critical exam point: You do NOT need to enable Multi-AZ in Aurora — the 6-copy storage replication across 3 AZs is always on. What you control is how many compute instances (readers) you add for faster compute-level failover.
AZ-a (2 copies)
- 2 independent copies of every storage segment
- Even if both fail — 4 copies remain alive
- Typically hosts the writer instance
- AZ outage loses 2 copies — writes continue (4/6 quorum met)
AZ-b (2 copies)
- 2 independent copies
- Hosts reader instances for spread reads
- AZ failure here: 4 copies in AZ-a + AZ-c remain
- Reads continue, writes continue
AZ-c (2 copies)
- 2 independent copies
- Full geographic separation from AZ-a and AZ-b
- The third AZ is what keeps 4 copies alive during any single-AZ outage, so writes are never threatened
- Reads continue from AZ-a + AZ-b readers
| Scenario | Copies Lost | Writes | Reads |
|---|---|---|---|
| 1 disk fails | 1 of 6 | ✔ Continue (4/6 quorum) | ✔ Continue (3/6 quorum) |
| 1 full AZ outage | 2 of 6 | ✔ Continue (4/6 quorum) | ✔ Continue (3/6 quorum) |
| 2 disks fail (diff AZs) | 2 of 6 | ✔ Continue | ✔ Continue |
| 3 disks fail | 3 of 6 | ❌ Halted (need 4) | ✔ Continue (3/6 quorum) |
| 4+ disks fail | 4+ of 6 | ❌ Halted | ❌ Halted |
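The table can be reproduced by counting surviving copies against the two quorums. A minimal sketch with two copies per AZ; the AZ names are placeholders. The two module-level checks encode why 4/6 and 3/6 were chosen: every read quorum overlaps every write quorum, and any two write quorums overlap.

```python
AZS = ("az-a", "az-b", "az-c")
COPIES = frozenset((az, n) for az in AZS for n in (1, 2))  # 2 copies per AZ, 6 total
WRITE_QUORUM, READ_QUORUM = 4, 3

# Quorum sanity checks: reads see the latest write, and two writes cannot both succeed disjointly.
assert READ_QUORUM + WRITE_QUORUM > len(COPIES)
assert 2 * WRITE_QUORUM > len(COPIES)


def az_outage(az: str) -> set:
    """All copies lost when one whole AZ goes down."""
    return {c for c in COPIES if c[0] == az}


def survives(failed) -> tuple:
    """(writes_ok, reads_ok) given the set of failed copies."""
    alive = len(COPIES - set(failed))
    return alive >= WRITE_QUORUM, alive >= READ_QUORUM


SCENARIOS = {
    "1 disk fails": {("az-a", 1)},
    "1 full AZ outage": az_outage("az-a"),
    "2 disks fail (diff AZs)": {("az-a", 1), ("az-b", 1)},
    "3 disks fail (AZ outage + 1)": az_outage("az-a") | {("az-b", 1)},
    "4 disks fail": az_outage("az-a") | az_outage("az-b"),
}

for name, failed in SCENARIOS.items():
    writes, reads = survives(failed)
    print(f"{name:<30} writes={'ok' if writes else 'halted':<7} reads={'ok' if reads else 'halted'}")
```

The "AZ outage + 1" row is the interesting one: losing a full AZ plus one more disk halts writes but keeps reads available, which is exactly the failure mode the 3-AZ layout is designed to survive in read-only form.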
When the writer instance fails, Aurora promotes one of the existing reader instances to become the new writer. Because storage is shared, the promoted reader already has the complete dataset — it just switches its mode. This is why Aurora failover is so much faster than RDS.
Failover Timeline
- Writer failure detected: ~10–20 seconds
- Reader promoted to writer: immediate (no data copy)
- DNS updated to point to new writer
- Applications reconnect: total ~30 seconds
- With Aurora readers present: typically <30 seconds
- Without readers (single instance): ~60–120 seconds (new instance launched)
Failover Priority Tiers
- Each reader has a priority tier: 0 (highest) – 15 (lowest)
- Aurora promotes the reader at the highest priority tier
- Tie in priority: promotes the largest instance first
- Still tied (same tier and size): Aurora promotes an arbitrary replica from that group
- Set tier via console / CLI — use tier 0 for your primary DR reader
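The first two promotion rules above (lowest tier number, then largest instance) can be expressed as a filter. A minimal sketch; `memory_gib` is a hypothetical stand-in for "instance size", and the function deliberately returns every reader still in contention rather than modeling the final tie-break.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reader:
    instance_id: str
    promotion_tier: int  # 0 = highest priority, 15 = lowest
    memory_gib: int      # hypothetical stand-in for instance size


def failover_candidates(readers: List[Reader]) -> List[Reader]:
    """Readers still in contention after applying the tier rule, then the size rule."""
    if not readers:
        return []  # no readers: Aurora must recreate the writer instead
    best_tier = min(r.promotion_tier for r in readers)
    in_tier = [r for r in readers if r.promotion_tier == best_tier]
    largest = max(r.memory_gib for r in in_tier)
    return [r for r in in_tier if r.memory_gib == largest]


readers = [
    Reader("reader-big-tier1", promotion_tier=1, memory_gib=128),
    Reader("reader-a", promotion_tier=0, memory_gib=16),
    Reader("reader-b", promotion_tier=0, memory_gib=32),
]
print([r.instance_id for r in failover_candidates(readers)])
```

Note that tier beats size: the 128 GiB reader in tier 1 loses to the smaller tier-0 readers, which is why your intended DR reader should sit in tier 0 and be sized like the writer.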
RDS Multi-AZ Failover
- Standby is in a separate AZ with its own EBS
- Data was synchronously replicated, but the standby must still be promoted and complete crash recovery on its own volume
- DNS is updated to point at the promoted standby, and the whole process takes time
- Failover: 60–120 seconds
- Only 1 standby — one chance for failover
Aurora Failover
- Reader already shares the cluster volume
- Promotion = change compute role, no data handoff
- Up to 15 readers — any can become writer
- Priority tiers control which one is chosen
- Failover: <30 seconds (with readers)
What is Multi-Master
- Up to 4 writer nodes in a single Aurora MySQL cluster
- All nodes accept writes simultaneously
- Conflict resolution handled at storage layer
- No read replicas in multi-master mode
- Single-master covers 99% of production use cases
When to Consider It
- Rarely needed — most HA/performance needs met by single-master + replicas
- Use case: apps requiring write continuity during writer failover without a pause
- Not a replacement for sharding or distributed databases
- Supported: Aurora MySQL 5.6-compatible (version 1) only, which has reached end of life; not PostgreSQL
- Exam: almost always refers to single-master; multi-master is niche
How Self-Healing Works
- Storage nodes continuously monitor each other
- When a disk or node fails or a segment is found corrupt, healthy peer nodes copy the affected segments to rebuild the missing copy
- Repair is parallel across many segment pairs simultaneously
- 10 GB per segment = fast repair (not terabytes at once)
- Completely transparent to compute (writer + readers)
Impact on Availability
- Losing 1 copy leaves 5, so quorum still holds but the safety margin shrinks; repair begins immediately
- Mean time to repair (MTTR): minutes for typical segments
- Short repair windows make it unlikely that a second failure overlaps the first
- No human intervention required
- Aurora tracks unhealthy segments and prioritises their repair
Aurora HA is architectural, not operational. 6-copy quorum storage means a full AZ outage never loses data. Compute failover is fast (<30s) because promoted readers already share the storage. Self-healing continuously restores the 6-copy redundancy without you doing anything.
- 6 copies / 3 AZs: always on — not optional; lose a full AZ and writes continue (4/6 quorum)
- Write quorum: 4/6 — can lose 2 storage nodes and still write
- Read quorum: 3/6 — can lose 3 storage nodes and still read
- Compute failover: <30s — reader promotes because it already has the data
- Priority tiers 0–15: control which reader becomes writer on failover
- Self-healing storage: peer-to-peer segment repair, continuous, transparent, minutes MTTR
Scaling & Read Replicas
Aurora supports up to 15 read replicas per cluster — three times more than RDS. Because all replicas share the same underlying cluster volume, adding a replica means provisioning new compute only. No data is copied. No extra storage cost per replica. The reader is live and serving traffic within minutes.
👉 Key exam insight: Aurora read replicas share the same storage as the writer. Adding a 15th replica costs the same as adding the 1st — just the compute instance. With RDS, every read replica is an independent database that holds a full copy of the data, which means you pay for storage per replica.
Replica Limits
- Up to 15 read replicas per Aurora cluster
- All replicas share the same cluster volume
- Spread replicas across AZs (recommended); several replicas can share an AZ
- Each has its own instance endpoint
- All served via the single reader endpoint (load-balanced)
Replication Lag
- Storage-level replication — not binlog
- Typically <100 ms behind writer
- Much lower than RDS async replication
- Replicas get the same log records as the storage layer
- Exam: Aurora replica lag ≈ milliseconds; RDS lag ≈ seconds
Cost Advantage
- No extra storage per replica (shared volume)
- Pay only for the compute instance class
- Can use smaller instance for read-only workloads
- Scale down replicas during off-peak (or use Serverless v2)
- RDS: each replica = full data copy = double/triple storage cost
The Aurora reader endpoint is a single DNS address that automatically distributes incoming connections across all available reader instances using connection-level load balancing. You point your read traffic at one endpoint and Aurora handles the distribution — no application-side logic needed.
How Reader Endpoint Works
- Connection-level load balancing (not query-level)
- Uses DNS round-robin: each new connection resolves to one of the readers (client-side DNS caching can skew the spread)
- If a reader fails, the endpoint stops routing to it automatically
- New readers added via Auto Scaling are picked up automatically
- One endpoint to manage regardless of how many replicas you have
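Connection-level (not query-level) balancing has a practical consequence worth seeing in code. A toy model, where `itertools.cycle` stands in for the endpoint's DNS round-robin and the reader names are hypothetical: once a connection is assigned a reader, every query on it goes to that reader.

```python
import itertools
from typing import List


class Connection:
    def __init__(self, reader: str):
        self.reader = reader  # fixed when the connection is opened

    def execute(self, sql: str) -> str:
        return self.reader    # every query on this connection hits the same reader


class ReaderEndpoint:
    """Toy model: each new connection is assigned the next reader, once."""

    def __init__(self, readers: List[str]):
        self._next = itertools.cycle(readers)

    def connect(self) -> Connection:
        return Connection(next(self._next))


ep = ReaderEndpoint(["reader-1", "reader-2", "reader-3"])
conns = [ep.connect() for _ in range(6)]
print([c.reader for c in conns])
# 100 queries on one long-lived connection never move to another reader.
print({conns[0].execute("SELECT 1") for _ in range(100)})
```

So a connection pool that opens all its connections at startup spreads them once; a reader added later by Auto Scaling receives traffic only as new connections are opened, which is a reason to give pooled connections a maximum lifetime.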
Custom Endpoints
- Create a custom endpoint pointing to a specific subset of instances
- Use case: analytics team uses large db.r6g.4xlarge readers; web tier uses small db.t3
- Prevents analytics queries from consuming web app reader capacity
- Multiple custom endpoints per cluster allowed
- Reader endpoint + custom endpoints can coexist
Aurora Auto Scaling automatically adds or removes reader instances based on a CloudWatch metric, typically average CPU utilization or average connections across the readers. You define minimum and maximum replica counts and a target metric value. Aurora scales replicas up during traffic spikes and removes them during quiet periods.
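Aurora Auto Scaling is configured through the Application Auto Scaling API: register the cluster's replica count as a scalable target, then attach a target-tracking policy. A minimal sketch that only builds the request parameters, so it runs without AWS access; the cluster name, policy name, 60% CPU target, and 300-second cooldowns are illustrative choices, not recommendations.

```python
from typing import Any, Dict, Tuple


def aurora_replica_autoscaling_requests(
    cluster_id: str, min_replicas: int, max_replicas: int, target_cpu_percent: float
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build kwargs for RegisterScalableTarget and PutScalingPolicy."""
    if not 0 <= min_replicas <= max_replicas <= 15:  # 15 = Aurora replica limit
        raise ValueError("replica counts must satisfy 0 <= min <= max <= 15")
    resource = {
        "ServiceNamespace": "rds",
        "ResourceId": f"cluster:{cluster_id}",
        "ScalableDimension": "rds:cluster:ReadReplicaCount",
    }
    register = {**resource, "MinCapacity": min_replicas, "MaxCapacity": max_replicas}
    policy = {
        **resource,
        "PolicyName": f"{cluster_id}-reader-cpu",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "RDSReaderAverageCPUUtilization"
            },
            "ScaleInCooldown": 300,   # seconds; illustrative
            "ScaleOutCooldown": 300,  # seconds; illustrative
        },
    }
    return register, policy


register, policy = aurora_replica_autoscaling_requests("my-aurora-cluster", 1, 8, 60)
print(register)
# With boto3 installed and credentials configured, these would be sent as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**register)
#   client.put_scaling_policy(**policy)
```

Swapping the predefined metric to `RDSReaderAverageDatabaseConnections` scales on connection count instead of CPU.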
| Feature | RDS Read Replicas | Aurora Read Replicas |
|---|---|---|
| Max replicas | 5 | 15 |
| Storage per replica | Full copy of DB | Shared — no extra storage |
| Replication lag | Milliseconds – seconds (binlog) | <100 ms (storage-level) |
| Add replica time | Minutes to hours (data copy) | Minutes (compute only) |
| Auto Scaling | ❌ Manual only | ✅ Aurora Auto Scaling |
| Failover promotion | Manual | Automatic (<30s) |
| Reader endpoint | ❌ Manual per-replica | ✅ Single load-balanced endpoint |
Aurora scales reads by adding compute, not storage. Up to 15 replicas, each sharing the cluster volume at near-zero extra cost. The reader endpoint abstracts the entire replica fleet behind one DNS address. Auto Scaling adds and removes replicas automatically without human intervention.
Aurora Parallel Query pushes the computation of scans, joins, and aggregations down to the Aurora storage layer, running in parallel across thousands of storage nodes. Instead of pulling all data up to the compute instance to process it, the storage layer does the work where the data lives. This dramatically reduces the data transferred to the compute instance and speeds up analytical queries significantly.
How It Works
- Full table scans, JOINs, GROUP BY, aggregates pushed to storage nodes
- Up to thousands of parallel threads across storage layer
- Only the final result set returned to compute instance
- Transparent — same SQL, no schema changes needed
- Supported: Aurora MySQL 8.0
When to Use
- Analytics queries on large tables (>1 GB)
- ELT transformations within the database
- Reporting queries with COUNT(), SUM(), GROUP BY
- Not for: short OLTP queries with LIMIT 10 (overhead not worth it)
- Exam: “run analytics on Aurora without extra infrastructure” → Parallel Query
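The pushdown idea can be illustrated with a partial-then-merge aggregation. This is a conceptual model only, not Aurora's implementation: a hypothetical `orders` table is split into slices standing in for data spread across storage nodes, each slice computes its own per-group COUNT and SUM, and only those small partial results reach the compute side.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Per-group (COUNT, SUM) over one slice of the table, computed where the data lives."""
    acc = defaultdict(lambda: [0, 0])
    for region, amount in rows:
        acc[region][0] += 1
        acc[region][1] += amount
    return {region: (count, total) for region, (count, total) in acc.items()}


def merge(partials):
    """Combine the small per-slice results on the compute instance."""
    acc = defaultdict(lambda: [0, 0])
    for part in partials:
        for region, (count, total) in part.items():
            acc[region][0] += count
            acc[region][1] += total
    return {region: (count, total) for region, (count, total) in acc.items()}


# Hypothetical query: SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region
rows = [(("eu", "us", "apac")[i % 3], i) for i in range(10_000)]
slices = [rows[i::4] for i in range(4)]  # stand-in for data spread across storage nodes
partials = [partial_aggregate(s) for s in slices]

pushed_down = merge(partials)
pulled_up = partial_aggregate(rows)      # what a non-pushdown plan computes after pulling every row
shipped = sum(len(p) for p in partials)
print(f"rows scanned: {len(rows)}, partial results shipped to compute: {shipped}")
```

Both plans return the same answer, but the pushdown version moves 12 small tuples instead of 10,000 rows, which is the effect Parallel Query exploits at far larger scale.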
- 15 read replicas (vs 5 for RDS) — all share the cluster volume, no extra storage cost
- Replication lag <100 ms — storage-level, much lower than RDS binlog async replication
- Reader endpoint: single DNS, connection-level load-balanced across all readers
- Custom endpoints: route specific workloads (analytics) to specific instances
- Aurora Auto Scaling: adds/removes readers based on CPU / connections — fully automatic
- Parallel Query: pushes scans/aggregates to storage layer — analytics speedup without extra infra
- Exam: “need more than 5 read replicas” → Aurora; “read replica auto-scaling” → Aurora Auto Scaling
Aurora Serverless v2
Aurora Serverless v2 is a configuration for Aurora DB instances where compute capacity scales automatically based on actual workload demand — in fractions of a second. Instead of choosing a fixed instance class (db.r6g.large), you define a minimum and maximum ACU range. Aurora scales within that range continuously without any downtime.
👉 Key distinction: Serverless v2 is NOT a separate product — it is a capacity type for an Aurora DB instance. The same Aurora cluster can mix provisioned instances (fixed size) and Serverless v2 instances. The storage layer is the same shared cluster volume either way.
What is an ACU
- 1 ACU ≈ 2 GiB RAM + proportional CPU + network
- Minimum: 0.5 ACU
- Maximum: 128 ACU
- Scales in increments as small as 0.5 ACU
- You set min and max — Aurora manages the rest
Scaling Speed
- Scales up in fractions of a second
- No cold start (unlike Serverless v1)
- Scales down gradually to avoid thrashing
- Responds to CPU, connections, and memory pressure
- Transparent to the application — no connection drop
Cost Model
- Pay per ACU-second consumed
- Billing floor is the min ACU: you pay for at least that capacity even when idle
- Min ACU is always running (warmth)
- Better for variable workloads vs paying for peak 24/7
- Storage billed same way as provisioned Aurora
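The cost model can be sketched as a sum of ACU-seconds. A minimal sketch under stated assumptions: a hypothetical price of $0.12 per ACU-hour, capacity rounded up to 0.5 ACU steps and clamped to the min/max range, and a peak-sized comparison that uses the same per-ACU rate (a simplification, since provisioned instances are priced per instance class).

```python
import math

PRICE_PER_ACU_HOUR = 0.12  # assumption: illustrative rate; check current regional pricing


def billed_acu(needed: float, min_acu: float, max_acu: float) -> float:
    """Capacity the instance would run at: rounded up to a 0.5 ACU step, clamped to [min, max]."""
    stepped = math.ceil(needed * 2) / 2
    return min(max(stepped, min_acu), max_acu)


def serverless_v2_cost(demand, min_acu=0.5, max_acu=128, price=PRICE_PER_ACU_HOUR):
    """demand: list of (seconds, acu_needed) intervals. Returns dollars."""
    acu_seconds = sum(secs * billed_acu(needed, min_acu, max_acu) for secs, needed in demand)
    return acu_seconds / 3600 * price


# A spiky day: 22 quiet hours needing ~1 ACU, 2 peak hours needing 16 ACU.
day = [(22 * 3600, 1), (2 * 3600, 16)]
serverless = serverless_v2_cost(day)
peak_sized = 16 * 24 * PRICE_PER_ACU_HOUR  # hypothetical always-on capacity sized for the peak
print(f"Serverless v2: ${serverless:.2f}/day vs peak-sized: ${peak_sized:.2f}/day")
```

Demand above `max_acu` is simply capped here; in a real cluster that shows up as throttled performance rather than extra cost, so the max setting is a cost ceiling and a performance ceiling at once.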
| Feature | Provisioned Aurora | Serverless v2 |
|---|---|---|
| Compute sizing | Fixed instance class (db.r6g.4xlarge) | ACU range (min 0.5 – max 128) |
| Scaling | Manual resize (brief downtime) | Automatic, zero downtime, subsecond |
| Cost model | Per hour for instance size (always-on) | Per ACU-second consumed |
| Best for | Predictable, steady traffic | Variable, spiky, or unpredictable traffic |
| Cold start | N/A — always running | None (min ACU keeps instance warm) |
| Mixed cluster | — | ✅ Can mix with provisioned in same cluster |
Serverless v2 is ideal for Lambda-based architectures where traffic is bursty and unpredictable. Pairing with RDS Proxy gives you connection pooling on top of auto-scaling compute — neither Lambda connection exhaustion nor wasted idle compute.
Best Use Cases
- Variable / spiky workloads — e-commerce, news spikes, event-driven apps
- Dev / test environments — scales to near-zero at night
- Multi-tenant SaaS — each tenant's DB right-sizes itself
- Lambda / API Gateway backends — matches serverless compute pattern
- New apps — unknown traffic profile, no over-provisioning
- Mixed clusters: Serverless v2 readers + provisioned writer
When Provisioned is Better
- Steady, predictable traffic (provisioned is cheaper at constant load)
- Workloads needing specific instance family guarantees
- Need for the very highest consistent performance (db.r6g.16xlarge)
- Cost predictability required (Serverless v2 can spike with traffic)
| Feature | Serverless v1 (Legacy) | Serverless v2 (Current) |
|---|---|---|
| Scales to zero | ✅ Yes (DB pauses when idle) | ❌ No (min 0.5 ACU stays warm) |
| Cold start | 25–30 seconds | None — always responsive |
| Scaling speed | Minutes | Fractions of a second |
| Engine support | Limited | Full Aurora MySQL 8, PostgreSQL 13+ |
| Mixed cluster | ❌ Not supported | ✅ Mix with provisioned instances |
| Production-ready | ❌ Not recommended | ✅ Yes — recommended choice |
Why Combine Them
- RDS Proxy keeps its connection pool to Aurora always warm
- When Serverless v2 is at min ACU (0.5), Proxy holds its connections open
- Sudden Lambda burst → Proxy absorbs the connection spike without forcing Serverless v2 to scale up prematurely
- Serverless v2 scales down gradually after a peak; the Proxy's smaller, reused pool keeps fewer database connections open, so idle connections do not hold capacity up
What Each Solves
- RDS Proxy: connection exhaustion from Lambda — pools & reuses connections
- Serverless v2: compute waste — scales ACU to match actual workload
- Together: neither the DB nor the connection layer is over-provisioned
- Exam: “Lambda + Aurora, minimize cost + connections” → RDS Proxy + Serverless v2
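The core of what a connection pool buys you is a hard cap on database connections regardless of client concurrency. A generic pooling sketch, not RDS Proxy's implementation (which also multiplexes at the transaction level): 200 hypothetical client requests from 50 concurrent workers share 5 database connections.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 5   # assumption: connections kept open to Aurora
CLIENTS = 200   # e.g. requests from many concurrent Lambda invocations


class ConnectionPool:
    def __init__(self, size: int):
        self._idle: "queue.Queue[str]" = queue.Queue()
        for i in range(size):
            self._idle.put(f"db-conn-{i}")
        self._lock = threading.Lock()
        self.in_use = 0
        self.max_in_use = 0

    def run(self, query: str) -> str:
        conn = self._idle.get()  # blocks until a pooled connection is free
        with self._lock:
            self.in_use += 1
            self.max_in_use = max(self.max_in_use, self.in_use)
        try:
            return f"{query} via {conn}"
        finally:
            with self._lock:
                self.in_use -= 1
            self._idle.put(conn)  # hand the connection back for reuse


pool = ConnectionPool(POOL_SIZE)
with ThreadPoolExecutor(max_workers=50) as clients:
    results = list(clients.map(lambda n: pool.run(f"q{n}"), range(CLIENTS)))
print(len(results), "queries served; max DB connections open:", pool.max_in_use)
```

Without the pool, 50 concurrent workers would mean up to 50 database connections, each holding memory that Serverless v2 would have to scale up to accommodate.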
Serverless v2 solves the provisioning problem: you stop guessing at peak capacity and instead let Aurora scale compute instantly within your defined range. No cold starts, no downtime during scaling, fractions-of-a-second response. Pair with RDS Proxy for Lambda workloads to get both connection efficiency and elastic compute.
- Serverless v2 = Aurora DB instance capacity type; compute auto-scales within ACU min/max range
- ACU: 1 ACU ≈ 2 GiB RAM; range 0.5–128; scales in fractions of a second
- No cold start (unlike v1): min ACU keeps instance warm; scales up instantly on demand
- Pay per ACU-second — cheaper than provisioned for variable/spiky workloads
- Same cluster volume: mix Serverless v2 and provisioned instances in one cluster
- Exam: “variable / unpredictable DB workload” or “serverless compute + DB” → Aurora Serverless v2
Security & Backups
Aurora always runs inside a VPC. A DB subnet group spanning at least 2 AZs is required — Aurora uses all three AZs for its storage regardless of where the compute instances sit. Best practice: place compute in private subnets, with Security Groups allowing only your app tier to reach the Aurora port.
KMS Encryption
- Must enable at cluster creation time — cannot add later
- Uses AWS KMS (AES-256)
- Encrypts: cluster volume, automated backups, snapshots, read replicas
- Shared storage = one KMS key encrypts everything
- Read replicas inherit encryption from the cluster — no separate key needed
- To encrypt unencrypted cluster: snapshot → copy with encryption → restore
TLS in Transit
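- Download the RDS CA certificate bundle (global-bundle.pem) from AWS
- MySQL client: --ssl-ca=global-bundle.pem --ssl-mode=VERIFY_IDENTITY
- PostgreSQL client: sslmode=verify-full sslrootcert=global-bundle.pem
- Enforce server-side: set require_secure_transport = ON (Aurora MySQL) or rds.force_ssl = 1 (Aurora PostgreSQL) in the DB cluster parameter group
- Encrypts all data between app and Aurora endpoint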
IAM DB Authentication
- Authenticate using an IAM token instead of a password
- Token generated via the generate-db-auth-token API, valid 15 minutes
- Supported: Aurora MySQL 5.7/8.0 and Aurora PostgreSQL 10+
- No credentials stored in application code
- Attach an IAM role with rds-db:connect permission to EC2 / Lambda, and create a matching database user enabled for IAM authentication
- Exam: “no passwords in code, EC2 to Aurora” → IAM DB auth
Secrets Manager (Recommended)
- Store Aurora master password in Secrets Manager
- Native Aurora integration — automatic rotation without downtime
- Rotation schedule: 30 / 60 / 90 days or custom
- App reads secret at runtime; never hardcoded
- Works with any engine, including RDS for Oracle and SQL Server, which IAM DB auth does not support
- Exam: “rotate DB credentials automatically” → Secrets Manager
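"App reads secret at runtime" needs one extra step to survive rotation: an app that caches the password must refetch it when a login is rejected. A minimal sketch of that pattern using in-memory fakes; `fetch_secret` and `connect` are hypothetical callables, and in production `fetch_secret` might wrap Secrets Manager (for example by parsing the `SecretString` returned by boto3's `get_secret_value`).

```python
from typing import Callable, Dict, Optional


class AuthError(Exception):
    """Raised by the connect callable when the database rejects credentials."""


class RotatingCredentials:
    """Caches a secret and refetches it on demand (e.g. after a rotation)."""

    def __init__(self, fetch_secret: Callable[[], Dict[str, str]]):
        self._fetch = fetch_secret
        self._cached: Optional[Dict[str, str]] = None

    def get(self, force_refresh: bool = False) -> Dict[str, str]:
        if self._cached is None or force_refresh:
            self._cached = self._fetch()
        return self._cached


def connect_with_rotation(creds: RotatingCredentials, connect: Callable[[str, str], str]) -> str:
    secret = creds.get()
    try:
        return connect(secret["username"], secret["password"])
    except AuthError:
        # The cached password may predate a rotation: refetch once, then retry.
        secret = creds.get(force_refresh=True)
        return connect(secret["username"], secret["password"])


# Usage with in-memory fakes standing in for Secrets Manager and the database.
store = {"username": "app_user", "password": "v1"}
fetch_count = []


def fake_fetch() -> Dict[str, str]:
    fetch_count.append(1)
    return dict(store)


def fake_connect(user: str, password: str) -> str:
    if password != store["password"]:
        raise AuthError("password rejected")
    return f"connected as {user}"


creds = RotatingCredentials(fake_fetch)
first = connect_with_rotation(creds, fake_connect)
store["password"] = "v2"  # simulate a scheduled rotation
second = connect_with_rotation(creds, fake_connect)
print(first, "|", second, "| secret fetches:", len(fetch_count))
```

Retrying exactly once keeps a genuinely wrong password from turning into a tight loop of Secrets Manager calls.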
Aurora backups work differently from RDS. Because the storage layer continuously logs all changes to S3 in the background, Aurora does not have a traditional backup window. The backup process never interrupts the cluster and causes zero performance impact — on any instance, in any configuration.
Automated Backups (Always On)
- Continuous backup to S3 — cannot be disabled
- Retention: 1–35 days (default 1 day, set to at least 7)
- Enables Point-in-Time Recovery to any second within retention
- No backup window — zero performance impact always
- Stored in S3 (AWS-managed, not visible in your S3 console)
- Backup data spans all AZs — regionally durable
Manual Snapshots
- User-initiated at any time
- Retained indefinitely until you delete them
- Stored in S3 — visible in the Aurora console
- Survive cluster deletion
- Copy across regions for cross-region DR
- Share with other AWS accounts
Aurora Backtrack is an Aurora-exclusive feature that lets you rewind your running database to a previous point in time in place — without creating a new cluster. Instead of restoring a snapshot (which creates a new endpoint), Backtrack reverses the cluster volume itself within seconds. This is powerful for accidental schema drops or data corruption.
How Backtrack Works
- Pre-define a backtrack window (up to 72 hours)
- Aurora retains change records for that window
- To backtrack: specify a target timestamp
- Cluster pauses, reverses changes — back online in seconds
- Same cluster, same endpoint — no DNS change
- Supported: Aurora MySQL only (not PostgreSQL)
Backtrack vs PITR
- Backtrack: rewinds the existing cluster in-place — same endpoint, seconds
- PITR: creates a new cluster from backup — new endpoint, minutes
- Use Backtrack for: accidental DROP TABLE, recent data corruption
- Use PITR for: longer-range recovery, keeping the original cluster intact
- ⚠️ Overhead: Backtrack change records consume additional storage; plan capacity for 72h window
- Exam: “rewind Aurora quickly without new cluster” → Backtrack
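The Backtrack-versus-PITR guidance reduces to a small decision function. A minimal sketch encoding only the rules stated above (Aurora MySQL only, window up to 72 hours, in-place versus new cluster); the engine strings match the API engine names `aurora-mysql` and `aurora-postgresql`, and details such as the latest restorable time are deliberately ignored.

```python
from datetime import datetime, timedelta

MAX_BACKTRACK_HOURS = 72


def choose_recovery(engine: str, now: datetime, target: datetime,
                    backtrack_window_hours: float, retention_days: int,
                    keep_original: bool = False) -> str:
    """Pick a recovery method for rewinding to `target`."""
    if target > now:
        raise ValueError("target time is in the future")
    if not 0 <= backtrack_window_hours <= MAX_BACKTRACK_HOURS:
        raise ValueError("backtrack window must be between 0 and 72 hours")
    age = now - target
    can_backtrack = (
        engine == "aurora-mysql"          # Backtrack is Aurora MySQL only
        and backtrack_window_hours > 0    # 0 means Backtrack is not enabled
        and age <= timedelta(hours=backtrack_window_hours)
    )
    if can_backtrack and not keep_original:
        return "backtrack"        # rewind in place: same cluster, same endpoint
    if age <= timedelta(days=retention_days):
        return "pitr"             # restore to a new cluster with a new endpoint
    return "manual-snapshot"      # outside automated retention; needs an existing snapshot


now = datetime(2024, 1, 10, 12, 0)
print(choose_recovery("aurora-mysql", now, now - timedelta(hours=2), 24, 7))
print(choose_recovery("aurora-postgresql", now, now - timedelta(hours=2), 24, 7))
```

`keep_original=True` models the case where you want to inspect the damaged cluster side by side with the restored one, which Backtrack cannot give you.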
| Feature | RDS | Aurora |
|---|---|---|
| Backup method | Daily snapshot + transaction logs | Continuous log streaming to S3 |
| Backup window | Required (brief I/O pause single-AZ) | None — continuous, zero impact |
| Can disable backups | Yes (set retention = 0) | No — always on |
| Backtrack | ❌ Not supported | ✅ Aurora MySQL (up to 72h) |
| Restore creates | New DB instance (new endpoint) | New cluster (new endpoint) or Backtrack in-place |
| Performance impact | Brief I/O pause (single-AZ) | Zero — storage-level continuous |
Aurora never has a backup window and you cannot disable backups — continuous backup to S3 is architectural. Backtrack is Aurora's unique power move: rewind the live cluster in seconds rather than restoring a new one. Encryption must be set at creation — and because storage is shared, one KMS key covers the entire cluster including all replicas and snapshots.
- Private subnet + Security Group: Aurora never accessible from internet; SG allows app SG only
- KMS at rest: must enable at creation; one key covers cluster volume, snapshots, replicas
- IAM auth: token-based, no passwords (MySQL- and PostgreSQL-family engines only); Secrets Manager works for every engine
- Continuous backup: always on, no backup window, zero performance impact — cannot be disabled
- Backtrack: Aurora MySQL only — rewind live cluster in-place (up to 72h) without new endpoint
- Exam: “rewind Aurora without new cluster” → Backtrack; “Aurora backup window” → none needed
Architecture Patterns
Aurora Global Database replicates an entire Aurora cluster to up to 5 secondary read-only regions using dedicated replication infrastructure at the storage layer — not over the public internet. Replication lag is typically under 1 second. In a disaster, any secondary can be promoted to a full primary in under 1 minute.
Replication
- Storage-level replication — not binlog
- <1 second typical lag across regions
- Dedicated AWS replication network (not internet)
- Up to 5 secondary regions
- Each secondary has its own cluster of readers
Disaster Recovery
- RPO: <1 second (storage-level, near-zero data loss)
- RTO: <1 minute (promote secondary to primary)
- Promotion is not automatic: you initiate it (console, CLI, or your own automation)
- Promoted secondary becomes a full independent primary
- Best cross-region DR for relational databases on AWS
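After a DR drill, you can check the RPO and RTO figures above if you record four timestamps. This toy calculation assumes you log them yourself, and the parameter names are hypothetical. Here RPO is the span of commits the primary made that the secondary had not yet applied. RTO is measured from the moment you start the failover, so failure-detection time is not included.

```python
from datetime import datetime, timedelta

RPO_TARGET = timedelta(seconds=1)
RTO_TARGET = timedelta(minutes=1)


def measure_dr(primary_last_commit: datetime, secondary_last_applied: datetime,
               failover_started: datetime, writes_resumed: datetime) -> dict:
    """Compute achieved RPO (lost-commit window) and RTO (time until writes resume)."""
    rpo = max(primary_last_commit - secondary_last_applied, timedelta(0))
    rto = writes_resumed - failover_started
    return {"rpo": rpo, "rto": rto,
            "rpo_ok": rpo <= RPO_TARGET, "rto_ok": rto <= RTO_TARGET}


result = measure_dr(
    primary_last_commit=datetime(2024, 5, 1, 12, 0, 0, 800000),
    secondary_last_applied=datetime(2024, 5, 1, 12, 0, 0, 300000),
    failover_started=datetime(2024, 5, 1, 12, 2, 0),
    writes_resumed=datetime(2024, 5, 1, 12, 2, 45),
)
print(result["rpo"].total_seconds(), result["rto"].total_seconds())  # 0.5 45.0
```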
Global Read Latency
- App in eu-west-1 reads from secondary in eu-west-1
- No transcontinental round-trip for reads
- Secondaries are read-only (writes must go to primary)
- Global write endpoint available — routes to primary
- Use case: global SaaS, multinational compliance
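Applications usually implement the split between local reads and primary writes themselves. Here is a minimal sketch. It assumes hypothetical endpoint hostnames and a primary in us-east-1.

```python
# Hypothetical endpoints for a global cluster whose primary is in us-east-1.
PRIMARY_REGION = "us-east-1"
WRITER_ENDPOINT = "writer.example.internal"  # hypothetical global writer endpoint
READER_ENDPOINTS = {  # hypothetical per-region reader endpoints
    "us-east-1": "reader.us-east-1.example.internal",
    "eu-west-1": "reader.eu-west-1.example.internal",
    "ap-southeast-2": "reader.ap-southeast-2.example.internal",
}


def pick_endpoint(app_region: str, is_write: bool) -> str:
    """Writes always go to the primary; reads stay in-region when a secondary exists."""
    if is_write:
        return WRITER_ENDPOINT  # secondaries are read-only
    # No secondary in this region: fall back to the primary region's reader.
    return READER_ENDPOINTS.get(app_region, READER_ENDPOINTS[PRIMARY_REGION])


print(pick_endpoint("eu-west-1", is_write=False))  # reader.eu-west-1.example.internal
print(pick_endpoint("eu-west-1", is_write=True))   # writer.example.internal
```

Falling back to the primary region keeps the sketch simple. A real deployment might instead pick the geographically nearest secondary.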
Failover Options
- Managed failover: promote a secondary to primary via console/CLI — Aurora coordinates the transition, replication paused, ~1 min
- Detach & promote: detach secondary region from the global cluster, promote it to an independent standalone cluster — manual, for edge-case DR
- Original primary becomes a secondary cluster after managed failover (it rejoins)
- Exam: “fastest cross-region DR” → Aurora Global Database managed failover
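The two failover options differ mainly in what happens to the cluster topology. This toy model tracks only region roles and does not call AWS. The class and method names are hypothetical. The corresponding real RDS actions are `failover-global-cluster` and `remove-from-global-cluster`.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalCluster:
    primary: str
    secondaries: list = field(default_factory=list)
    standalone: list = field(default_factory=list)  # detached, independent clusters

    def managed_failover(self, target: str) -> None:
        """Promote `target`; the old primary rejoins as a secondary."""
        if target not in self.secondaries:
            raise ValueError(f"{target} is not a secondary")
        self.secondaries.remove(target)
        self.secondaries.append(self.primary)
        self.primary = target

    def detach_and_promote(self, target: str) -> None:
        """Detach `target` into an independent cluster; the global primary is unchanged."""
        if target not in self.secondaries:
            raise ValueError(f"{target} is not a secondary")
        self.secondaries.remove(target)
        self.standalone.append(target)


g = GlobalCluster("us-east-1", ["eu-west-1", "ap-southeast-2"])
g.managed_failover("eu-west-1")
print(g.primary, g.secondaries)     # eu-west-1 ['ap-southeast-2', 'us-east-1']
g.detach_and_promote("ap-southeast-2")
print(g.secondaries, g.standalone)  # ['us-east-1'] ['ap-southeast-2']
```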
Engine Version Upgrade Order
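- Major version upgrade: run it on the global database as a whole; Aurora upgrades the primary and every secondary in one coordinated operation
- Minor version upgrade: upgrade each secondary cluster first, then the primary
- A secondary must not run an older engine version than the primary, so never upgrade the primary ahead of its secondaries
- Plan global maintenance windows around each region's low-traffic period
- While a cluster is being upgraded, reads from that region are temporarily unavailable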
Aurora Machine Learning lets you call Amazon SageMaker and Amazon Comprehend directly from SQL statements. No separate data pipeline is needed: Aurora sends the selected rows to the ML service, the inference runs there, and the prediction comes back as part of the query result.
How Aurora ML Works
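- Create an ML function that maps to a SageMaker endpoint or Comprehend API
- Call it from SQL like any other function, e.g. `SELECT aurora_ml_predict(col1, col2) FROM my_table;` (aurora_ml_predict is a placeholder for the name you gave your function)
- Aurora batches rows, calls SageMaker, and merges the results back into your result set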
- Supports: SageMaker (custom models) and Amazon Comprehend (sentiment, language, NER)
- Supported: Aurora MySQL 8.0 and Aurora PostgreSQL 13+
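Aurora does the batching internally, so you never write this loop yourself. The sketch below only mimics the idea: it scores rows in batches and appends each prediction as an extra column. `fake_model` is a hypothetical stand-in for a SageMaker endpoint, and the default batch size of 100 is an arbitrary assumption.

```python
from typing import Callable, List, Sequence, Tuple


def batched_predict(rows: Sequence[Tuple], model: Callable[[List[Tuple]], List[float]],
                    batch_size: int = 100) -> List[Tuple]:
    """Score rows in batches and append each prediction as an extra column."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    out = []
    for start in range(0, len(rows), batch_size):
        batch = list(rows[start:start + batch_size])
        preds = model(batch)  # one remote call per batch, not per row
        if len(preds) != len(batch):
            raise RuntimeError("model returned the wrong number of predictions")
        out.extend(row + (p,) for row, p in zip(batch, preds))
    return out


def fake_model(batch):
    """Hypothetical stand-in for a SageMaker endpoint: scores a row as a + b."""
    return [float(a + b) for a, b in batch]


scored = batched_predict([(1, 2), (3, 4), (5, 6), (7, 8)], fake_model, batch_size=3)
print(scored)  # [(1, 2, 3.0), (3, 4, 7.0), (5, 6, 11.0), (7, 8, 15.0)]
```

Batching is the point of the design: four rows cost two model calls here instead of four.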
Use Cases
- Real-time product recommendations from SQL query
- Sentiment scoring customer reviews inline with application queries
- Fraud detection integrated into transaction processing
- No ETL pipeline or export job: rows flow from the query to the model and back
- Exam: “call ML model from SQL in Aurora” → Aurora Machine Learning
| You Need... | Use | Why |
|---|---|---|
| MySQL / PG — highest perf + HA | Aurora | Shared storage, 15 replicas, <30s failover |
| MySQL / PG — standard managed | RDS | Simpler, cheaper, sufficient for most workloads |
| Oracle / SQL Server | RDS (not Aurora) | Aurora doesn't support these engines |
| Variable workload, serverless compute | Aurora Serverless v2 | ACU auto-scales in place, no cold start at a minimum of 0.5 ACU, pay per ACU-hour |
| Global reads + cross-region DR | Aurora Global Database | RPO <1s, RTO <1 min, up to 5 secondary regions |
| Key-value / document / serverless scale | DynamoDB | NoSQL, virtually unlimited scale, single-digit ms latency |
| Sub-ms caching, reduce DB load | ElastiCache | Redis / Memcached in front of Aurora |
🎯 Exam Keywords → Aurora Answer
- “Aurora ≠ fast RDS” → decoupled shared storage architecture
- “more than 5 read replicas” → Aurora (up to 15)
- “replica lag <100ms” → Aurora storage-level replication
- “failover <30 seconds” → Aurora (shared storage = promoted reader already has data)
- “auto-scale read replicas” → Aurora Auto Scaling
- “analytics queries on Aurora, no extra infra” → Aurora Parallel Query
- “variable / spiky DB workload” → Aurora Serverless v2
- “serverless compute + relational DB” → Lambda + RDS Proxy + Aurora Serverless v2
- “Lambda + Aurora, minimize cost + connections” → RDS Proxy + Serverless v2
- “rewind Aurora without new cluster” → Aurora Backtrack (MySQL only, up to 72h)
- “cross-region DR RPO <1s, RTO <1min” → Aurora Global Database
- “global low-latency reads + single write region” → Aurora Global Database
- “fastest cross-region failover” → Aurora Global Database managed failover
- “call ML model from SQL” → Aurora Machine Learning (SageMaker / Comprehend)
- “high I/O workload, predictable DB cost” → Aurora I/O-Optimized
- “Aurora backup window” → there is none; continuous backup, zero impact
- “Oracle / SQL Server on Aurora” → NOT possible; use RDS for those engines
- “6 copies, 3 AZs” → Aurora storage always; not configurable
- “storage auto-scales” → Aurora cluster volume, 10 GB → 128 TiB, no downtime
Aurora is a cloud-native redesign — shared distributed storage is the foundation that enables everything else: instant replicas, sub-30s failover, continuous no-impact backups, Backtrack, and Global Database. Pick Aurora when RDS hits its limits: more replicas needed, faster failover required, global distribution demanded, or compute needs to auto-scale with Serverless v2.