Amazon RDS: Managed Relational Database
Fully managed relational database: no OS or engine patching to do yourself, no backup jobs to build, no infrastructure to manage. RDS gives you MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, or Db2 with built-in high availability, automatic failover, and point-in-time recovery.
⚡ RDS in 30 Seconds
- Managed SQL database — AWS handles patching, backups, and hardware
- 6 engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Db2 (plus Aurora, a separate AWS-built engine)
- Multi-AZ = high availability (synchronous standby, auto-failover)
- Read Replicas = read scaling (asynchronous, up to 15 replicas)
- Runs inside your VPC; best practice is private subnets with public access turned off
What is Amazon RDS
Running a relational database yourself on EC2 requires you to handle everything: installing the engine, configuring storage, setting up replication, scheduling backups, applying patches, monitoring for failures, and manually failing over when the primary goes down. That's weeks of work before you write a single line of application code.
👉 The core problem: Databases need constant operational care — backups, patches, failover, replication. Every hour spent on DB ops is an hour not spent on your product. RDS takes all of that away.
A relational database stores data in structured tables with rows and columns. Tables are related to each other through foreign keys. You query data with SQL. This model works for almost all transactional applications: e-commerce orders, user accounts, financial records, inventory systems.
Structured Data
Schema-defined. Every row in a table has the same columns. Strong data integrity with constraints (NOT NULL, UNIQUE, FOREIGN KEY).
Relationships
Tables join together. A users table relates to an orders table. Complex queries with JOINs, GROUP BY, transactions.
ACID Transactions
Atomicity, Consistency, Isolation, Durability. Either all changes commit, or none do. Critical for financial and medical data.
Amazon RDS (Relational Database Service) is a fully managed service that runs a relational database engine on your behalf inside AWS. You choose the engine, instance size, and storage. AWS handles everything else — provisioning, patching, backups, monitoring, failover.
You Manage on EC2
- Install database engine (MySQL, Postgres...)
- Configure storage, IOPS, file system
- Apply OS + DB patches manually
- Set up replication manually
- Schedule and test backups
- Monitor and respond to failures
- Implement failover scripts
RDS Manages
- Hardware provisioning and lifecycle
- Database engine installation
- Automated OS and DB patching
- Synchronous replication (Multi-AZ)
- Automated daily backups + transaction logs
- Health monitoring + auto-restart
- Automatic failover in <2 minutes
⚠️ Aurora is a separate, AWS-built engine: MySQL/PostgreSQL compatible, with AWS citing up to 5× the throughput of standard MySQL and 3× that of standard PostgreSQL. Covered in the Aurora page.
What is RDS Custom
- Managed DB service with OS + engine access
- Supports Oracle and Microsoft SQL Server only
- SSH access, filesystem access, custom scripts
- Toggle automation (pause RDS automation to apply custom changes)
- AWS still manages backups, Multi-AZ, failover
When to Use RDS Custom
- Legacy Oracle apps requiring custom OS-level configuration
- SQL Server features not exposed by standard RDS
- Custom patches or DB features RDS doesn't support
- Migrating on-prem Oracle to AWS with minimal changes
- Exam: “OS-level access + managed RDS” → RDS Custom
DIY (EC2 Database) = Owning a House
- You choose and install the plumbing (DB engine)
- You fix the boiler when it breaks (patches)
- You call a plumber at 3am (on-call)
- You organize your own insurance (backups)
- Total control, total responsibility
RDS = Managed Apartment Building
- Building manager handles plumbing (AWS manages engine)
- Maintenance team patches issues (automated patching)
- 24/7 on-call building staff (AWS monitoring)
- Fire insurance included (automated backups)
- You just live there and focus on your work
This is the most important concept in RDS and the most common exam mistake. These two features solve completely different problems:
| Feature | Purpose | Replication | Can Serve Reads? |
|---|---|---|---|
| Multi-AZ | 🛡️ High Availability | Synchronous (zero data loss) | ❌ No — standby only |
| Read Replica | 📈 Read Scaling | Asynchronous (slight lag) | ✅ Yes — read traffic |
Multi-AZ = HA (failover protection). Read Replica = performance (scale reads). You can combine both: a Multi-AZ primary with read replicas for a production-grade, highly available, read-scalable database tier.
- RDS = fully managed relational DB — AWS handles patching, backups, failover, replication
- 6 engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Db2 (+ Aurora separately)
- Runs in your VPC: private subnet + security groups, public access off (best practice)
- Multi-AZ = high availability (synchronous standby, NOT readable)
- Read Replicas = read scaling (asynchronous, readable, up to 15)
- Mental model: managed apartment building — you use it, AWS maintains it
Core Concepts — DB Instance, Storage & Endpoints
A DB instance is an isolated database environment running in AWS. It is the fundamental building block of RDS — think of it as a virtual database server. Each DB instance runs one database engine (MySQL, PostgreSQL, etc.) and can contain multiple databases.
Instance Class
- db.t3 — burstable (dev/test)
- db.m6g — general purpose
- db.r6g — memory-optimized
- db.x2g — extra-large memory
- Choose: vCPU + RAM
Storage
- gp2 — general SSD (3 IOPS/GB)
- gp3 — new general SSD, independent IOPS
- io1 / io2 — provisioned IOPS (I/O intensive)
- magnetic — legacy, not recommended
- Auto-scaling available (gp2/gp3/io1)
Endpoint
- DNS hostname, not IP address
- DNS stays the same after failover
- Example: mydb.xxx.us-east-1.rds.amazonaws.com
- Port: 3306 (MySQL) / 5432 (PG)
- Connection string stays stable
| Type | IOPS | Best For | Cost |
|---|---|---|---|
| gp2 (SSD) | 3 IOPS/GB (min 100), burst to 3000 | Most workloads, small DBs | $ |
| gp3 (SSD) | 3000 baseline, up to 16,000 | Most workloads; prefer over gp2 | $ (usually cheaper than gp2 for the same IOPS) |
| io1 / io2 | Up to 64,000 IOPS | High I/O: OLTP, ERP, financial | $$$$ |
| Magnetic | ~100 IOPS | Legacy only, avoid | $ |
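The IOPS column is easier to compare with a little arithmetic. A minimal Python sketch, assuming gp2's baseline is 3 IOPS per GB with a 100 IOPS floor (the EBS gp2 rule; burst credits and upper caps are ignored) and gp3's baseline is the flat 3,000 IOPS shown in the table:

```python
GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS = 100        # assumed floor (the EBS gp2 rule)
GP3_BASELINE_IOPS = 3000  # included baseline, independent of size


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline (non-burst) IOPS for gp2: grows linearly with volume size."""
    return max(GP2_MIN_IOPS, GP2_IOPS_PER_GB * size_gb)


def gp3_baseline_iops(_size_gb: int) -> int:
    """Baseline IOPS for gp3: the same at any size; more IOPS are provisioned separately."""
    return GP3_BASELINE_IOPS


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 size in GB whose baseline reaches target_iops."""
    if target_iops <= GP2_MIN_IOPS:
        return 1
    return -(-target_iops // GP2_IOPS_PER_GB)  # ceiling division


for size in (20, 100, 500, 1000):
    print(f"{size:>5} GB  gp2={gp2_baseline_iops(size):>5}  gp3={gp3_baseline_iops(size)}")
```

Under these assumptions a 100 GB gp2 volume gets 300 baseline IOPS, and gp2 needs 1,000 GB before it matches the 3,000 IOPS that gp3 includes at any size, which is why gp3 is usually the cheaper way to buy a given IOPS level.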
Parameter Groups
- DB engine configuration settings
- E.g., max_connections, innodb_buffer_pool_size
- Default parameter group = engine defaults (read-only)
- Create a custom group to tune performance
- Static parameters need a reboot; dynamic ones can apply immediately
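A minimal sketch of tuning through a custom parameter group, written as Python helpers that build keyword arguments for the boto3 RDS client calls `create_db_parameter_group` and `modify_db_parameter_group`. The group name, the `mysql8.0` family, and the values are illustrative assumptions. Whether a parameter is dynamic or static is an input you look up for your engine; the helper only maps that flag to `ApplyMethod`.

```python
def create_group_request(name: str, family: str, description: str) -> dict:
    """Keyword arguments for rds.create_db_parameter_group."""
    return {
        "DBParameterGroupName": name,
        "DBParameterGroupFamily": family,
        "Description": description,
    }


def modify_group_request(name: str, changes: dict[str, tuple[str, bool]]) -> dict:
    """Keyword arguments for rds.modify_db_parameter_group.

    changes maps parameter name -> (value, is_dynamic). Dynamic parameters can
    apply immediately; static ones wait for the next reboot.
    """
    parameters = [
        {
            "ParameterName": param,
            "ParameterValue": value,
            "ApplyMethod": "immediate" if dynamic else "pending-reboot",
        }
        for param, (value, dynamic) in changes.items()
    ]
    return {"DBParameterGroupName": name, "Parameters": parameters}


# Hypothetical usage with boto3:
#   rds = boto3.client("rds")
#   rds.create_db_parameter_group(**create_group_request("app-mysql80", "mysql8.0", "App tuning"))
#   rds.modify_db_parameter_group(**modify_group_request("app-mysql80", {"max_connections": ("500", True)}))
```

Attaching a newly created group to an instance also leaves the instance pending a reboot before the group takes effect.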
Maintenance Windows
- Weekly window for minor version upgrades + patches
- Default: 30-minute window during low-traffic hours
- You choose the window (e.g., Sun 03:00–03:30 UTC)
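- Multi-AZ = OS patches go to the standby first, then a failover → downtime limited to the failover
- Engine version upgrades update primary and standby together, so expect an outage even with Multi-AZ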
- Single-AZ = brief outage during patch
Storage Auto-Scaling
- Automatically expands storage when near limit
- Set maximum storage threshold (required)
- Trigger: free space <10% for 5+ minutes
- Trigger: last scaling was 6+ hours ago
- Min increment: 10% of current size (or min 5 GB)
- Max limit: 65,536 GiB (64 TiB)
- No downtime
- Supported: gp2, gp3, io1
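The auto-scaling trigger can be written as a small predicate. This is a Python model of the conditions listed above (all must hold) and of this page's increment rule, clamped to the maximum storage threshold; it is not RDS's internal implementation, and the exact increment rule may differ by version.

```python
from datetime import datetime, timedelta
from typing import Optional

LOW_FREE_RATIO = 0.10                     # free space below 10% of allocated...
LOW_FREE_DURATION = timedelta(minutes=5)  # ...for at least 5 minutes
COOLDOWN = timedelta(hours=6)             # last storage change 6+ hours ago
MIN_INCREMENT_GB = 5                      # per this page: grow by at least 5 GB


def should_autoscale(allocated_gb: int, free_gb: float, low_since: Optional[datetime],
                     last_scaled: datetime, now: datetime) -> bool:
    """True only when all documented conditions hold.

    low_since is when free space first dropped under 10% (None if it is not low).
    """
    if low_since is None or free_gb / allocated_gb >= LOW_FREE_RATIO:
        return False
    return now - low_since >= LOW_FREE_DURATION and now - last_scaled >= COOLDOWN


def next_size_gb(allocated_gb: int, max_gb: int) -> int:
    """Grow by the larger of 5 GB or 10% (rounded up), never past the maximum threshold."""
    increment = max(MIN_INCREMENT_GB, -(-allocated_gb // 10))  # integer ceil of 10%
    return min(max_gb, allocated_gb + increment)
```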
DB Subnet Group
- Collection of subnets in different AZs
- RDS uses these for Multi-AZ placement
- Must span at least 2 AZs
- Best practice: private subnets only
- Required for all RDS instances (even single-AZ)
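A sketch of creating the subnet group, assuming the boto3 `create_db_subnet_group` call. It checks the two-AZ rule locally before anything is sent to AWS; the subnet IDs and AZs in the usage comment are hypothetical.

```python
def subnet_group_request(name: str, description: str, subnets: dict[str, str]) -> dict:
    """Keyword arguments for rds.create_db_subnet_group.

    subnets maps subnet ID -> Availability Zone. RDS requires the group to
    cover at least two AZs, so fail fast if it doesn't.
    """
    azs = set(subnets.values())
    if len(azs) < 2:
        raise ValueError(f"DB subnet group must span at least 2 AZs, got {sorted(azs)}")
    return {
        "DBSubnetGroupName": name,
        "DBSubnetGroupDescription": description,
        "SubnetIds": sorted(subnets),
    }


# Hypothetical usage (private subnets only):
#   boto3.client("rds").create_db_subnet_group(**subnet_group_request(
#       "db-private", "Private DB subnets",
#       {"subnet-aaa": "us-east-1a", "subnet-bbb": "us-east-1b"}))
```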
Prefer gp3 over gp2: it is usually cheaper for a given IOPS level and lets you set IOPS independently of storage size. Always use a DB subnet group with private subnets. The DNS endpoint never changes, so your app always uses the same connection string, even after failover.
What is Performance Insights
- Visualises DB load by wait type (CPU, I/O, locks, network)
- Shows SQL queries consuming the most resources
- 7-day retention free; up to 2 years with paid tier
- Supports MySQL, PostgreSQL, MariaDB, Oracle, SQL Server
- Enabled per DB instance, with minimal performance overhead
When to Use
- Database is slow but you don't know why
- Find which queries are causing lock waits
- Identify CPU vs I/O bound workloads
- Spot regression after schema change or deploy
- Exam: “identify DB bottleneck / slow queries” → Performance Insights
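Turning Performance Insights on is a single instance modification. A minimal sketch that builds `modify_db_instance` arguments; the retention check encodes the values we understand RDS to accept (7 days free, 31 × months for 1 to 23 months, or 731), so confirm against the API reference before relying on it. The instance name is hypothetical.

```python
def _valid_pi_retention(days: int) -> bool:
    """Assumed accepted values: 7, 31 * months (1-23 months), or 731."""
    return days in (7, 731) or (days % 31 == 0 and 1 <= days // 31 <= 23)


def enable_performance_insights_request(instance_id: str, retention_days: int = 7) -> dict:
    """Keyword arguments for rds.modify_db_instance that turn on Performance Insights.

    7 days is the free retention tier; longer retention is billed.
    """
    if not _valid_pi_retention(retention_days):
        raise ValueError(f"unsupported retention: {retention_days} days")
    return {
        "DBInstanceIdentifier": instance_id,
        "EnablePerformanceInsights": True,
        "PerformanceInsightsRetentionPeriod": retention_days,
        "ApplyImmediately": True,
    }


# Hypothetical usage:
#   boto3.client("rds").modify_db_instance(**enable_performance_insights_request("prod-db"))
```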
- DB Instance = isolated DB environment (engine + compute + storage + endpoint)
- Instance classes: db.t3 (burstable), db.m6g (general), db.r6g (memory-optimized)
- Storage: gp3 (best default), io1 (high-IOPS workloads), gp2 (legacy)
- Endpoint = stable DNS hostname — stays the same after failover
- Parameter groups = DB engine tuning; subnet groups = VPC placement
- Storage auto-scaling = expands automatically when <10% free (no downtime)
High Availability — Multi-AZ
Multi-AZ is RDS's high availability feature. When enabled, RDS automatically provisions a standby replica in a different Availability Zone. Data is synchronously replicated to the standby. If the primary fails, RDS automatically fails over to the standby — no manual intervention, no data loss.
👉 Critical to understand: Multi-AZ standby is NOT a read replica. You cannot query the standby instance. It exists solely as a failover target. Its only job is to take over if the primary fails.
Failure Detected
Primary instance fails (hardware, OS crash, AZ outage, maintenance). RDS health checks detect within ~30 seconds.
DNS Flipped
RDS updates the DNS endpoint to point to the standby. Total failover time: typically 1–2 minutes. Your app reconnects to the same endpoint, provided it retries dropped connections and doesn't cache the old IP address.
Standby Promoted
Standby becomes the new primary. RDS automatically provisions a new standby in the other AZ to restore HA. Zero data loss (synchronous replication).
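During the DNS flip, open connections drop and new ones fail for a short time, so the application needs retry logic. A minimal reconnect-with-backoff sketch, assuming a hypothetical `connect()` callable (your driver's connect function aimed at the instance endpoint) that raises `ConnectionError` while failover is in progress; real drivers raise their own exception types, so adapt the `except` clause.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 8,
                       base_delay: float = 0.5, max_delay: float = 16.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect() should resolve the endpoint hostname on every call (no cached IP),
    so the new primary is reached once RDS updates DNS.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = base_delay
    for _ in range(attempts - 1):
        try:
            return connect()
        except ConnectionError:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return connect()  # final attempt: let the error propagate to the caller
```

Injecting `sleep` keeps the function easy to test; in production the default `time.sleep` is used.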
Automatic Failover Triggers
- Primary instance failure (hardware, OS crash)
- Network connectivity loss to primary
- Storage failure on primary
- AZ or data centre outage
- Planned maintenance with reboot
- You manually trigger (Reboot with failover)
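The "Reboot with failover" trigger is handy for rehearsing failover before a real incident. A sketch that builds `reboot_db_instance` arguments from one entry of `describe_db_instances()["DBInstances"]`, refusing Single-AZ instances, where RDS rejects `ForceFailover`. The instance name is hypothetical.

```python
def force_failover_request(instance: dict) -> dict:
    """Keyword arguments for rds.reboot_db_instance that fail over to the standby.

    instance is one entry of describe_db_instances()["DBInstances"].
    """
    if not instance.get("MultiAZ"):
        raise ValueError(f"{instance['DBInstanceIdentifier']} is not Multi-AZ; nothing to fail over to")
    return {"DBInstanceIdentifier": instance["DBInstanceIdentifier"], "ForceFailover": True}


# Hypothetical usage:
#   rds = boto3.client("rds")
#   db = rds.describe_db_instances(DBInstanceIdentifier="prod-db")["DBInstances"][0]
#   rds.reboot_db_instance(**force_failover_request(db))
#   rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="prod-db")
```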
Multi-AZ Gotchas
- Standby is in a different AZ, not different region
- Standby has no endpoint you can connect to; always use the instance endpoint
- Backups taken from standby (zero I/O impact on primary)
- Both instances same class/storage (can't scale standby independently)
- Extra cost: ~2× (two instances running)
- Not available for all instance classes
RDS also offers a Multi-AZ DB Cluster (a newer option) — one writer and two readable standbys across 3 AZs. Unlike classic Multi-AZ, the standbys can serve reads. Failover is faster (<35 seconds). Currently available for MySQL 8.0 and PostgreSQL 13+.
Classic Multi-AZ (DB Instance)
- 1 primary + 1 standby
- Standby NOT readable
- Failover: ~60–120 seconds
- Supported: all engines
- Exam default assumption
Multi-AZ DB Cluster
- 1 writer + 2 readable standbys
- Standbys ARE readable
- Failover: <35 seconds
- MySQL 8 + PostgreSQL 13+ only
- Higher cost, better read availability
📋 Classic vs Cluster — Comparison Table
| Feature | Classic Multi-AZ | Multi-AZ Cluster |
|---|---|---|
| Standby count | 1 (different AZ) | 2 (different AZs) |
| Standby readable? | ❌ No | ✅ Yes |
| Failover time | 60–120 seconds | <35 seconds |
| Supported engines | All engines | MySQL 8, PostgreSQL 13+ |
Multi-AZ = availability, not performance. The standby is invisible to your app — same endpoint, same experience. Failover is automatic and takes less than 2 minutes. For exam: “Multi-AZ” always means “HA”, not scaling. Standby = NOT queryable (unless using the newer Multi-AZ Cluster).
- Multi-AZ = high availability — primary + synchronous standby in different AZ
- Standby is NOT queryable — exists only as a failover target
- Automatic failover in 1–2 minutes — DNS updated, app reconnects to same endpoint
- Triggers: hardware failure, AZ outage, network loss, planned maintenance reboot
- Zero data loss: synchronous replication means a write is acknowledged only after it is also persisted on the standby
- Multi-AZ Cluster: newer option (MySQL 8/PG 13+) — 2 readable standbys, <35s failover
Scaling — Read Replicas
A Read Replica is an asynchronous copy of your primary RDS instance that serves read-only queries. Applications send writes to the primary and reads to the replica(s). This offloads read traffic from the primary, improving overall database performance for read-heavy workloads.
👉 Core idea: Read Replicas solve performance, not availability. Reads scale horizontally by adding more replicas. Writes still go to one primary. Replication is asynchronous, so replicas lag behind the primary; the lag is usually small but grows under heavy write load.
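Splitting traffic is the application's job: RDS gives each replica its own endpoint but does not route queries for you. A minimal Python sketch of a read/write router, assuming hypothetical endpoint names and a naive keyword check on the SQL text:

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Choose an endpoint per statement: plain SELECTs round-robin over the
    replica endpoints, everything else goes to the primary."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        self._replicas = itertools.cycle(list(replicas)) if replicas else None

    def endpoint_for(self, sql: str, needs_fresh_read: bool = False) -> str:
        words = sql.split()
        is_plain_read = (
            bool(words)
            and words[0].upper() == "SELECT"
            and "FOR UPDATE" not in sql.upper()  # locking reads must hit the primary
        )
        # Reads that must see a just-committed write also go to the primary,
        # because replicas apply changes asynchronously and may lag.
        if is_plain_read and not needs_fresh_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

A real application usually routes per transaction or per repository method rather than by parsing SQL, but the rule is the same: writes, locking reads, and reads that must see a fresh write go to the primary.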
Multi-AZ — High Availability
- Purpose: survive failures
- Replication: synchronous (zero data loss)
- Queryable: NO — standby only
- Automatic failover: YES (<2 min)
- Same region only
- Cost: ~2× (two instances)
Read Replica — Read Scaling
- Purpose: handle more reads
- Replication: asynchronous (slight lag)
- Queryable: YES — read-only traffic
- Automatic failover: NO — manual promotion
- Same region, cross-region, cross-account
- Cost: additional instance per replica
Limits
- Up to 15 replicas (MySQL, MariaDB, PostgreSQL)
- Up to 5 replicas (Oracle, SQL Server)
- Replicas of replicas (chaining) supported
- Each replica has its own endpoint
Cross-Region
- Create replica in a different AWS region
- Replication over network (encrypted)
- Disaster recovery: promote to primary if home region fails
- Lower latency reads for global users
- Additional cross-region data transfer cost
Promotion
- Manually promote replica → standalone primary
- Replication stops on promotion
- Keeps its existing endpoint, which now accepts reads and writes
- Use for: DR failover, migration, scaling writes
- NOT automatic — requires manual action
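Creating and promoting a replica are two API calls. A sketch that builds the keyword arguments for the boto3 RDS operations `create_db_instance_read_replica` and `promote_read_replica`; identifiers are hypothetical, and cross-region replicas need extra arguments not shown here.

```python
from typing import Optional


def read_replica_request(source_id: str, replica_id: str,
                         instance_class: Optional[str] = None) -> dict:
    """Keyword arguments for rds.create_db_instance_read_replica.

    The source must have automated backups enabled (retention > 0).
    If instance_class is omitted, the replica inherits the source's class.
    """
    request = {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }
    if instance_class is not None:
        request["DBInstanceClass"] = instance_class
    return request


def promote_request(replica_id: str, backup_retention_days: int = 7) -> dict:
    """Keyword arguments for rds.promote_read_replica.

    Promotion stops replication permanently. A positive retention period turns
    on automated backups for the now-independent instance.
    """
    return {"DBInstanceIdentifier": replica_id, "BackupRetentionPeriod": backup_retention_days}


# Hypothetical usage:
#   rds = boto3.client("rds")
#   rds.create_db_instance_read_replica(**read_replica_request("prod-db", "prod-db-ro1"))
#   rds.promote_read_replica(**promote_request("prod-db-ro1"))
```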
Good Use Cases
- Read-heavy apps: news sites, e-commerce catalogues
- Analytics queries: run heavy reports without impacting primary
- Geographic distribution: replica in EU for EU users
- DR strategy: cross-region replica for regional failover
- Migration: promote replica to move DB to new region
Not Suitable For
- Reads that must immediately see the latest write (read-after-write consistency)
- Write scaling — all writes still go to one primary
- Automatic failover — promotion is manual
- Real-time synchronisation (tiny delay always exists)
Read Replicas = scale reads horizontally. Exam: “read-heavy workload” or “offload analytics” → Read Replica. “Automatic failover” → Multi-AZ (not Read Replica). Cross-region replica → disaster recovery + global low-latency reads.
- Read Replica = asynchronous copy for read-only traffic — offloads primary
- Up to 15 replicas per primary (MySQL, MariaDB, PG); each has its own endpoint
- Async replication = tiny lag — not suitable for apps needing instant consistency
- Cross-region replicas — for DR and lower global read latency
- Promotion = manual action — promotes replica to standalone read+write primary
- Exam trap: Read Replica ≠ automatic failover (that's Multi-AZ)
Backups & Snapshots
RDS provides two backup mechanisms that complement each other: automated backups (enabled by default, point-in-time recovery) and manual snapshots (user-initiated, retained forever until deleted).
Automated Backups
- Enabled by default on all RDS instances
- Daily full backup during backup window
- Continuous transaction log backups (every 5 min)
- Retention: 1–35 days (default 7 days)
- Stored in S3 (AWS-managed, not visible)
- Deleted with the DB instance, unless you choose to retain them at deletion
- Enables point-in-time recovery (PITR)
Manual Snapshots
- User-initiated (CLI / Console / API)
- Full backup of DB instance
- Retained indefinitely (until you delete)
- Stored in S3 (visible in console)
- Survive DB instance deletion
- Can copy across regions
- Can share with other AWS accounts
Point-in-Time Recovery lets you restore your database to any second within your retention period. RDS combines the daily snapshot with transaction logs to reconstruct the exact state of the database at your requested timestamp.
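A PITR request names both the source and a new target instance, which is why a restore always produces a new endpoint. A minimal sketch that validates the requested time against the restorable window before building arguments for `restore_db_instance_to_point_in_time`; the window bounds are inputs (`describe_db_instances` reports `LatestRestorableTime`), and the identifiers are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def pitr_request(source_id: str, target_id: str, restore_time: datetime,
                 earliest: datetime, latest: datetime) -> dict:
    """Keyword arguments for rds.restore_db_instance_to_point_in_time.

    The result is always a NEW instance named target_id with its own endpoint.
    """
    if restore_time.tzinfo is None:
        raise ValueError("restore_time must be timezone-aware (use UTC)")
    if not earliest <= restore_time <= latest:
        raise ValueError(f"{restore_time.isoformat()} is outside the restorable window")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
    }


# Hypothetical usage: undo a bad deploy from two hours ago.
#   latest = rds.describe_db_instances(DBInstanceIdentifier="prod-db")["DBInstances"][0]["LatestRestorableTime"]
#   req = pitr_request("prod-db", "prod-db-restore", latest - timedelta(hours=2),
#                      latest - timedelta(days=7), latest)
#   rds.restore_db_instance_to_point_in_time(**req)
```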
Backup Window
- 30-minute window daily for full backup
- Set during low-traffic hours
- Brief I/O suspension possible (single-AZ)
- Multi-AZ: backup from standby (zero I/O impact)
- Can change anytime
Retention Period
- Default: 7 days
- Range: 1–35 days
- Set to 0 = disable automated backups
- Increase for longer PITR window
- Automated backups deleted with the DB unless retained at deletion
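Both settings are changed with one `modify_db_instance` call. A sketch that checks the constraints listed above (retention 0 to 35 days, a window of at least 30 minutes in hh24:mi-hh24:mi UTC form) before building the arguments; it does not check for overlap with the maintenance window, which RDS also requires. The instance name is hypothetical.

```python
import re

# "hh24:mi-hh24:mi" in UTC, the format used for PreferredBackupWindow.
_WINDOW = re.compile(r"^([01]\d|2[0-3]):([0-5]\d)-([01]\d|2[0-3]):([0-5]\d)$")


def backup_settings_request(instance_id: str, retention_days: int, window: str) -> dict:
    """Keyword arguments for rds.modify_db_instance.

    retention_days: 0 disables automated backups (and with them PITR); 1-35 enables.
    window: daily backup window, at least 30 minutes long.
    """
    if not 0 <= retention_days <= 35:
        raise ValueError("retention must be between 0 and 35 days")
    match = _WINDOW.match(window)
    if match is None:
        raise ValueError(f"window must look like 03:00-03:30, got {window!r}")
    start_h, start_m, end_h, end_m = (int(g) for g in match.groups())
    # Window length in minutes, treating e.g. 23:45-00:15 as wrapping past midnight.
    length = ((end_h * 60 + end_m) - (start_h * 60 + start_m)) % (24 * 60)
    if length < 30:
        raise ValueError("backup window must be at least 30 minutes")
    return {
        "DBInstanceIdentifier": instance_id,
        "BackupRetentionPeriod": retention_days,
        "PreferredBackupWindow": window,
        "ApplyImmediately": True,
    }
```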
Cross-Region Backup
- Replicate automated backups to another region
- Protects against a regional disaster
- Extra cost (storage + transfer)
- Manual snapshot copy also supported
- Share snapshot with another AWS account
⚠️ Restore creates a NEW DB instance
- Restoring a snapshot or PITR always creates a new RDS instance with a new endpoint
- You must update your application's connection string to the new endpoint
- Original DB instance continues running (if still alive)
- Gives you a clean way to test restoration without impacting production
- Restored instance uses the default parameter group unless you specify one; reapply custom settings
Restore: Lazy Loading (S3)
- Restored DB loads data from S3 lazily (on first access per block)
- First queries against restored DB may be slower than usual
- Data is fully loaded in the background over time
- For production restores: pre-warm by reading hot tables in full (e.g., SELECT * scans) before sending traffic
- Exam: “first queries slow after snapshot restore” → lazy loading from S3
AWS Backup Service
- Central backup management across AWS services
- Covers RDS, DynamoDB, EFS, EBS, EC2, Aurora
- Set backup policies (frequency, retention, cross-region)
- Compliance reporting (PITR, audit logs)
- Useful for multi-service backup governance
Snapshot Pricing
- First snapshot = full DB size
- Subsequent snapshots = incremental (changed blocks)
- Storage: ~$0.095/GB-month
- Free: backup storage (automated backups + manual snapshots) up to 100% of provisioned DB storage in the Region
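The free allowance is easiest to see with numbers. A worked sketch, assuming the ~$0.095/GB-month price above and a single instance, so its provisioned storage is the whole Region's free allowance; the storage sizes are made up for illustration.

```python
PRICE_PER_GB_MONTH = 0.095  # the figure quoted above; varies by Region


def billable_backup_gb(provisioned_gb: float, automated_gb: float, snapshot_gb: float) -> float:
    """Backup storage beyond the free allowance (100% of provisioned storage)."""
    return max(0.0, automated_gb + snapshot_gb - provisioned_gb)


def monthly_backup_cost(provisioned_gb: float, automated_gb: float, snapshot_gb: float,
                        price: float = PRICE_PER_GB_MONTH) -> float:
    """Estimated monthly backup bill in USD, rounded to cents."""
    return round(billable_backup_gb(provisioned_gb, automated_gb, snapshot_gb) * price, 2)


# 100 GB database, 120 GB of automated backups, 60 GB of manual snapshots
# (sizes as actually stored; snapshots after the first are incremental).
print(monthly_backup_cost(100, 120, 60))
```

Here 180 GB of backup storage against a 100 GB allowance leaves 80 GB billable, about $7.60 per month.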
Event Notifications
- Subscribe to SNS topics for RDS events
- Events: failover, backup started/completed, low storage, maintenance, deletion
- Covers DB instances, parameter groups, snapshots, security groups
- Near real-time alerts — typically within minutes
- Chain to Lambda / SQS for automated response
Automation Examples
- CloudWatch alarm (e.g., high CPU) → SNS → Lambda: add a read replica under sustained load
- SNS → Slack/PagerDuty: alert on-call when failover occurs
- SNS → Lambda: take manual snapshot before maintenance window
- Exam: “alert when RDS failover happens” → RDS Event Notification + SNS
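A sketch of the subscription behind the failover alert, built as arguments for the boto3 `create_event_subscription` call; the subscription name, SNS topic ARN, and instance ID are hypothetical, and the topic must already exist.

```python
from typing import Iterable

# Event categories for SourceType "db-instance" that map to the alerts above.
ALERT_CATEGORIES = ["failover", "failure", "low storage", "maintenance"]


def failover_alert_subscription(name: str, topic_arn: str, instance_ids: Iterable[str]) -> dict:
    """Keyword arguments for rds.create_event_subscription."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": topic_arn,
        "SourceType": "db-instance",
        "SourceIds": list(instance_ids),  # omit SourceIds to cover every instance
        "EventCategories": list(ALERT_CATEGORIES),
        "Enabled": True,
    }


# Hypothetical usage:
#   boto3.client("rds").create_event_subscription(**failover_alert_subscription(
#       "prod-db-alerts", "arn:aws:sns:us-east-1:123456789012:db-alerts", ["prod-db"]))
```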
Automated backups = PITR within retention (max 35 days). Manual snapshots = forever until deleted, cross-region, cross-account. Restoring always creates a NEW instance. Multi-AZ backups from standby = zero performance impact on primary.
- Automated backups = daily full + continuous transaction logs — enables PITR (1–35 days)
- Manual snapshots = user-initiated, retained forever, cross-region/account shareable
- PITR = restore to any second within retention window — full backup + log replay
- Restore = new instance — new endpoint, update connection string; first queries slow (lazy S3 loading)
- Multi-AZ backup benefit: backup taken from standby — zero I/O impact on primary
- Event Notifications: SNS alerts for failover, backup, low storage, maintenance events
- Exam: “recover to specific time” → PITR; “cross-account snapshot” → manual snapshot copy
Security & Networking
RDS always runs inside a VPC. A DB subnet group specifies the subnets (across at least 2 AZs) where RDS can place instances. Best practice: use private subnets only — no public internet access to your database.
Correct Security Group Setup
- RDS Security Group inbound: only from App SG
- Rule: TCP port 3306 (MySQL) from EC2 security group ID
- Never open to 0.0.0.0/0 (public internet)
- Lambda in VPC → attach Lambda to same VPC, allow its SG
- On-prem: allow VPN/DX CIDR range
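The app-SG-only rule can be expressed with the EC2 API, since security groups belong to EC2. A sketch that builds `authorize_security_group_ingress` arguments referencing the app tier's security group instead of a CIDR range; the group IDs are hypothetical.

```python
def allow_app_sg_request(db_sg_id: str, app_sg_id: str, port: int = 3306) -> dict:
    """Keyword arguments for ec2.authorize_security_group_ingress.

    The source is a security group, not an IP range, so only instances
    (or VPC Lambdas) carrying app_sg_id can reach the database port.
    """
    return {
        "GroupId": db_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "UserIdGroupPairs": [{"GroupId": app_sg_id, "Description": "app tier"}],
            }
        ],
    }


# Hypothetical usage (5432 for PostgreSQL):
#   boto3.client("ec2").authorize_security_group_ingress(**allow_app_sg_request("sg-db111", "sg-app222"))
```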
Common Mistakes
- Enabling public accessibility (RDS reachable from internet)
- Opening port 3306 to 0.0.0.0/0
- Forgetting outbound rules on app SG
- Lambda outside VPC can't reach private RDS
- Not using SSL/TLS: traffic, and with some auth methods the credentials, crosses the network unencrypted
Encryption at Rest (KMS)
- Enable at creation time — cannot add later
- Uses AWS KMS (AES-256)
- Encrypts: data files, backups, snapshots, replicas, logs
- Read replicas inherit encryption from primary
- To encrypt unencrypted DB: snapshot → copy with encryption → restore
- Exam: “encrypt existing unencrypted RDS” → snapshot + copy method
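The snapshot, copy, restore path can be scripted. A sketch that lists the three boto3 RDS operations in order and runs them with a waiter between snapshot steps; identifiers and the KMS key are hypothetical. Writes that land on the old instance after the snapshot are not in the new one, so plan a write freeze for the cutover.

```python
def encryption_migration_steps(instance_id: str, kms_key_id: str) -> list[tuple[str, dict]]:
    """(operation, kwargs) pairs: snapshot -> encrypted copy -> restore to a new instance."""
    snapshot = f"{instance_id}-pre-enc"
    encrypted = f"{instance_id}-enc-snap"
    return [
        ("create_db_snapshot", {"DBSnapshotIdentifier": snapshot,
                                "DBInstanceIdentifier": instance_id}),
        # Supplying KmsKeyId on the copy is what produces an encrypted snapshot.
        ("copy_db_snapshot", {"SourceDBSnapshotIdentifier": snapshot,
                              "TargetDBSnapshotIdentifier": encrypted,
                              "KmsKeyId": kms_key_id}),
        # Restoring an encrypted snapshot yields an encrypted instance with a new endpoint.
        ("restore_db_instance_from_db_snapshot", {"DBInstanceIdentifier": f"{instance_id}-enc",
                                                  "DBSnapshotIdentifier": encrypted}),
    ]


# Which argument names the snapshot to wait for after each snapshot step.
_WAIT_FOR = {"create_db_snapshot": "DBSnapshotIdentifier",
             "copy_db_snapshot": "TargetDBSnapshotIdentifier"}


def run_steps(rds, steps: list[tuple[str, dict]]) -> None:
    """Execute the steps with a boto3 RDS client, waiting for each snapshot to be available."""
    for operation, kwargs in steps:
        getattr(rds, operation)(**kwargs)
        if operation in _WAIT_FOR:
            rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=kwargs[_WAIT_FOR[operation]])


# Hypothetical usage:
#   run_steps(boto3.client("rds"), encryption_migration_steps("prod-db", "alias/rds-prod"))
```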
Encryption in Transit (SSL/TLS)
- Download AWS RDS certificate bundle
- Enable SSL in the connection string, e.g. --ssl-ca=rds-ca.pem (MySQL client)
- Enforce SSL (MySQL): set parameter require_secure_transport=1
- PostgreSQL: ssl=true in the connection string; enforce with rds.force_ssl=1
- Oracle / SQL Server: native SSL
IAM DB Authentication
- Authenticate to RDS using IAM token (no password)
- Token generated via the generate-db-auth-token API
- Valid for 15 minutes
- Supported: MySQL, MariaDB, PostgreSQL
- Not supported: Oracle, SQL Server
- No credentials stored in app code — use IAM role
- Good for: EC2, Lambda, ECS accessing RDS
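IAM authentication and TLS go together: the token replaces the password, and the connection must be encrypted. A sketch for PostgreSQL, assuming a boto3 RDS client (`generate_db_auth_token` signs locally, with no network call) and a driver such as psycopg2 that accepts libpq keyword parameters; host, user, and database names are hypothetical, and the CA bundle path reuses this page's rds-ca.pem.

```python
def iam_connect_params(rds, host: str, port: int, user: str, dbname: str,
                       ca_bundle: str = "rds-ca.pem") -> dict:
    """Connection parameters for a PostgreSQL driver using an IAM auth token.

    The token expires after 15 minutes, so generate a fresh one for each new
    connection rather than storing it.
    """
    token = rds.generate_db_auth_token(DBHostname=host, Port=port, DBUsername=user)
    return {
        "host": host,
        "port": port,
        "user": user,
        "dbname": dbname,
        "password": token,         # the token stands in for the password
        "sslmode": "verify-full",  # IAM auth needs TLS; also verify the server certificate
        "sslrootcert": ca_bundle,  # path to the downloaded RDS CA bundle
    }


# Hypothetical usage:
#   params = iam_connect_params(boto3.client("rds"), "mydb.xxx.us-east-1.rds.amazonaws.com",
#                               5432, "app_user", "appdb")
#   conn = psycopg2.connect(**params)
```

In PostgreSQL the database user also needs the rds_iam role (GRANT rds_iam TO app_user;) before token logins work.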
Secrets Manager (Recommended)
- Store DB credentials in Secrets Manager
- Automatic rotation on a schedule you set (e.g., every 30 days)
- Native RDS integration — rotates without downtime
- App reads secret via SDK — never hardcodes password
- Exam: “rotate DB credentials automatically” → Secrets Manager
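Reading credentials at connect time keeps passwords out of code. A sketch assuming a boto3 Secrets Manager client and a JSON secret with `username` and `password` keys; the secret ID is hypothetical.

```python
import json


def db_credentials(secrets_client, secret_id: str) -> tuple[str, str]:
    """Return (username, password) from a Secrets Manager secret stored as JSON.

    Call this when (re)connecting rather than caching the result forever,
    so a rotated password is picked up.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]


# Hypothetical usage:
#   user, password = db_credentials(boto3.client("secretsmanager"), "prod/app/db")
```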
Always-on: encryption at rest (enable at creation), SSL in transit, private subnet, SG allowing only app. Exam: “encrypt existing RDS” → snapshot + encrypted copy + restore. “Rotate DB credentials” → Secrets Manager.
- VPC + private subnet = RDS never exposed to internet; DB subnet group spans 2+ AZs
- Security Groups = restrict inbound to app SG only (never 0.0.0.0/0)
- KMS encryption at rest = must enable at creation; covers data, logs, snapshots, replicas
- SSL/TLS in transit = download RDS CA cert, enable in connection string
- IAM auth = token-based, no passwords; Secrets Manager = auto-rotating credentials
- Exam trap: can't enable encryption on existing DB — must snapshot → copy encrypted → restore
Architecture Patterns
Lambda scales by running many execution environments in parallel, and each one opens its own DB connection (reusable only while that environment stays warm). At scale, this exhausts the database's connection limit. RDS Proxy sits between Lambda and RDS, pooling connections and reusing them efficiently.
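On the Lambda side, the usual pattern is one connection per execution environment, opened outside the handler logic and pointed at the RDS Proxy endpoint. A sketch with a hypothetical `connect` function and environment variable name (DB_PROXY_ENDPOINT); the proxy then multiplexes these many client connections onto a smaller pool of database connections.

```python
import os
from typing import Any, Callable, Optional

# Hypothetical: the RDS Proxy endpoint is passed to the function as an env var.
PROXY_ENV_VAR = "DB_PROXY_ENDPOINT"

# Module-level state survives between invocations while this execution
# environment stays warm, so each environment opens at most one connection.
_connection: Optional[Any] = None


def get_connection(connect: Callable[[str], Any]) -> Any:
    """Open a connection to the proxy endpoint once per execution environment."""
    global _connection
    if _connection is None:
        endpoint = os.environ.get(PROXY_ENV_VAR, "proxy.example.internal")
        _connection = connect(endpoint)
    return _connection


def make_handler(connect: Callable[[str], Any]):
    """Build a Lambda handler; connect is your DB driver's connect function."""
    def handler(event, context):
        conn = get_connection(connect)
        # ... run queries with conn ...
        return {"statusCode": 200}
    return handler
```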
RDS Blue/Green Deployments create a synchronized staging environment (green) that mirrors production (blue). You test schema changes safely, then switch production traffic to green, typically in under a minute and with minimal application downtime.
Blue (Production)
- Current production DB
- Live traffic serving users
- Changes are tested in green first, not here
- Becomes old environment after switchover
Green (Staging)
- Synchronized copy of blue
- Apply schema changes & patches safely
- Test application against new schema
- Kept in sync via replication (binlog for MySQL/MariaDB, logical replication for PostgreSQL)
Switchover
- Single-click switchover (typically under a minute)
- DNS flipped: prod now points to green
- Old blue retained for rollback
- No data loss; writes pause briefly during switchover
🎯 Blue/Green Use Cases
- Major version upgrades (e.g., MySQL 5.7 → 8.0) with minimal downtime
- Schema changes: adding columns, changing indexes
- Testing DB engine parameter changes safely before applying to production
- Exam: “zero-downtime major version upgrade” → RDS Blue/Green Deployments
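A sketch of the two calls behind a Blue/Green upgrade, built as arguments for the boto3 RDS operations `create_blue_green_deployment` and `switchover_blue_green_deployment`. The deployment name, source ARN, engine version, and parameter group are hypothetical; `SwitchoverTimeout` caps how long the switchover may take before RDS rolls it back and leaves blue in place.

```python
from typing import Optional


def blue_green_request(name: str, source_arn: str, target_engine_version: str,
                       target_param_group: Optional[str] = None) -> dict:
    """Keyword arguments for rds.create_blue_green_deployment.

    A major upgrade usually needs a parameter group for the new engine
    family, so pass one for green when upgrading.
    """
    request = {
        "BlueGreenDeploymentName": name,
        "Source": source_arn,
        "TargetEngineVersion": target_engine_version,
    }
    if target_param_group is not None:
        request["TargetDBParameterGroupName"] = target_param_group
    return request


def switchover_request(deployment_id: str, timeout_seconds: int = 300) -> dict:
    """Keyword arguments for rds.switchover_blue_green_deployment."""
    return {"BlueGreenDeploymentIdentifier": deployment_id, "SwitchoverTimeout": timeout_seconds}


# Hypothetical usage:
#   rds = boto3.client("rds")
#   bg = rds.create_blue_green_deployment(**blue_green_request(
#       "prod-upgrade", "arn:aws:rds:us-east-1:123456789012:db:prod-db", "8.0.35", "app-mysql80"))
#   ... test the green environment ...
#   rds.switchover_blue_green_deployment(**switchover_request(
#       bg["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"]))
```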
| You Need... | Use | Why |
|---|---|---|
| Managed relational DB (MySQL / PG) | RDS | Patching, backups, Multi-AZ managed |
| Maximum relational performance | Aurora | Up to 5× MySQL / 3× PG throughput (AWS figures), storage auto-scales |
| Key-value / document store | DynamoDB | Serverless, single-digit ms, unlimited scale |
| In-memory caching (reduce DB load) | ElastiCache | Redis / Memcached, microsecond latency |
| Lambda + RDS (connection pooling) | RDS Proxy | Prevents connection exhaustion, IAM auth |
| Real-time analytics without ETL pipelines | Zero-ETL → Redshift | Near real-time RDS → Redshift, no pipelines needed |
| OS-level DB access (Oracle / SQL Server) | RDS Custom | Managed + SSH/filesystem access for legacy migrations |
| Full DB control on EC2 | EC2 + DB | Custom configs RDS doesn't support (rare) |
🎯 Exam Keywords → RDS Answer
- “automatic failover DB” → Multi-AZ (NOT read replica)
- “read-heavy, offload reads” → Read Replica
- “recover to specific time” → PITR (automated backups)
- “encrypt existing unencrypted RDS” → snapshot → copy with encryption → restore
- “Lambda + RDS connection exhaustion” → RDS Proxy
- “faster failover with Lambda/RDS” → RDS Proxy (AWS cites up to 66% faster failover)
- “rotate DB credentials automatically” → Secrets Manager
- “cross-region disaster recovery DB” → Cross-region read replica
- “DB not publicly accessible” → private subnet + security group
- “Multi-AZ standby queryable?” → NO (classic Multi-AZ) / YES (Multi-AZ Cluster)
- “zero-downtime major version upgrade” → Blue/Green Deployments
- “real-time RDS analytics, no ETL” → Zero-ETL integration → Redshift
- “OS-level access Oracle/SQL Server managed” → RDS Custom
- “identify slow queries / DB bottleneck” → Performance Insights
- “alert when DB failover / backup happens” → RDS Event Notifications + SNS
- “first queries slow after restore” → lazy S3 loading; pre-warm with full-table reads
RDS is your production relational database foundation: Multi-AZ for HA, Read Replicas for scale, PITR for safety, Secrets Manager for credentials, RDS Proxy for serverless (up to 66% faster failover). Use Blue/Green Deployments for minimal-downtime upgrades, Performance Insights to diagnose slow queries, Zero-ETL for real-time analytics to Redshift, and RDS Custom for Oracle/SQL Server when you need OS-level access.