LearningTree · AWS · Database

Other AWS Database Services: The Right Database for Every Problem

AWS has a database for every data model. Beyond RDS, Aurora, DynamoDB, and ElastiCache, there is a purpose-built service for graphs, analytics, time-series data, immutable ledgers, JSON documents, and wide-column stores. Learning when to choose each is the real skill.

🗺️ The AWS Database Universe

Type | Service | Best For
Relational | RDS / Aurora | SQL, ACID, OLTP
NoSQL / Key-Value | DynamoDB | Serverless, ms latency, any scale
In-Memory Cache | ElastiCache | Sub-ms reads, reduce DB load
Graph | Neptune | Relationships, social, fraud
Analytics / OLAP | Redshift | Data warehouse, petabyte analytics
Time-Series | Timestream | IoT, metrics, time-based data
Ledger | QLDB | Immutable audit trail, financial records
Document | DocumentDB | JSON, MongoDB-compatible
Wide-Column | Keyspaces | Cassandra-compatible, wide-column
In-Memory (Durable) | MemoryDB for Redis | Redis speed + full durability
Relational + OS Access | RDS Custom | Oracle/SQL Server with SSH access
Relational (On-Prem) | RDS on VMware | Hybrid cloud, data residency

01
Graph Database

Amazon Neptune — Graph Database

What is Neptune & Why Graphs (Introductory)

Traditional databases store data in tables or documents. But some problems are fundamentally about relationships: who is connected to whom, how are entities related, what path connects A to B? Amazon Neptune is a fully managed graph database that makes relationship queries fast and natural.

👉 Mental model: Think of Neptune as a database made of nodes (things) and edges (relationships). Instead of a table of users and a join to friends, you simply traverse connections. “Find all friends of friends who live in NYC” is a single graph traversal — not a multi-join SQL query.
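As a rough illustration of the traversal in the mental model above, here is a plain-Python sketch (not Neptune's API) that answers “friends of friends who live in NYC” by walking an adjacency list. The people, edges and the `friends_of_friends_in` helper are hypothetical; in Neptune the same question would be written in Gremlin, openCypher or SPARQL.

```python
from collections import defaultdict

# Hypothetical toy graph: vertex properties plus labelled edges (src, label, dst).
people = {
    "alice": {"city": "Boston"},
    "bob": {"city": "NYC"},
    "carol": {"city": "NYC"},
    "dave": {"city": "Chicago"},
    "erin": {"city": "NYC"},
}
edges = [
    ("alice", "FRIEND_OF", "bob"),
    ("alice", "FRIEND_OF", "dave"),
    ("bob", "FRIEND_OF", "alice"),
    ("bob", "FRIEND_OF", "carol"),
    ("dave", "FRIEND_OF", "erin"),
]

# Adjacency list: vertex -> edge label -> neighbours. A traversal follows these
# entries hop by hop instead of joining tables.
adjacency = defaultdict(lambda: defaultdict(set))
for src, label, dst in edges:
    adjacency[src][label].add(dst)


def friends_of_friends_in(start, city):
    """Two FRIEND_OF hops from `start`, filtered on the `city` vertex property."""
    direct = adjacency[start]["FRIEND_OF"]
    result = set()
    for friend in direct:
        for candidate in adjacency[friend]["FRIEND_OF"]:
            # Skip the starting person and anyone who is already a direct friend.
            if candidate != start and candidate not in direct and people[candidate]["city"] == city:
                result.add(candidate)
    return result


print(sorted(friends_of_friends_in("alice", "NYC")))  # ['carol', 'erin']
```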

🧠

Core Concept

  • Nodes: Alice, Bob, Product, Post
  • Edges: FRIEND_OF, BOUGHT, LIKES
  • Properties: attributes on nodes/edges
  • Queries traverse relationships, not rows
  • Optimized for highly connected data

Use Cases

  • Social networks (friends, followers)
  • Fraud detection (unusual connections)
  • Recommendation engines
  • Knowledge graphs
  • Network topology, IT infrastructure
📌

Query Languages

  • Gremlin: property graph traversal
  • SPARQL: RDF / semantic web
  • openCypher: graph pattern matching
  • Neptune supports all three
  • Pick based on data model
Graph model — nodes and edges represent entities and relationships
Example graph: Alice, Bob and Carol (:Person) and AWS Book (:Product), linked by FRIEND_OF and BOUGHT edges. Graph query “Find Bob's purchases”: Alice → FRIEND_OF → Bob → BOUGHT → ?, with no JOINs needed.

Choose Neptune When

  • Data is fundamentally about relationships
  • Multi-hop traversals (friend of friend)
  • Fraud detection (unusual connection patterns)
  • Recommendation: “users who bought X also bought Y”
  • Knowledge graphs, ontologies
📌

Don't Use Neptune When

  • Data is tabular — use RDS/Aurora
  • Simple key-value lookups — DynamoDB
  • Relationships are shallow (SQL JOIN is fine)
  • Analytics/reporting — Redshift
Neptune vs RDS & Global Database (Core)

Neptune vs RDS for Relationships

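  • SQL can do friend-of-friend, but every extra hop adds another self-JOIN on the relationships table
  • Query cost and complexity climb quickly with depth, so 3+ hop queries get slow and hard to write
  • Neptune expands only the neighbours of the vertices reached so far, so each hop is a traversal step rather than a JOIN over whole tables
  • For deep relationship queries, graphs are typically much faster and far simpler to express
  • Use RDS when relationships are shallow (1–2 JOINs)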
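A toy comparison of the two query shapes above, in plain Python rather than SQL or Gremlin. The relational version rescans the whole hypothetical `friend_of` edge table once per hop, like an unindexed self-JOIN, while the traversal version only expands the neighbours reached so far. Real databases use indexes, so this shows how each hop adds another pass in the join-based shape; it is not a benchmark of either engine.

```python
from collections import defaultdict

# Hypothetical FRIEND_OF edge table: (src_id, dst_id) rows, as a relational table would hold them.
friend_of = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (6, 7)]


def k_hop_via_joins(start, hops):
    """Relational shape: every hop is one more self-JOIN against the full edge table."""
    frontier = {start}
    for _ in range(hops):
        # Roughly: SELECT e.dst FROM frontier f JOIN friend_of e ON e.src = f.id
        frontier = {dst for (src, dst) in friend_of if src in frontier}
    return frontier


# Graph shape: build adjacency once, then each hop only touches current neighbours.
neighbours = defaultdict(set)
for src, dst in friend_of:
    neighbours[src].add(dst)


def k_hop_via_traversal(start, hops):
    frontier = {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in neighbours[node]}
    return frontier


for hops in (1, 2, 3):
    print(hops, sorted(k_hop_via_traversal(1, hops)))
# 1 [2]
# 2 [3, 6]
# 3 [4, 7]
```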
🌐

Neptune Global Database

  • Replicate graph data across multiple regions
  • Replication lag <1 second
  • Secondary regions serve low-latency local reads
  • Single writer region (primary)
  • Use for global social / recommendation apps
🧠 Key Insight

Neptune is for relationship-first data. When your most important queries are about how things are connected, Neptune is the right choice. Exam: “social network”, “fraud detection”, “recommendation engine” → Neptune.

02
Data Warehouse / Analytics

Amazon Redshift — Data Warehouse

OLTP vs OLAP: The Critical Distinction (Introductory)

OLTP (Online Transaction Processing) = what RDS does: fast, small, frequent transactions. OLAP (Online Analytical Processing) = what Redshift does: complex analytical queries across massive datasets. As a rule, don't run OLTP workloads on Redshift or heavy analytics on RDS.

Aspect | OLTP (RDS/Aurora) | OLAP (Redshift)
Queries | Simple, fast (ms) | Complex, slower (seconds to minutes)
Data volume | GB–TB | TB–PB
Operations | INSERT/UPDATE/DELETE | SELECT, GROUP BY, SUM, AVG
Storage | Row-based | Columnar (faster aggregations)
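A small Python sketch of why the storage row above matters. It keeps the same hypothetical sales records once as rows and once as columns; summing `amount` or grouping by `region` only needs those columns in the columnar layout. Python lists say nothing about real disk layout or compression, so treat this as a mental model of what Redshift's columnar blocks avoid reading, not a performance test.

```python
# Hypothetical sales records, stored once row-wise and once column-wise.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0, "notes": "gift wrap"},
    {"order_id": 2, "region": "US", "amount": 80.0, "notes": ""},
    {"order_id": 3, "region": "EU", "amount": 50.0, "notes": "express"},
]
# Column store: one list per column, built from the same records.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def total_amount_row_store(records):
    # Row layout: every full record is visited even though only "amount" is needed.
    return sum(record["amount"] for record in records)


def total_amount_column_store(cols):
    # Columnar layout: only the "amount" column is read.
    return sum(cols["amount"])


def revenue_by_region(cols):
    # GROUP BY region, SUM(amount) touches just two of the four columns.
    totals = {}
    for region, amount in zip(cols["region"], cols["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


print(total_amount_row_store(rows), total_amount_column_store(columns))  # 250.0 250.0
print(revenue_by_region(columns))  # {'EU': 170.0, 'US': 80.0}
```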
📊

What is Redshift

  • Fully managed data warehouse
  • Columnar storage for fast aggregations
  • Massively parallel processing (MPP)
  • Petabyte-scale analytics
  • SQL interface (PostgreSQL-based)

Use Cases

  • Business intelligence dashboards
  • Sales / revenue analytics
  • Log analysis at scale
  • ETL pipeline destination
  • RDS → Redshift (Zero-ETL)

Key Features

  • Redshift Serverless: no cluster to manage
  • Spectrum: query S3 data directly
  • Zero-ETL: near real-time from RDS/Aurora
  • ML: train models with SQL
  • Up to 16 PB storage
Redshift in a typical analytics pipeline
RDS / Aurora (OLTP source) → Zero-ETL → Redshift (data warehouse; columnar, TB–PB) → BI tools (QuickSight, Tableau, Looker)
Distribution Styles & Sort Keys (Core)
📦

Distribution Styles

  • AUTO: Redshift chooses automatically (default)
  • EVEN: Round-robin across nodes — large tables without clear join key
  • KEY: Same key value → same node — optimizes joins on that column
  • ALL: Full copy on every node — small dimension tables
  • 🎯 Exam: “optimize join performance” → KEY distribution on join column
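The distribution styles above can be sketched in plain Python. Redshift actually places rows on slices within nodes; this sketch assumes a hypothetical 4-slice cluster and uses `zlib.crc32` as a stand-in hash (not Redshift's internal hash). With KEY distribution on `customer_id` for both tables, every order lands on the same slice as its customer, so the join can run locally without redistributing rows.

```python
from itertools import cycle
from zlib import crc32

NUM_SLICES = 4  # hypothetical cluster: rows are spread over 4 slices


def distribute_even(rows):
    """EVEN: round-robin, ignoring row contents."""
    slices = [[] for _ in range(NUM_SLICES)]
    for target, row in zip(cycle(range(NUM_SLICES)), rows):
        slices[target].append(row)
    return slices


def distribute_key(rows, key):
    """KEY: hash the distribution column, so equal values always share a slice."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row in rows:
        slices[crc32(str(row[key]).encode()) % NUM_SLICES].append(row)
    return slices


def distribute_all(rows):
    """ALL: a full copy of the (small) table on every slice."""
    return [list(rows) for _ in range(NUM_SLICES)]


orders = [{"customer_id": c, "amount": a} for c, a in [(1, 10), (2, 20), (1, 5), (3, 7), (2, 1)]]
customers = [{"customer_id": c, "name": n} for c, n in [(1, "Ann"), (2, "Raj"), (3, "Li")]]
orders_by_slice = distribute_key(orders, "customer_id")
customers_by_slice = distribute_key(customers, "customer_id")


def join_is_collocated():
    """True if every order finds its customer on its own slice (no data movement)."""
    for i in range(NUM_SLICES):
        local_ids = {c["customer_id"] for c in customers_by_slice[i]}
        if any(o["customer_id"] not in local_ids for o in orders_by_slice[i]):
            return False
    return True


print(join_is_collocated())  # True
```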
📊

Sort Keys & Concurrency Scaling

  • Sort Key: rows stored on disk in sort-key order, so Redshift can skip blocks whose min/max values fall outside a filter
  • Put frequently-filtered columns first (e.g., date, region)
  • Compound: columns in defined order — most common
  • Interleaved: equal weight per column — rare, high-maintenance
  • Concurrency Scaling: auto-adds transient clusters during query spikes; pay per second used
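A minimal sketch of why sort keys help: Redshift records min/max values for each storage block (zone maps) and skips blocks that cannot match a range filter. The block size, dates and helper names below are hypothetical and tiny; real blocks hold far more rows.

```python
BLOCK_SIZE = 3  # hypothetical; real blocks hold many more rows


def build_blocks(values):
    """Split values into blocks and keep a (min, max, rows) zone-map entry per block."""
    blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    return [(min(b), max(b), b) for b in blocks]


def scan_range(blocks, low, high):
    """Return (blocks actually read, matching values) for low <= value <= high."""
    scanned, hits = 0, []
    for block_min, block_max, block in blocks:
        if block_max < low or block_min > high:
            continue  # zone map proves no row in this block can match
        scanned += 1
        hits.extend(v for v in block if low <= v <= high)
    return scanned, hits


# ISO date strings sort chronologically, so they can stand in for a date sort key.
dates = ["2024-01-05", "2024-03-09", "2024-01-20", "2024-02-14", "2024-03-01",
         "2024-01-11", "2024-02-02", "2024-03-22", "2024-02-27"]

unsorted_blocks = build_blocks(dates)
sorted_blocks = build_blocks(sorted(dates))
print(scan_range(unsorted_blocks, "2024-02-01", "2024-02-28")[0])  # 3 blocks read
print(scan_range(sorted_blocks, "2024-02-01", "2024-02-28")[0])    # 1 block read
```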
🧠 Key Insight

Redshift is for analytics, not transactions. Exam: “data warehouse”, “petabyte analytics”, “business intelligence”, “OLAP” → Redshift. RDS → Zero-ETL → Redshift for near-real-time analytics without building ETL pipelines.

03
Time-Series Database

Amazon Timestream — Time-Series Database

What is Time-Series Data (Introductory)

Time-series data is a sequence of measurements, each tagged with a timestamp and often recorded at regular intervals: IoT sensor readings, CPU metrics, stock prices. Key pattern: data is almost always appended rather than updated, queries are almost always bounded by a time range, and recent data is accessed far more than old data.

⏱️

What is Timestream

  • Fully managed time-series database
  • Serverless — scales automatically
  • Purpose-built for timestamped data
  • Marketed by AWS as far faster and cheaper than relational databases for time-series data
  • Built-in time-series SQL functions

Use Cases

  • IoT sensor data (temp, pressure)
  • DevOps metrics (CPU, memory, latency)
  • Application performance monitoring
  • Financial data (tick data, prices)
  • Industrial equipment monitoring
💡

Key Features

  • Tiered storage: memory store (recent) → magnetic store (historical), moved automatically
  • SQL-like queries with time functions
  • Retention policy: auto-expire old data
  • Integrates with Grafana, QuickSight
  • IoT Core / Kinesis integration
Timestream storage: recent data in the memory store, older data automatically moved to the lower-cost magnetic store
time → MEMORY STORE (hot): recent data, fastest queries → auto-tier → MAGNETIC STORE (cold): historical data, lower cost, kept until its retention period expires
Memory Store, Magnetic Store & Scheduled Queries (Core)
🧲

Memory vs Magnetic Store

  • Memory store: recent data, high-throughput writes, low-latency reads
  • Magnetic store: older data, lower cost, slower queries
  • Configure retention policy to move data automatically
  • Memory retention: hours to days (configurable)
  • Magnetic retention: days to years (configurable)
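The two retention settings above decide where a record lives as it ages. This Python sketch models that decision with hypothetical values (12 hours in memory, 365 days in magnetic). It assumes, for illustration only, that the magnetic period starts once the memory period ends; check the Timestream documentation for the exact semantics before relying on it.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention settings for one table (values chosen for illustration).
MEMORY_RETENTION = timedelta(hours=12)
MAGNETIC_RETENTION = timedelta(days=365)


def storage_tier(record_time, now):
    """Return where a record of this age would live under the assumed policy.

    Assumption for this sketch: a record stays in the memory store until it is
    older than MEMORY_RETENTION, then sits in the magnetic store for a further
    MAGNETIC_RETENTION before it expires.
    """
    age = now - record_time
    if age <= MEMORY_RETENTION:
        return "memory"
    if age <= MEMORY_RETENTION + MAGNETIC_RETENTION:
        return "magnetic"
    return "expired"


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
for age in (timedelta(minutes=5), timedelta(days=30), timedelta(days=400)):
    print(age, "->", storage_tier(now - age, now))
```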

Scheduled Queries

  • Run aggregations (hourly, daily, weekly) automatically
  • Write results to a new derived table
  • Use for downsampling high-frequency IoT data
  • Example: 1-sec sensor reads → hourly averages
  • Reduces query cost and improves dashboard speed
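The downsampling step above (1-second readings to hourly averages) boils down to bucketing timestamps by hour and averaging each bucket. This Python sketch does that in memory with hypothetical readings; a real Timestream scheduled query would express it in SQL, roughly `bin(time, 1h)` with `AVG(...)`, and write the result to the derived table.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def hourly_averages(readings):
    """Downsample (timestamp, value) pairs to {hour_start: mean value}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour bucket
        sums[hour] += value
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


start = datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc)
# Two hours of hypothetical 1-second temperature readings: 20.0 in the first hour, 22.0 in the second.
readings = [(start + timedelta(seconds=s), 20.0 if s < 3600 else 22.0) for s in range(7200)]
summary = hourly_averages(readings)

print(len(readings), "->", len(summary))  # 7200 -> 2
for hour, avg in summary.items():
    print(hour.isoformat(), avg)
```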
🧠 Key Insight

Timestream is for append-heavy, time-ranged data. Recent = hot, old = cold. The auto-tiering matches the natural access pattern perfectly. Exam: “IoT sensor data”, “metrics”, “time-series” → Timestream.

04
Ledger Database

Amazon QLDB — Ledger Database

What is an Immutable Ledger (Introductory)

Amazon QLDB (Quantum Ledger Database) is a fully managed ledger database that provides a transparent, immutable, cryptographically verifiable transaction log. Every change is permanently recorded — nothing can be deleted or altered, and you can prove it mathematically.

👉 Mental model: QLDB is a traditional database where every row has a full history and that history is cryptographically proven. Show a regulator not just what the data is today, but every state it's ever been in — and prove it hasn't been tampered with.

📜

Core Features

  • Immutable journal: nothing can be deleted
  • Cryptographic hashes: SHA-256 chain
  • Full history: every version of every record
  • PartiQL (SQL-like) query language
  • Serverless, fully managed

Use Cases

  • Financial ledgers (debit/credit history)
  • Supply chain tracking
  • Regulatory audit trails
  • Medical records history
  • Insurance claims processing
💡

QLDB vs Blockchain

  • QLDB: centralized (AWS-managed)
  • Blockchain: decentralized (no single owner)
  • QLDB: faster, simpler, single-owner trust
  • Use QLDB when: you own and trust the data
  • Exam: “immutable, audit trail” → QLDB
QLDB journal — every change appended, cryptographically chained, cannot be altered
Block 1: Balance $1000 · Hash a1b2c3 · Prev GENESIS
Block 2: Transfer -$200 · Hash d4e5f6 · Prev a1b2c3
Block 3: Balance $800 · Hash g7h8i9 · Prev d4e5f6
🔒 Immutable: alter block 2 and its hash changes, so the hash chain breaks. ✔ Cryptographically proven history
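The hash-chain idea in the journal illustration can be reproduced with Python's `hashlib`. This is a simplified model, not QLDB's journal format or verification API (QLDB verifies documents against journal digests using cryptographic proofs); the block fields and helper names are hypothetical.

```python
import hashlib
import json


def block_hash(payload, prev_hash):
    """SHA-256 over the payload AND the previous hash; this is what links the chain."""
    data = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_chain(payloads):
    chain, prev = [], "GENESIS"
    for payload in payloads:
        digest = block_hash(payload, prev)
        chain.append({"payload": payload, "prev": prev, "hash": digest})
        prev = digest
    return chain


def verify(chain):
    """Recompute every hash; any edit to an earlier block breaks a later link."""
    prev = "GENESIS"
    for block in chain:
        if block["prev"] != prev or block_hash(block["payload"], prev) != block["hash"]:
            return False
        prev = block["hash"]
    return True


ledger = build_chain([{"balance": 1000}, {"transfer": -200}, {"balance": 800}])
print(verify(ledger))  # True

ledger[1]["payload"]["transfer"] = -20  # tamper with block 2
print(verify(ledger))  # False
```

Even an attacker who recomputes block 2's hash is caught, because block 3 still points at the old hash.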
QLDB vs Managed Blockchain: Exam Trap (Core)
Feature | QLDB | Managed Blockchain
Trust model | Centralized (AWS-managed) | Decentralized (multiple parties)
Immutability | Cryptographic hash verification | Consensus across nodes
Ownership | Single owner controls the ledger | No single owner (trustless)
Performance | Faster, simpler | Slower (consensus overhead)
🎯 Exam trigger | “audit trail, ledger, immutable history” | “blockchain, multiple parties, trustless”
🧠 Key Insight

QLDB is for when you need proof that data hasn't been tampered with. Exam: “immutable audit trail”, “financial ledger”, “verify data history” → QLDB. “Decentralized blockchain” → Amazon Managed Blockchain (not QLDB).

05
Document Database

Amazon DocumentDB — Document Database

What is DocumentDB (Introductory)

Amazon DocumentDB is a fully managed document database that is MongoDB-compatible. It stores data as JSON-like documents with flexible schemas where each document can have different fields. It uses Aurora's shared distributed storage model under the hood.

📄

Document Model

  • Data stored as JSON documents
  • Each document can have different schema
  • Rich query language (filter, project, aggregate)
  • Collections = tables; Documents = rows
  • Documents nested up to 100 levels

Use Cases

  • Content management systems
  • User profiles (varied attributes)
  • Product catalogs (different specs)
  • Mobile app backends
  • MongoDB → AWS migration

Why DocumentDB

  • MongoDB-compatible: minimal code changes
  • Storage auto-grows to 64 TiB
  • 6 copies across 3 AZs (Aurora-style)
  • Fully managed, no Mongo ops
  • Up to 15 read replicas
Aspect | DocumentDB | DynamoDB
Model | JSON documents, rich queries | Key-value, access-pattern queries
Query flexibility | Higher (MongoDB query language) | Mostly key-based (plus secondary indexes)
Scale | Very large | Virtually unlimited (serverless)
Compatibility | MongoDB drivers/apps | DynamoDB API/SDKs
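The query-flexibility row is the practical difference: a document store can filter on any nested field, even when documents in a collection have different shapes. Below is a plain-Python toy (hypothetical catalog and helpers, not the DocumentDB or pymongo API) that filters on `specs.ram_gb`, roughly what the MongoDB-style filter `{"specs.ram_gb": {"$gte": 8}}` expresses.

```python
# Hypothetical product catalog: documents in one collection need not share fields.
products = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16, "cpu": "8-core"}, "tags": ["work"]},
    {"_id": 2, "name": "T-shirt", "size": "M", "colour": "blue"},
    {"_id": 3, "name": "Tablet", "specs": {"ram_gb": 8}, "tags": ["travel", "work"]},
]


def get_path(doc, dotted_path):
    """Resolve a dotted path such as 'specs.ram_gb'; return None if any part is missing."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, dotted_path, predicate):
    """Toy filter on a nested, non-key field; documents lacking the field are skipped."""
    matches = []
    for doc in collection:
        value = get_path(doc, dotted_path)
        if value is not None and predicate(value):
            matches.append(doc)
    return matches


print([doc["name"] for doc in find(products, "specs.ram_gb", lambda ram: ram >= 8)])
# ['Laptop', 'Tablet']
```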

👉 Important: DocumentDB is wire-protocol compatible with MongoDB, so most existing MongoDB drivers and applications work with little or no code change (not every MongoDB feature is supported). However, it is not a fork of MongoDB open-source code. Under the hood it uses Aurora's distributed storage engine, meaning you get Aurora-class durability (6 copies across 3 AZs) with MongoDB API compatibility.

🧠 Key Insight

DocumentDB is for JSON-document workloads needing MongoDB compatibility or richer queries than DynamoDB. Exam: “MongoDB-compatible”, “JSON documents”, “flexible schema” → DocumentDB. For unlimited serverless scale, DynamoDB is better.

06
Wide-Column Database

Amazon Keyspaces — Wide-Column Database

What is Keyspaces (Introductory)

Amazon Keyspaces is a fully managed, serverless wide-column database compatible with Apache Cassandra. Run Cassandra workloads on AWS without managing Cassandra clusters — automatic scaling, no capacity planning, pay per use.

🗂️

Wide-Column Model

  • Rows identified by partition key
  • Each row can have different columns
  • Optimized for write-heavy workloads
  • CQL (Cassandra Query Language)
  • Designed for high throughput at scale

Use Cases

  • Industrial equipment data
  • High-velocity write workloads
  • Cassandra → AWS migration
  • Time-series at massive scale
  • Event logging, clickstreams

Why Keyspaces

  • Cassandra-compatible: drop-in replacement
  • Serverless, no Cassandra ops
  • Auto-scales read/write capacity
  • Single-digit ms latency
  • Multi-Region replication
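The wide-column model above can be pictured as a map of partitions, each holding rows ordered by a clustering column. This plain-Python sketch (a hypothetical `readings` table and helpers, not the Keyspaces or Cassandra driver API) shows upsert-style writes and a range read inside one partition. In real CQL the columns are declared in the table schema and rows simply leave some of them unset.

```python
from collections import defaultdict

# Hypothetical "readings" table: partition key = sensor_id,
# clustering column = reading_time (rows inside a partition are kept in this order).
table = defaultdict(dict)


def upsert(sensor_id, reading_time, **columns):
    """CQL-style write: (partition key, clustering key) identifies the row; set columns merge in."""
    table[sensor_id].setdefault(reading_time, {}).update(columns)


def select_range(sensor_id, start, end):
    """Roughly: SELECT * FROM readings WHERE sensor_id = ? AND reading_time >= ? AND reading_time < ?"""
    partition = table.get(sensor_id, {})
    return [(t, partition[t]) for t in sorted(partition) if start <= t < end]


upsert("pump-7", "2024-06-01T10:00", temp=71.5)
upsert("pump-7", "2024-06-01T10:01", temp=72.0, vibration=0.3)  # only this row sets vibration
upsert("pump-9", "2024-06-01T10:00", temp=65.2)

print(select_range("pump-7", "2024-06-01T10:00", "2024-06-01T11:00"))
# [('2024-06-01T10:00', {'temp': 71.5}), ('2024-06-01T10:01', {'temp': 72.0, 'vibration': 0.3})]
```

The read never leaves the `pump-7` partition, which is why choosing the partition key around your main query is the key design decision.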
Keyspaces Capacity Modes (Core)
💳

On-Demand (Pay per Request)

  • Pay only for reads/writes you use
  • No capacity planning required
  • Best for unpredictable or spiky workloads
  • Adapts automatically as traffic rises and falls
📏

Provisioned (RCU / WCU)

  • Set read/write capacity units upfront
  • Lower cost for predictable, steady workloads
  • Auto-scaling adjusts capacity automatically
  • Similar model to DynamoDB provisioned mode
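For provisioned mode, capacity is sized from request rate and row size. The sketch below assumes the unit sizes described in the Amazon Keyspaces documentation as we understand them (one RCU covers one LOCAL_QUORUM read per second, or two LOCAL_ONE reads, of a row up to 4 KB; one WCU covers one write per second of a row up to 1 KB); confirm them against the current docs before sizing a real table. The function names are hypothetical.

```python
import math

# Assumed unit sizes (DynamoDB-style); verify against current Amazon Keyspaces docs.
READ_UNIT_KB = 4   # one RCU = one LOCAL_QUORUM read/sec of a row up to 4 KB
WRITE_UNIT_KB = 1  # one WCU = one write/sec of a row up to 1 KB


def required_rcus(reads_per_sec, row_size_kb, consistency="LOCAL_QUORUM"):
    """Larger rows consume extra units in 4 KB steps; LOCAL_ONE is assumed to cost half."""
    units_per_read = math.ceil(row_size_kb / READ_UNIT_KB)
    if consistency == "LOCAL_ONE":
        return math.ceil(reads_per_sec * units_per_read / 2)
    return reads_per_sec * units_per_read


def required_wcus(writes_per_sec, row_size_kb):
    """Each write consumes one unit per started 1 KB of row size."""
    return writes_per_sec * math.ceil(row_size_kb / WRITE_UNIT_KB)


print(required_rcus(500, 6))               # 1000
print(required_rcus(500, 6, "LOCAL_ONE"))  # 500
print(required_wcus(200, 2.5))             # 600
```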
🧠 Key Insight

Keyspaces is for Cassandra workloads on AWS without managing Cassandra clusters. Exam: “Cassandra-compatible”, “wide-column”, “migrate Cassandra” → Keyspaces. For general-purpose NoSQL at massive scale, DynamoDB is usually the better choice.

07
In-Memory Database (Durable)

Amazon MemoryDB for Redis

Redis Speed + Database Durability (Introductory)

Amazon MemoryDB for Redis is a fully managed, Redis-compatible, durable in-memory database. Unlike ElastiCache (a cache where data loss is acceptable on failure), MemoryDB persists every write to a Multi-AZ transaction log before acknowledging it, giving you Redis speed with true database durability.

👉 Mental model: MemoryDB = “Redis with durability”. ElastiCache is a cache in front of your database. MemoryDB is the database, built for real-time applications that need microsecond reads, single-digit-millisecond writes, and no data loss.

Feature | ElastiCache for Redis | MemoryDB for Redis
Purpose | Cache (sits in front of DB) | Primary database
Durability | Optional snapshots only | Always (Multi-AZ transaction log)
Data loss on failover | Possible | None for acknowledged writes
Use case | Session cache, leaderboards | Real-time apps, persistent microservice state
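The durability row comes down to when a write is acknowledged. This toy Python model (hypothetical classes, not the MemoryDB or Redis API) acknowledges cache writes straight from memory, while the durable node appends to a log first and replays that log after a restart. In MemoryDB the log is a separate Multi-AZ service, so here it is modelled as a list that outlives the node.

```python
class CacheNode:
    """Cache-style behaviour: a write is acknowledged once it is in memory."""

    def __init__(self):
        self.memory = {}

    def set(self, key, value):
        self.memory[key] = value
        return "OK"

    def crash_and_restart(self):
        # Without a recent snapshot, anything written since is simply gone.
        self.memory = {}


class DurableNode:
    """MemoryDB-style idea: persist to a log first, acknowledge second."""

    def __init__(self, log):
        self.log = log  # stands in for the Multi-AZ transaction log (outlives the node)
        self.memory = {}

    def set(self, key, value):
        self.log.append((key, value))  # 1. durably record the write
        self.memory[key] = value       # 2. apply it in memory
        return "OK"                    # 3. only now acknowledge the client

    def crash_and_restart(self):
        # Recovery rebuilds in-memory state by replaying the log in order.
        self.memory = {}
        for key, value in self.log:
            self.memory[key] = value


cache = CacheNode()
cache.set("cart:42", "3 items")
cache.crash_and_restart()
print(cache.memory.get("cart:42"))  # None

durable = DurableNode(log=[])
durable.set("cart:42", "3 items")
durable.crash_and_restart()
print(durable.memory.get("cart:42"))  # 3 items
```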
🧠 Key Insight

Exam: “durable Redis”, “persistent in-memory”, “Redis speed + no data loss” → MemoryDB. If the question says “cache”, choose ElastiCache. If it says “primary database” or “durability” with Redis → MemoryDB.

08
Managed Relational + OS Access

Amazon RDS Custom

RDS with OS-Level Access (Introductory)

Amazon RDS Custom gives you the automation of RDS (backups, monitoring, patching) while also allowing SSH / OS-level access to the underlying EC2 instance. Use it when you need to install third-party agents, apply custom patches, or satisfy compliance tools that require direct OS access.

🔧

What is RDS Custom

  • RDS automation + SSH access to EC2
  • Supports Oracle and SQL Server only
  • Install custom patches, OS-level agents
  • AWS still manages backups & monitoring
  • You own OS configuration

Choose RDS Custom When

  • Oracle or SQL Server workload
  • Compliance requires OS-level monitoring agent
  • Custom OS patches / third-party tools
  • Legacy enterprise apps with OS dependencies

Don't Use When

  • Standard RDS meets your needs (use standard RDS; it's cheaper)
  • MySQL / PostgreSQL workload (not supported)
  • You don't need OS access (adds management overhead)
🧠 Key Insight

Exam: “Oracle/SQL Server + OS-level access”, “custom OS patches on RDS”, “third-party agent on RDS host” → RDS Custom. For everything else, use standard RDS.

09
Hybrid / On-Premises Relational

Amazon RDS on VMware

RDS in Your Own Data Centre (Introductory)

Amazon RDS on VMware lets you deploy RDS-managed databases in your on-premises VMware environment using the same RDS APIs and console you use in the cloud. It bridges hybrid architectures: you manage on-prem databases the same way you manage cloud RDS.

🏢

What & Why

  • RDS deployed inside your VMware infrastructure on-prem
  • Same RDS console, APIs, and automation
  • Automated backups, patching, monitoring on-prem
  • Supports MySQL, PostgreSQL, SQL Server
  • Good for data residency requirements

Use Cases

  • Hybrid cloud (on-prem + AWS)
  • Regulatory / data residency (data must stay on-prem)
  • Gradual migration to AWS RDS
  • Consistent tooling across cloud and on-prem
🧠 Key Insight

RDS on VMware is rarely a primary exam topic but appears in hybrid architecture questions. Exam: “RDS in on-premises VMware”, “data residency, manage on-prem databases with RDS APIs” → RDS on VMware.

10
Decision Guide

Decision Guide — Choosing the Right AWS Database

The Full AWS Database Decision Table (Core)
Requirement / Keyword | Choose | Reason
SQL, ACID, relational | RDS / Aurora | Traditional relational workloads
Serverless NoSQL, ms latency, any scale | DynamoDB | Unlimited scale, access-pattern design
Sub-ms caching, session storage | ElastiCache | Redis or Memcached, in-memory
Social network, fraud detection, relationships | Neptune | Graph traversal, nodes + edges
Analytics, OLAP, data warehouse, BI | Redshift | Columnar, petabyte-scale analytics
IoT, metrics, time-series data | Timestream | Purpose-built for time-stamped data
Immutable audit trail, financial ledger | QLDB | Cryptographically verifiable history
MongoDB-compatible, JSON documents | DocumentDB | Flexible JSON schema, rich queries
Cassandra-compatible, wide-column | Keyspaces | Managed Cassandra, high-write throughput
Durable Redis, persistent in-memory, Redis + no data loss | MemoryDB for Redis | Redis speed + Multi-AZ transaction log persistence
Oracle/SQL Server + OS-level access, custom patches | RDS Custom | RDS automation + SSH to EC2 host
RDS on-premises, VMware, data residency | RDS on VMware | Hybrid cloud, on-prem RDS management
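To make the precedence in the decision table explicit, here is a deliberately simplified first-match keyword lookup in Python. The rule list, keyword spellings and `suggest_service` are hypothetical; the ordering puts specific phrases (such as “durable redis”) ahead of generic ones (“cache”, “sql”) so the more specific service wins, and “nosql” is checked before “sql” because the latter is a substring of the former. Real exam questions still need judgment; substring matching is only a study aid.

```python
# Hypothetical, simplified rules distilled from the decision table.
# Order matters: the first rule with a matching keyword wins.
DECISION_RULES = [
    (("graph", "social network", "fraud", "recommendation"), "Neptune"),
    (("data warehouse", "olap", "petabyte", "business intelligence"), "Redshift"),
    (("iot", "time-series", "metrics"), "Timestream"),
    (("ledger", "immutable", "audit trail"), "QLDB"),
    (("mongodb", "json document"), "DocumentDB"),
    (("cassandra", "wide-column"), "Keyspaces"),
    (("durable redis", "persistent in-memory"), "MemoryDB for Redis"),
    (("os-level access", "custom os patch"), "RDS Custom"),
    (("vmware", "on-premises"), "RDS on VMware"),
    (("cache", "session"), "ElastiCache"),
    (("key-value", "nosql"), "DynamoDB"),
    (("sql", "acid", "relational"), "RDS / Aurora"),
]


def suggest_service(requirement):
    """Return the first service whose keywords appear in the requirement text."""
    text = requirement.lower()
    for keywords, service in DECISION_RULES:
        if any(keyword in text for keyword in keywords):
            return service
    return None  # e.g. "decentralized blockchain" is outside this table


print(suggest_service("Fraud detection over customer relationships"))  # Neptune
print(suggest_service("Durable Redis as the primary database"))         # MemoryDB for Redis
```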
Exam Cheatsheet: Keywords to Service (Core)

🎯 Exam Keywords → Service Answer

  • “social network, relationships, graph traversal” → Neptune
  • “fraud detection, recommendation engine” → Neptune
  • “deep multi-hop traversals, friend-of-friend” → Neptune (deep SQL JOIN chains scale poorly)
  • “global graph, multi-region graph replication” → Neptune Global Database
  • “data warehouse, OLAP, petabyte analytics, BI” → Redshift
  • “RDS to analytics without ETL pipeline” → Redshift Zero-ETL
  • “query S3 data with SQL” → Redshift Spectrum
  • “optimize Redshift join performance” → KEY distribution style on join column
  • “Redshift handle query spike, many concurrent BI users” → Concurrency Scaling
  • “IoT sensor, time-series, metrics ingestion” → Timestream
  • “downsample IoT data, hourly aggregates” → Timestream Scheduled Queries
  • “immutable audit trail, ledger, history verification” → QLDB
  • “decentralized blockchain” → Amazon Managed Blockchain (NOT QLDB)
  • “MongoDB-compatible, JSON documents” → DocumentDB
  • “migrate MongoDB to AWS” → DocumentDB (wire-protocol compatible)
  • “Cassandra-compatible, wide-column” → Keyspaces
  • “migrate Cassandra to AWS” → Keyspaces
  • “durable Redis, persistent in-memory, Redis + durability” → MemoryDB for Redis
  • “Redis as primary database, no data loss on failover” → MemoryDB (not ElastiCache)
  • “Oracle/SQL Server + OS access, custom OS patch, SSH to RDS” → RDS Custom
  • “RDS on-premises, VMware, manage on-prem databases like RDS” → RDS on VMware
🧠 Final Insight

The real AWS database skill is not learning every feature — it is choosing the right tool for the problem. Know the one-line definition and exam keyword for each service. Most common traps: OLAP ≠ OLTP (Redshift vs RDS); QLDB ≠ Managed Blockchain; Neptune for relationships, not just “big data”.