Analytics Services
From Data Lake to Dashboard
Athena, Glue, Kinesis, Redshift, QuickSight, OpenSearch: six approaches to storing, processing, querying, and visualising data on AWS. This page maps the full analytics landscape before you dive into each service.
What is Analytics on AWS?
AWS Analytics services let you collect, store, process, and visualise data at any scale: from ad-hoc SQL queries over files in S3, to real-time streaming pipelines, to sub-second enterprise BI dashboards over petabytes of structured data.
The modern AWS analytics stack separates concerns into distinct layers. Each layer has a dedicated service optimised for its job:
The Analytics Stack (Layers)
- Storage: S3 (data lake, raw files)
- Catalog: Glue Data Catalog (metadata, schemas)
- ETL: Glue ETL Jobs (transform, format conversion)
- Streaming: Kinesis (real-time ingestion)
- Query: Athena (serverless SQL) · Redshift (warehouse)
- Visualise: QuickSight (dashboards)
- Search: OpenSearch (logs, full-text)
- Govern: Lake Formation (security, sharing)
Why Multiple Services?
- No single tool does everything well
- Athena = ad-hoc exploration (pay per query)
- Redshift = fast, complex BI (always-on warehouse)
- Kinesis = real-time streaming (sub-second)
- OpenSearch = text search + log dashboards
- Combine services for complete pipelines
Services & Spectrum
The analytics services sit on a spectrum from simplicity to performance and control. Athena is zero-ops and serverless; provisioned Redshift gives maximum performance but requires cluster management.
| Service | Type | Data Location | Best For | Cost Model |
|---|---|---|---|---|
| Athena | Query engine | S3 (in-place) | Ad-hoc SQL, exploration | Per TB scanned |
| Glue Catalog | Metadata | Metadata only | Schema registry, shared catalog | Free (mostly) |
| Glue ETL | Transform | S3 → S3 | CSV→Parquet, joins, cleaning | Per DPU-hour |
| Kinesis Data Streams | Streaming | Ordered shards | Real-time custom processing | Per shard-hour |
| Kinesis Firehose | Delivery | → S3/Redshift/OpenSearch | Managed delivery to destinations | Per GB ingested |
| Redshift | Warehouse | Loaded into cluster | Sub-second BI, complex joins | Per node-hour or RPU |
| QuickSight | BI / Visualise | SPICE (in-memory) | Dashboards, reports | Per session |
| OpenSearch | Search + Logs | Indexed in cluster | Full-text search, log analytics | Per instance-hour |
| Lake Formation | Governance | Glue Catalog + S3 | Column/row security, sharing | Free (with underlying services) |
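As a rough illustration of the per-TB-scanned cost model listed for Athena in the table above, the sketch below estimates what a single query might cost and why converting CSV to a columnar format tends to matter. The price of 5.00 per TB and the 10% scan ratio for Parquet are illustrative assumptions, not quoted figures; check current regional pricing before relying on them.

```python
TB = 1024 ** 4  # bytes per terabyte (binary units, an assumption of this sketch)


def athena_query_cost(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimate the cost of one query under a per-TB-scanned pricing model.

    price_per_tb is an assumed, illustrative rate. Minimum charges and
    rounding rules are deliberately not modelled here.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / TB * price_per_tb


# Hypothetical workload: the same query over raw CSV vs. Parquet.
csv_bytes = 2 * TB
parquet_bytes = int(csv_bytes * 0.10)  # assumed: only 10% of the bytes are read

csv_cost = athena_query_cost(csv_bytes)
parquet_cost = athena_query_cost(parquet_bytes)
print(f"CSV scan:     {csv_cost:.2f}")
print(f"Parquet scan: {parquet_cost:.2f}")
```

Because cost scales linearly with bytes scanned, anything that reduces the scan (columnar formats, compression, partition pruning) reduces the bill by the same proportion.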
Decision Guide
| If You Need… | Use… | Why |
|---|---|---|
| Ad-hoc SQL on S3, no infrastructure | Athena | Serverless, pay per query, zero ops |
| Schema registry for data lake | Glue Data Catalog | Shared catalog for Athena/EMR/Redshift |
| Convert CSV to Parquet | Glue ETL | Serverless Spark, format conversion |
| Auto-discover schemas in S3 | Glue Crawlers | Scan, classify, register in catalog |
| Real-time streaming ingestion | Kinesis Data Streams | Ordered shards, sub-200ms, replay |
| Deliver streams to S3 with no code | Kinesis Firehose | Managed, auto-batches, format conversion |
| Sub-second BI queries, high concurrency | Redshift | MPP warehouse, result caching |
| Query S3 from Redshift without loading | Redshift Spectrum | External tables on S3 via Glue Catalog |
| Interactive dashboards & reports | QuickSight | Serverless BI, SPICE, pay per session |
| Full-text search + log analytics | OpenSearch | Inverted index, OpenSearch Dashboards (the Kibana fork) |
| Column/row-level security on data lake | Lake Formation | GRANT/REVOKE, cross-account sharing |
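The decision guide above is essentially a lookup from requirement to service. A minimal sketch of that idea follows; the phrase keys and the `suggest_service` helper are hypothetical names chosen for illustration, and simple substring matching is only a stand-in for real requirements analysis.

```python
# Requirement phrases (lowercase) mapped to the service the guide suggests.
# The phrases are shortened from the table rows; they are not an official list.
DECISION_GUIDE = {
    "ad-hoc sql on s3": "Athena",
    "schema registry": "Glue Data Catalog",
    "convert csv to parquet": "Glue ETL",
    "auto-discover schemas": "Glue Crawlers",
    "real-time streaming ingestion": "Kinesis Data Streams",
    "deliver streams to s3": "Kinesis Firehose",
    "sub-second bi": "Redshift",
    "query s3 from redshift": "Redshift Spectrum",
    "dashboards": "QuickSight",
    "full-text search": "OpenSearch",
    "column/row-level security": "Lake Formation",
}


def suggest_service(requirement: str) -> list:
    """Return every service whose key phrase appears in the requirement text."""
    text = requirement.lower()
    return [service for phrase, service in DECISION_GUIDE.items() if phrase in text]


print(suggest_service("We need ad-hoc SQL on S3 with no servers to manage"))
print(suggest_service("Full-text search over application logs"))
```

A requirement can match more than one row, which mirrors the guide's own point that complete pipelines usually combine several services.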
Architecture Patterns
Pattern 1: Serverless Data Lake
S3 → Glue Catalog → Athena → QuickSight
- Store raw data in S3
- Glue crawls and catalogues
- Athena queries with SQL
- QuickSight visualises results
Pattern 2: Streaming Analytics
Kinesis → Firehose → S3 (Parquet) → Athena
- Stream data in real-time
- Firehose converts to Parquet
- Lands in S3 partitioned by date
- Athena/Redshift queries historical data
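To make "partitioned by date" concrete, the sketch below builds Hive-style object keys (`year=/month=/day=`), a common layout that lets query engines skip partitions outside a query's date range. The prefix, file name, and the `partitioned_key` helper are hypothetical; the actual key layout in a real pipeline depends on how delivery is configured.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style, date-partitioned object key.

    event_time must be timezone-aware; it is normalised to UTC so that
    all writers agree on which day a record belongs to.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    t = event_time.astimezone(timezone.utc)
    return f"{prefix}/year={t:%Y}/month={t:%m}/day={t:%d}/{filename}"


def keys_for_day(keys: list, year: int, month: int, day: int) -> list:
    """Mimic partition pruning: keep only keys under one day's partition."""
    marker = f"/year={year:04d}/month={month:02d}/day={day:02d}/"
    return [k for k in keys if marker in k]


when = datetime(2024, 3, 5, 23, 30, tzinfo=timezone.utc)
print(partitioned_key("events", when, "part-0001.parquet"))
```

A query filtered to a single day then only needs to read objects under that day's prefix, which matters under a pay-per-scan model.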
Pattern 3: Enterprise BI
S3 → Glue ETL → Redshift → QuickSight
- Glue ETL transforms and loads
- Redshift stores hot data locally
- Spectrum extends to cold S3 data
- QuickSight for 100+ dashboard users
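The hot/cold split in Pattern 3 can be sketched as a simple age-based routing rule: recent partitions live in the warehouse, older ones stay in S3 and are reached through Spectrum. The 90-day cutoff, the tier labels, and the `storage_tier` helper are all assumptions made for illustration; a real cutoff depends on query patterns and cost.

```python
from datetime import date, timedelta


def storage_tier(partition_day: date, today: date, hot_days: int = 90) -> str:
    """Decide where a daily partition should live, based on its age.

    hot_days is a hypothetical threshold, not a recommended value.
    """
    age_days = (today - partition_day).days
    if age_days < 0:
        raise ValueError("partition_day is in the future")
    # Recent data is loaded into the warehouse; older data stays in S3.
    return "redshift-local" if age_days <= hot_days else "s3-spectrum"


today = date(2024, 6, 30)
for offset in (7, 90, 91, 400):
    day = today - timedelta(days=offset)
    print(day.isoformat(), storage_tier(day, today))
```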
Exam Insights
| If the question says… | Think… |
|---|---|
| "Serverless SQL" or "query S3 directly" | Athena |
| "Data catalog" or "metadata for data lake" | Glue Data Catalog |
| "Serverless ETL" or "convert CSV to Parquet" | Glue ETL |
| "Real-time" or "streaming data" | Kinesis Data Streams |
| "Deliver to S3 with no code" or "near-real-time" | Kinesis Firehose |
| "Data warehouse" or "sub-second BI" or "columnar" | Redshift |
| "Visualise" or "BI dashboards" or "SPICE" | QuickSight |
| "Full-text search" or "ELK" or "log analytics" | OpenSearch |
| "Column-level security" or "data lake governance" | Lake Formation |
| "Query S3 from Redshift without loading" | Redshift Spectrum |

| Trap | Reality |
|---|---|
| "Athena replaces Redshift" | NO. Athena = ad-hoc, pay per query. Redshift = fast BI, high concurrency, complex joins. Different tools. |
| "Kinesis Firehose is real-time" | Firehose buffers records by size or time before delivering (classically a 60-second minimum interval; newer configurations can go lower). It's NEAR-real-time. Data Streams = true real-time (~200ms). |
| "Glue is just ETL" | Glue has TWO halves: Data Catalog (metadata) AND ETL Jobs (transform). The catalog is used by Athena/EMR/Redshift. |
| "OpenSearch is a data warehouse" | OpenSearch is a search + log engine (inverted index). Redshift is the warehouse (columnar, SQL, structured). |
| "Lake Formation stores data" | Lake Formation stores nothing. It's a security/governance layer on top of Glue Catalog + S3. |
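The Firehose trap above comes down to buffering: records are held until a size or time threshold is reached, then delivered as one batch. The toy model below sketches that behaviour under stated assumptions; `BufferedDelivery` and its thresholds are hypothetical and do not reproduce the service's actual implementation or limits.

```python
class BufferedDelivery:
    """Toy buffer-then-deliver model: flush on a size OR an age threshold.

    Simplification: age is only checked when a new record arrives, whereas
    a real delivery service would also flush on a timer.
    """

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.buffer = []          # records waiting for delivery
        self.buffered_bytes = 0
        self.opened_at = None     # arrival time of the oldest buffered record
        self.batches = []         # delivered batches, in order

    def put(self, record: bytes, now: float) -> None:
        # Deliver the pending batch first if it has been open too long.
        if self.buffer and now - self.opened_at >= self.max_seconds:
            self.flush()
        if not self.buffer:
            self.opened_at = now
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        # Deliver immediately once the size threshold is reached.
        if self.buffered_bytes >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.batches.append(self.buffer)
            self.buffer = []
            self.buffered_bytes = 0
            self.opened_at = None


delivery = BufferedDelivery(max_bytes=10, max_seconds=60)
for payload, ts in [(b"aaaa", 0.0), (b"bbbb", 1.0), (b"cccc", 2.0), (b"dd", 10.0), (b"ee", 75.0)]:
    delivery.put(payload, now=ts)
print(len(delivery.batches), "batches delivered;", len(delivery.buffer), "record still buffered")
```

A record that arrives just after a flush can wait up to the full interval before it is delivered, which is why buffered delivery is described as near-real-time rather than real-time.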
- S3 is the storage foundation: the data lake that the rest of the stack builds on.
- Glue = metadata backbone (catalog) + serverless transforms (ETL). The catalog is shared by Athena, EMR, and Redshift.
- Athena = serverless SQL on S3. Zero ops, pay per scan. Best for exploration and infrequent queries.
- Kinesis = real-time streaming. Data Streams for custom processing. Firehose for managed delivery to S3/Redshift.
- Redshift = columnar data warehouse for sub-second BI queries, complex joins, high concurrency.
- QuickSight = serverless BI dashboards. SPICE cache. Pay per reader session.
- OpenSearch = full-text search + log analytics. A search engine, not a warehouse.
- Lake Formation = governance layer. Column/row security, cross-account sharing, on top of Glue.
The AWS analytics stack is layered: S3 stores → Glue catalogs → Athena/Redshift queries → QuickSight visualises. Add Kinesis for real-time, OpenSearch for search, Lake Formation for governance. Pick the simplest service that meets your latency and concurrency needs.