Analytics Services: From Data Lake to Dashboard

Athena, Glue, Kinesis, Redshift, QuickSight, OpenSearch — six approaches to storing, processing, querying, and visualising data on AWS. This page maps the full analytics landscape before you dive into each service.

Chapter One

What is Analytics on AWS?

AWS Analytics services let you collect, store, process, and visualise data at any scale — from ad-hoc SQL queries over files in S3, to real-time streaming pipelines, to sub-second enterprise BI dashboards over petabytes of structured data.

The modern AWS analytics stack separates concerns into distinct layers. Each layer has a dedicated service optimised for its job:

🏗️

The Analytics Stack (Layers)

Storage — S3 (data lake, raw files)
Catalog — Glue Data Catalog (metadata, schemas)
ETL — Glue ETL Jobs (transform, format conversion)
Streaming — Kinesis (real-time ingestion)
Query — Athena (serverless SQL) · Redshift (warehouse)
Visualise — QuickSight (dashboards)
Search — OpenSearch (logs, full-text)
Govern — Lake Formation (security, sharing)

⚡

Why Multiple Services?

No single tool does everything well
Athena = ad-hoc exploration (pay per query)
Redshift = fast, complex BI (always-on warehouse)
Kinesis = real-time streaming (sub-second)
OpenSearch = text search + log dashboards
Combine services for complete pipelines

Chapter Two

Services & Spectrum

Core Analytics Services

🔍

Query Engine

Amazon Athena

Serverless SQL directly on S3. Pay per query, no infrastructure. Ad-hoc exploration of data lake files (CSV, Parquet, JSON).

Catalog + ETL

AWS Glue

Data Catalog (shared metadata) + serverless ETL jobs. Crawlers auto-discover schemas. Convert CSV to Parquet. The metadata backbone.

Real-time Stream

Amazon Kinesis

Real-time data streaming. Data Streams for custom processing (~200ms). Firehose for managed delivery to S3/Redshift/OpenSearch.

Data Warehouse

Amazon Redshift

Columnar data warehouse. Sub-second queries over billions of rows. MPP architecture. Enterprise BI, complex joins, high concurrency.

📊

More Services

QuickSight · Lake Formation · OpenSearch

QuickSight for BI dashboards, Lake Formation for data lake governance, OpenSearch for search and log analytics.

The Analytics Spectrum — Serverless to Managed

Each analytics service trades simplicity for performance and control. Athena is zero-ops and serverless; a provisioned Redshift cluster gives maximum performance but requires cluster management (Redshift Serverless, billed per RPU, removes most of that work).

← Simpler / ad-hoc · Athena (Serverless SQL) · Glue (Catalog + ETL) · Kinesis (Real-time stream) · Redshift (Data Warehouse) · Faster / enterprise →

Service Comparison at a Glance

Service	Type	Data Location	Best For	Cost Model
Athena	Query engine	S3 (in-place)	Ad-hoc SQL, exploration	Per TB scanned
Glue Catalog	Metadata	Metadata only	Schema registry, shared catalog	Free tier (first million objects stored, first million requests per month), then usage-based
Glue ETL	Transform	S3 → S3	CSV→Parquet, joins, cleaning	Per DPU-hour
Kinesis Data Streams	Streaming	Ordered shards	Real-time custom processing	Per shard-hour
Kinesis Firehose	Delivery	→ S3/Redshift/OpenSearch	Managed delivery to destinations	Per GB ingested
Redshift	Warehouse	Loaded into cluster	Sub-second BI, complex joins	Per node-hour or RPU
QuickSight	BI / Visualise	SPICE (in-memory)	Dashboards, reports	Per session
OpenSearch	Search + Logs	Indexed in cluster	Full-text search, log analytics	Per instance-hour
Lake Formation	Governance	Glue Catalog + S3	Column/row security, sharing	Free (with underlying services)
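
The "Per TB scanned" cost model in the table is the reason file format matters so much for Athena. The following is a minimal sketch of that arithmetic, not a pricing calculator: the price of 5.00 USD per TB and the 10% scan ratio for a columnar copy are illustrative assumptions that do not come from this page, so check current regional pricing before relying on the numbers.

```python
# Hypothetical figures for illustration only; check current regional pricing.
ASSUMED_PRICE_PER_TB_USD = 5.00
BYTES_PER_TB = 1024 ** 4


def athena_query_cost(bytes_scanned: int,
                      price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Estimate the cost of one query under a per-TB-scanned model."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Scanning a full 2 TB CSV dataset versus a columnar copy where the query
# only touches an assumed 10% of the bytes.
csv_cost = athena_query_cost(2 * BYTES_PER_TB)
parquet_cost = athena_query_cost(int(0.10 * 2 * BYTES_PER_TB))

print(f"CSV scan:     ${csv_cost:.2f}")
print(f"Parquet scan: ${parquet_cost:.2f}")
```

Under these assumptions the same question costs roughly a tenth as much once the data is stored in a columnar format, which is why the Glue ETL row lists CSV→Parquet conversion as a headline use case.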

Chapter Three

Decision Guide

When to Use What

If You Need…	Use…	Why
Ad-hoc SQL on S3, no infrastructure	Athena	Serverless, pay per query, zero ops
Schema registry for data lake	Glue Data Catalog	Shared catalog for Athena/EMR/Redshift
Convert CSV to Parquet	Glue ETL	Serverless Spark, format conversion
Auto-discover schemas in S3	Glue Crawlers	Scan, classify, register in catalog
Real-time streaming ingestion	Kinesis Data Streams	Ordered shards, sub-200ms, replay
Deliver streams to S3 with no code	Kinesis Firehose	Managed, auto-batches, format conversion
Sub-second BI queries, high concurrency	Redshift	MPP warehouse, result caching
Query S3 from Redshift without loading	Redshift Spectrum	External tables on S3 via Glue Catalog
Interactive dashboards & reports	QuickSight	Serverless BI, SPICE, pay per session
Full-text search + log analytics	OpenSearch	Inverted index, OpenSearch Dashboards (the Kibana fork)
Column/row-level security on data lake	Lake Formation	GRANT/REVOKE, cross-account sharing
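
The decision table above can be read as a simple lookup from requirement phrases to services. Here is a minimal sketch of that idea; the phrases are taken from the table, while the substring-matching logic and the function name `suggest_service` are hypothetical and only meant as a study aid.

```python
from typing import Optional

# Phrase-to-service hints taken from the decision table.
# More specific phrases come first because the first match wins.
DECISION_HINTS = [
    ("query s3 from redshift", "Redshift Spectrum"),
    ("ad-hoc sql", "Athena"),
    ("schema registry", "Glue Data Catalog"),
    ("csv to parquet", "Glue ETL"),
    ("auto-discover schemas", "Glue Crawlers"),
    ("deliver streams to s3", "Kinesis Firehose"),
    ("real-time streaming", "Kinesis Data Streams"),
    ("sub-second bi", "Redshift"),
    ("dashboards", "QuickSight"),
    ("full-text search", "OpenSearch"),
    ("row-level security", "Lake Formation"),
]


def suggest_service(requirement: str) -> Optional[str]:
    """Return the first service whose hint phrase appears in the requirement."""
    text = requirement.lower()
    for phrase, service in DECISION_HINTS:
        if phrase in text:
            return service
    return None  # no hint matched; the table does not cover this case


print(suggest_service("Ad-hoc SQL on S3, no infrastructure"))
print(suggest_service("Query S3 from Redshift without loading"))
```

Ordering matters in this sketch: the Spectrum phrase is checked before the generic Redshift one so that the more specific requirement is not swallowed by the broader match.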

Chapter Four

Architecture Patterns

Common Production Patterns

🏗️

Pattern 1: Serverless Data Lake

S3 → Glue Catalog → Athena → QuickSight

Store raw data in S3
Glue crawls and catalogues
Athena queries with SQL
QuickSight visualises results
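
The "Athena queries with SQL" step in Pattern 1 might look like the sketch below when driven from Python with boto3's `start_query_execution` call. The table `web_logs`, the database `analytics_lake`, and the results bucket are hypothetical names, and the example assumes a Glue crawler has already registered the table in the catalog.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        # The database is resolved through the Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID (needs AWS credentials)."""
    # boto3 is imported lazily so the request builder works without the SDK.
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    database="analytics_lake",                       # hypothetical Glue database
    output_location="s3://example-athena-results/",  # hypothetical bucket
)
print(request)
```

The query runs asynchronously: `run_query` only returns an execution ID, and a real client would poll for completion before reading results or pointing QuickSight at them.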

⚡

Pattern 2: Streaming Analytics

Kinesis → Firehose → S3 (Parquet) → Athena

Stream data in real-time
Firehose converts to Parquet
Lands in S3 partitioned by date
Athena/Redshift queries historical data
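
"Partitioned by date" in Pattern 2 refers to the S3 key layout. The sketch below shows one common convention, Hive-style `year=/month=/day=` prefixes, which query engines such as Athena can use to skip partitions that a query does not need. The prefix and file name are hypothetical, and the layout is an assumption for illustration rather than what Firehose writes by default.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style date-partitioned S3 key (year=/month=/day=)."""
    t = event_time.astimezone(timezone.utc)  # normalise to UTC before partitioning
    return f"{prefix}/year={t:%Y}/month={t:%m}/day={t:%d}/{filename}"


key = partitioned_key(
    "clickstream",                                     # hypothetical prefix
    datetime(2024, 3, 9, 14, 30, tzinfo=timezone.utc),
    "part-0001.parquet",                               # hypothetical file name
)
print(key)
```

With keys shaped like this, a query that filters on a single day only has to read the objects under that day's prefix, which keeps per-scan costs down as history accumulates.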

🏢

Pattern 3: Enterprise BI

S3 → Glue ETL → Redshift → QuickSight

Glue ETL transforms and loads
Redshift stores hot data locally
Spectrum extends to cold S3 data
QuickSight for 100+ dashboard users

Chapter Five

Exam Insights

Decision Hints

If the question says…	Think…
"Serverless SQL" or "query S3 directly"	Athena
"Data catalog" or "metadata for data lake"	Glue Data Catalog
"Serverless ETL" or "convert CSV to Parquet"	Glue ETL
"Real-time" or "streaming data"	Kinesis Data Streams
"Deliver to S3 with no code" or "near-real-time"	Kinesis Firehose
"Data warehouse" or "sub-second BI" or "columnar"	Redshift
"Visualise" or "BI dashboards" or "SPICE"	QuickSight
"Full-text search" or "ELK" or "log analytics"	OpenSearch
"Column-level security" or "data lake governance"	Lake Formation
"Query S3 from Redshift without loading"	Redshift Spectrum

Common Exam Traps

Trap	Reality
"Athena replaces Redshift"	NO. Athena = ad-hoc, pay per query. Redshift = fast BI, high concurrency, complex joins. Different tools.
"Kinesis Firehose is real-time"	Firehose buffers records by size or time before delivering them; the classic exam figure is a 60-second minimum buffer interval. It's NEAR-real-time. Data Streams = true real-time (~200ms).
"Glue is just ETL"	Glue has TWO halves: Data Catalog (metadata) AND ETL Jobs (transform). The catalog is used by Athena/EMR/Redshift.
"OpenSearch is a data warehouse"	OpenSearch is a search + log engine (inverted index). Redshift is the warehouse (columnar, SQL, structured).
"Lake Formation stores data"	Lake Formation stores nothing. It's a security/governance layer on top of Glue Catalog + S3.
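
The Firehose trap above comes down to buffering: records are held until a size or time threshold is reached, whichever comes first, and only then delivered as a batch. The toy model below illustrates that behaviour. It is not the Firehose API, and the class name and thresholds are hypothetical.

```python
class BufferedDelivery:
    """Toy model of size-or-time buffering; not the Firehose API."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.buffered_bytes = 0
        self.window_start = None  # arrival time of the oldest buffered record
        self.delivered = []       # batches "delivered" downstream

    def put(self, record: bytes, now: float) -> None:
        self.tick(now)  # flush first if the time threshold has already passed
        if self.window_start is None:
            self.window_start = now
        self.records.append(record)
        self.buffered_bytes += len(record)
        if self.buffered_bytes >= self.max_bytes:
            self._flush()  # size threshold reached

    def tick(self, now: float) -> None:
        if self.records and now - self.window_start >= self.max_seconds:
            self._flush()  # time threshold reached

    def _flush(self) -> None:
        self.delivered.append(self.records)
        self.records = []
        self.buffered_bytes = 0
        self.window_start = None


buf = BufferedDelivery(max_bytes=10, max_seconds=60)  # hypothetical thresholds
buf.put(b"abcd", now=0)          # buffered, nothing delivered yet
buf.put(b"efgh", now=5)          # still below both thresholds
buf.tick(now=61)                 # interval elapsed: first batch delivered
buf.put(b"0123456789", now=62)   # size threshold hit: delivered at once
print(len(buf.delivered))
```

The first two records wait about a minute before they are delivered, which is the latency that makes a buffered delivery service near-real-time rather than real-time.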

Summary

📋 Analytics Services — Recap
S3 is the storage foundation — all analytics starts here.
Glue = metadata backbone (catalog) + serverless transforms (ETL). The catalog is shared by Athena, EMR, and Redshift.
Athena = serverless SQL on S3. Zero ops, pay per scan. Best for exploration and infrequent queries.
Kinesis = real-time streaming. Data Streams for custom processing. Firehose for managed delivery to S3/Redshift.
Redshift = columnar data warehouse for sub-second BI queries, complex joins, high concurrency.
QuickSight = serverless BI dashboards. SPICE cache. Pay per reader session.
OpenSearch = full-text search + log analytics. Not a warehouse — a search engine.
Lake Formation = governance layer. Column/row security, cross-account sharing, on top of Glue.

👉 Key Takeaway

The AWS analytics stack is layered: S3 stores → Glue catalogs → Athena/Redshift queries → QuickSight visualises. Add Kinesis for real-time, OpenSearch for search, Lake Formation for governance. Pick the simplest service that meets your latency and concurrency needs.