LearningTree · AWS · Analytics

Analytics Services: From Data Lake to Dashboard

Athena, Glue, Kinesis, Redshift, QuickSight, OpenSearch: six approaches to storing, processing, querying, and visualising data on AWS. This page maps the full analytics landscape before you dive into each service.

01
Chapter One

What is Analytics on AWS?

AWS Analytics services let you collect, store, process, and visualise data at any scale: from ad-hoc SQL queries over files in S3, to real-time streaming pipelines, to sub-second enterprise BI dashboards over petabytes of structured data.

The modern AWS analytics stack separates concerns into distinct layers. Each layer has a dedicated service optimised for its job:


The Analytics Stack (Layers)

  • Storage: S3 (data lake, raw files)
  • Catalog: Glue Data Catalog (metadata, schemas)
  • ETL: Glue ETL Jobs (transform, format conversion)
  • Streaming: Kinesis (real-time ingestion)
  • Query: Athena (serverless SQL) · Redshift (warehouse)
  • Visualise: QuickSight (dashboards)
  • Search: OpenSearch (logs, full-text)
  • Govern: Lake Formation (security, sharing)

Why Multiple Services?

  • No single tool does everything well
  • Athena = ad-hoc exploration (pay per query)
  • Redshift = fast, complex BI (always-on warehouse)
  • Kinesis = real-time streaming (sub-second)
  • OpenSearch = text search + log dashboards
  • Combine services for complete pipelines
02
Chapter Two

Services & Spectrum

Core Analytics Services
The Analytics Spectrum: Serverless to Managed

The services sit on a spectrum that trades simplicity against performance and control. Athena is zero-ops serverless; a provisioned Redshift cluster gives maximum performance but requires cluster management.

Simpler / ad-hoc ← → Faster / enterprise

Athena (Serverless SQL) → Glue (Catalog + ETL) → Kinesis (Real-time stream) → Redshift (Data Warehouse)

Service Comparison at a Glance
Service | Type | Data Location | Best For | Cost Model
Athena | Query engine | S3 (in-place) | Ad-hoc SQL, exploration | Per TB scanned
Glue Catalog | Metadata | Metadata only | Schema registry, shared catalog | Free (mostly)
Glue ETL | Transform | S3 → S3 | CSV → Parquet, joins, cleaning | Per DPU-hour
Kinesis Data Streams | Streaming | Ordered shards | Real-time custom processing | Per shard-hour
Kinesis Firehose | Delivery | → S3/Redshift/OpenSearch | Managed delivery to destinations | Per GB ingested
Redshift | Warehouse | Loaded into cluster | Sub-second BI, complex joins | Per node-hour or RPU
QuickSight | BI / Visualise | SPICE (in-memory) | Dashboards, reports | Per session
OpenSearch | Search + Logs | Indexed in cluster | Full-text search, log analytics | Per instance-hour
Lake Formation | Governance | Glue Catalog + S3 | Column/row security, sharing | Free (with underlying services)
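
Athena's "Per TB scanned" cost model is the reason format conversion pays off. The sketch below is only an illustration: the price of $5 per TB and the idea that a Parquet copy lets a query read a tenth of the bytes are both assumptions, not figures from this page, so check current regional pricing before relying on the numbers.

```python
# Illustrative estimate only: PRICE_PER_TB_USD is an assumed figure, not a quoted price.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 1024 ** 4


def athena_scan_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost of one query from the number of bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical workload: a 2 TB CSV dataset scanned in full,
# versus a Parquet copy where the same query reads about 10% of that volume.
csv_cost = athena_scan_cost(2 * BYTES_PER_TB)
parquet_cost = athena_scan_cost(2 * BYTES_PER_TB // 10)

print(f"CSV scan:     ${csv_cost:.2f}")
print(f"Parquet scan: ${parquet_cost:.2f}")
```

Under these assumptions the cost scales linearly with bytes scanned, so anything that reduces scanned volume (columnar formats, partitioning, compression) reduces the bill proportionally.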
03
Chapter Three

Decision Guide

When to Use What
If You Need… | Use… | Why
Ad-hoc SQL on S3, no infrastructure | Athena | Serverless, pay per query, zero ops
Schema registry for data lake | Glue Data Catalog | Shared catalog for Athena/EMR/Redshift
Convert CSV to Parquet | Glue ETL | Serverless Spark, format conversion
Auto-discover schemas in S3 | Glue Crawlers | Scan, classify, register in catalog
Real-time streaming ingestion | Kinesis Data Streams | Ordered shards, sub-200ms, replay
Deliver streams to S3 with no code | Kinesis Firehose | Managed, auto-batches, format conversion
Sub-second BI queries, high concurrency | Redshift | MPP warehouse, result caching
Query S3 from Redshift without loading | Redshift Spectrum | External tables on S3 via Glue Catalog
Interactive dashboards & reports | QuickSight | Serverless BI, SPICE, pay per session
Full-text search + log analytics | OpenSearch | Inverted index, OpenSearch Dashboards (the Kibana fork)
Column/row-level security on data lake | Lake Formation | GRANT/REVOKE, cross-account sharing
04
Chapter Four

Architecture Patterns

Common Production Patterns

Pattern 1: Serverless Data Lake

S3 → Glue Catalog → Athena → QuickSight

  • Store raw data in S3
  • Glue crawls and catalogues
  • Athena queries with SQL
  • QuickSight visualises results
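
As a rough sketch of the "Athena queries with SQL" step, the snippet below builds the parameters for Athena's StartQueryExecution API. The database name, table, and results bucket are hypothetical placeholders, and the actual boto3 call is left as a comment because it needs the boto3 package and AWS credentials.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        # The database is resolved through the Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names: the database, table, and results bucket are placeholders.
request = build_athena_request(
    sql="SELECT region, COUNT(*) AS orders FROM sales GROUP BY region",
    database="analytics_lake",
    output_location="s3://example-athena-results/queries/",
)

# With boto3 installed and AWS credentials configured, this could be submitted as:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
print(request["QueryString"])
```

The query runs asynchronously, so a real client would poll for completion before reading results; that part is omitted here.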

Pattern 2: Streaming Analytics

Kinesis → Firehose → S3 (Parquet) → Athena

  • Stream data in real-time
  • Firehose converts to Parquet
  • Lands in S3 partitioned by date
  • Athena or Redshift queries the historical data
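
To make "partitioned by date" concrete, here is a minimal sketch of how an object key could be laid out so that query engines can skip partitions outside the requested dates. The Hive-style `year=/month=/day=` layout, the `events` prefix, and the file name are assumptions for illustration; the prefix your delivery stream actually writes depends on its configuration.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style S3 key (year=/month=/day=) from an event timestamp."""
    t = event_time.astimezone(timezone.utc)  # partition on UTC to avoid timezone drift
    return f"{prefix}/year={t:%Y}/month={t:%m}/day={t:%d}/{filename}"


# Hypothetical prefix and file name, used only for illustration.
key = partitioned_key(
    "events",
    datetime(2024, 3, 7, 14, 30, tzinfo=timezone.utc),
    "part-0001.parquet",
)
print(key)  # events/year=2024/month=03/day=07/part-0001.parquet
```

A query filtered on `year`, `month`, and `day` then only scans the objects under the matching prefixes.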

Pattern 3: Enterprise BI

S3 → Glue ETL → Redshift → QuickSight

  • Glue ETL transforms and loads
  • Redshift stores hot data locally
  • Spectrum extends to cold S3 data
  • QuickSight for 100+ dashboard users
05
Chapter Five

Exam Insights

Decision Hints
If the question says… | Think…
"Serverless SQL" or "query S3 directly" | Athena
"Data catalog" or "metadata for data lake" | Glue Data Catalog
"Serverless ETL" or "convert CSV to Parquet" | Glue ETL
"Real-time" or "streaming data" | Kinesis Data Streams
"Deliver to S3 with no code" or "near-real-time" | Kinesis Firehose
"Data warehouse" or "sub-second BI" or "columnar" | Redshift
"Visualise" or "BI dashboards" or "SPICE" | QuickSight
"Full-text search" or "ELK" or "log analytics" | OpenSearch
"Column-level security" or "data lake governance" | Lake Formation
"Query S3 from Redshift without loading" | Redshift Spectrum
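
For self-testing, the hints above can be treated as an ordered keyword lookup. This is a study-aid sketch rather than a decision engine: the function name `suggest_service` is hypothetical, and the rule that more specific phrases (such as "near-real-time") are checked before broader ones (such as "real-time") is a design assumption.

```python
from typing import Optional

# Ordered hints: more specific phrases come first so they win over broader ones.
HINTS = [
    (("query s3 from redshift",), "Redshift Spectrum"),
    (("deliver to s3 with no code", "near-real-time"), "Kinesis Firehose"),
    (("serverless sql", "query s3 directly"), "Athena"),
    (("data catalog", "metadata for data lake"), "Glue Data Catalog"),
    (("serverless etl", "convert csv to parquet"), "Glue ETL"),
    (("real-time", "streaming data"), "Kinesis Data Streams"),
    (("data warehouse", "sub-second bi", "columnar"), "Redshift"),
    (("visualise", "bi dashboards", "spice"), "QuickSight"),
    (("full-text search", "elk", "log analytics"), "OpenSearch"),
    (("column-level security", "data lake governance"), "Lake Formation"),
]


def suggest_service(question: str) -> Optional[str]:
    """Return the first service whose hint phrase appears in the question."""
    text = question.lower()
    for keywords, service in HINTS:
        if any(keyword in text for keyword in keywords):
            return service
    return None


print(suggest_service("Which service gives near-real-time delivery into S3?"))
print(suggest_service("The team needs serverless SQL over raw files."))
```

Plain substring matching is deliberately naive; real exam questions usually combine several hints, so treat the output as a prompt for recall rather than an answer.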
Common Exam Traps
Trap | Reality
"Athena replaces Redshift" | NO. Athena = ad-hoc, pay per query. Redshift = fast BI, high concurrency, complex joins. They are different tools.
"Kinesis Firehose is real-time" | Firehose buffers records before delivery (historically with a 60-second minimum buffer interval), so it is NEAR-real-time. Data Streams = true real-time (~200 ms).
"Glue is just ETL" | Glue has TWO halves: Data Catalog (metadata) AND ETL Jobs (transform). The catalog is used by Athena/EMR/Redshift.
"OpenSearch is a data warehouse" | OpenSearch is a search + log engine (inverted index). Redshift is the warehouse (columnar, SQL, structured).
"Lake Formation stores data" | Lake Formation stores nothing. It's a security/governance layer on top of Glue Catalog + S3.
Summary
Analytics Services: Recap
  • S3 is the storage foundation: all analytics starts here.
  • Glue = metadata backbone (catalog) + serverless transforms (ETL). The catalog is shared by Athena, EMR, and Redshift.
  • Athena = serverless SQL on S3. Zero ops, pay per scan. Best for exploration and infrequent queries.
  • Kinesis = real-time streaming. Data Streams for custom processing. Firehose for managed delivery to S3/Redshift.
  • Redshift = columnar data warehouse for sub-second BI queries, complex joins, high concurrency.
  • QuickSight = serverless BI dashboards. SPICE cache. Pay per reader session.
  • OpenSearch = full-text search + log analytics. Not a warehouse, but a search engine.
  • Lake Formation = governance layer. Column/row security, cross-account sharing, on top of Glue.
Key Takeaway

The AWS analytics stack is layered: S3 stores → Glue catalogs → Athena/Redshift queries → QuickSight visualises. Add Kinesis for real-time, OpenSearch for search, Lake Formation for governance. Pick the simplest service that meets your latency and concurrency needs.