LearningTree · AWS · Analytics

Other Analytics Services —
QuickSight · Lake Formation · OpenSearch

Three services that complete the analytics ecosystem: QuickSight visualises data, Lake Formation governs the data lake, and OpenSearch powers search and log analytics.

Chapter One · Analytics

Amazon QuickSight — Serverless BI Dashboards

Amazon QuickSight is a serverless, cloud-native business intelligence (BI) service that lets you create interactive dashboards, reports, and visualisations — embedded in applications or shared with users. It connects to virtually every AWS analytics service and scales to thousands of users without managing infrastructure.

What QuickSight Does Introductory

📊

Interactive Dashboards

Drag-and-drop visual builder
Charts, tables, KPIs, maps, pivot tables
Drill-down, filter, parameter controls
Auto-refresh on schedule

🔗

Data Sources

Athena, Redshift, RDS, Aurora
S3 (CSV, TSV, JSON directly; Parquet via Athena)
OpenSearch, Timestream
On-prem via Direct Connect
SaaS: Salesforce, Jira, etc.

🤖

ML Insights and Q

Natural language queries ("What were sales last month?")
Anomaly detection built-in
Forecasting with ML
Auto-narratives (text summaries)

SPICE — In-Memory Engine Core

SPICE (Super-fast, Parallel, In-memory Calculation Engine) is QuickSight's in-memory cache. When you import data into SPICE:

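Data is loaded into QuickSight's own storage (dataset limits depend on edition: roughly 25M rows or 25 GB on Standard, 1B rows or 1 TB on Enterprise)
Dashboards query SPICE directly, not the source database
Fast, typically sub-second response regardless of source query speed
Scheduled refresh keeps SPICE in sync (daily on Standard; as often as hourly on Enterprise)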
Alternative: "Direct Query" mode hits the source live (slower but real-time)
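A SPICE refresh can also be triggered on demand rather than on a schedule. The sketch below wraps the `create_ingestion` call of the boto3 QuickSight client. The helper name, the account ID, and the dataset ID are placeholders for illustration, not values from this guide, and the client is passed in so the sketch can be run without AWS credentials.

```python
import uuid


def start_spice_refresh(client, account_id, dataset_id, incremental=False):
    """Ask QuickSight to reload a SPICE dataset from its source.

    `client` is expected to behave like boto3.client("quicksight").
    The account and dataset IDs are placeholders supplied by the caller.
    """
    ingestion_id = f"refresh-{uuid.uuid4()}"  # must be unique per ingestion
    client.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
        IngestionType="INCREMENTAL_REFRESH" if incremental else "FULL_REFRESH",
    )
    return ingestion_id


class RecordingClient:
    """Offline stand-in for the real client; it only records the call."""

    def __init__(self):
        self.calls = []

    def create_ingestion(self, **kwargs):
        self.calls.append(kwargs)
        return {"IngestionStatus": "INITIALIZED"}
```

With real credentials, the same function would be called as `start_spice_refresh(boto3.client("quicksight"), "111122223333", "sales-dataset")`, where both IDs are hypothetical.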

QuickSight Pricing Core

Edition	Cost	Key Difference
Standard	$9/author/month	Basic dashboards, limited sharing
Enterprise	$18/author, $0.30/reader-session	Row-level security, embedding, Q (NLQ), ML insights
SPICE capacity	$0.25/GB/month on Standard, $0.38/GB/month on Enterprise (10 GB included per author)	In-memory cache for fast dashboards
Reader sessions	$0.30/session (30-min), max $5/reader/month	Pay only when viewers open dashboards

🧠 Key Distinction

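QuickSight charges per reader session rather than per user license, so a viewer who opens a dashboard once a month costs $0.30, and no reader costs more than $5 in a month. For large audiences of occasional viewers this is far cheaper than per-seat BI licensing (Tableau's Creator tier, for example, lists at about $70/user/month, although its viewer tier costs less).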
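To make the session-based model concrete, here is a small worked calculation. It uses only the reader figures quoted in the pricing table ($0.30 per session, capped at $5 per reader per month); the function names and the example audience are made up for illustration.

```python
SESSION_PRICE = 0.30  # USD per 30-minute reader session (from the pricing table)
MONTHLY_CAP = 5.00    # USD maximum per reader per month (from the pricing table)


def reader_monthly_cost(sessions):
    """Cost of one reader for a month: pay per session, up to the cap."""
    return min(sessions * SESSION_PRICE, MONTHLY_CAP)


def audience_monthly_cost(sessions_per_reader):
    """Total cost for a list of readers, given each reader's session count."""
    return round(sum(reader_monthly_cost(s) for s in sessions_per_reader), 2)


# Hypothetical audience: 900 occasional viewers and 100 daily users.
audience = [1] * 900 + [40] * 100
total = audience_monthly_cost(audience)
```

The occasional viewers contribute $0.30 each, while the heavy users hit the $5 cap, which is why the model favours large audiences that open dashboards rarely.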

🎯 Exam Insight

"Serverless BI dashboards" or "visualise data from Athena/Redshift" → QuickSight
"SPICE" = QuickSight's in-memory engine for fast dashboard rendering
"Embed dashboards in application" → QuickSight Enterprise (embedded analytics)
"Pay per reader session, not license" → QuickSight pricing model
"Natural language queries on data" → QuickSight Q
"ML-powered anomaly detection in dashboards" → QuickSight ML Insights

Chapter 01 — Key Takeaway

QuickSight is the serverless BI layer of the AWS analytics stack. It connects to Athena, Redshift, and S3, caches data in SPICE for sub-second dashboards, and charges per reader session ($0.30) — making it cost-effective for thousands of viewers. Use it as the visualisation endpoint for any analytics pipeline.

Chapter Two · Analytics

AWS Lake Formation — Data Lake Governance

AWS Lake Formation is a governance and security layer built on top of the Glue Data Catalog. It simplifies building, securing, and managing data lakes — providing fine-grained access control (column-level, row-level, cell-level) that Glue alone doesn't offer.

What Lake Formation Adds Beyond Glue Core

Capability	Glue Alone	With Lake Formation
Table-level access control	IAM policies on Glue Catalog	✅ Centralised GRANT/REVOKE (SQL-like)
Column-level security	❌ Not supported	✅ Grant access to specific columns only
Row-level security	❌ Not supported	✅ Filter rows by user attributes
Cell-level security	❌ Not supported	✅ Column + row intersection filtering
Cross-account sharing	Complex IAM + resource policies	✅ Simple GRANT to external account
Data location registration	Manual S3 permissions	✅ Register S3 locations; LF manages access
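Governed tables	❌ No ACID on S3	✅ ACID transactions on S3 data lake tables (legacy feature; AWS has ended support and now points to open table formats such as Apache Iceberg)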
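As an illustration of the column-level row in the table above, the sketch below builds the arguments for the boto3 Lake Formation `grant_permissions` call using a `TableWithColumns` resource. The role ARN, database, table, and column names are hypothetical; only the request shape is the point.

```python
def column_grant_request(principal_arn, database, table, columns):
    """Build keyword arguments for lakeformation.grant_permissions.

    Grants SELECT on the listed columns only; any column not listed
    stays invisible to the principal.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical example: a marketing role that must not see PII columns.
request = column_grant_request(
    "arn:aws:iam::111122223333:role/MarketingAnalyst",
    "sales_db",
    "orders",
    ["order_id", "region", "amount"],
)
```

With credentials in place, the request would be sent as `boto3.client("lakeformation").grant_permissions(**request)`.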

How Lake Formation Works Core

[Figure: Lake Formation governance layer architecture. Lake Formation sits between users and the data lake, enforcing column/row security on every query.]

Common Lake Formation Use Cases Core

🔒

Fine-Grained Access

Marketing team sees only their columns (no PII)
Finance sees all columns but only their region's rows
Data engineers have full access
All enforced centrally — no S3 bucket policies needed
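The access patterns above can be pictured with a tiny in-memory model. This is a conceptual sketch only, not how Lake Formation is implemented: it shows that a policy is a set of allowed columns plus a row predicate, and that cell-level security is their intersection. The table contents and names are invented.

```python
def apply_policy(rows, allowed_columns, row_predicate):
    """Return only the permitted rows, each trimmed to the permitted columns."""
    return [
        {col: row[col] for col in allowed_columns if col in row}
        for row in rows
        if row_predicate(row)
    ]


# Hypothetical table contents.
ORDERS = [
    {"order_id": 1, "region": "EU", "amount": 120, "email": "a@example.com"},
    {"order_id": 2, "region": "US", "amount": 80, "email": "b@example.com"},
    {"order_id": 3, "region": "EU", "amount": 45, "email": "c@example.com"},
]

# Marketing: every row, but no PII column.
marketing_view = apply_policy(ORDERS, ["order_id", "region", "amount"], lambda r: True)

# Finance (EU): every column, but only rows for their region.
finance_eu_view = apply_policy(ORDERS, list(ORDERS[0]), lambda r: r["region"] == "EU")
```

In the real service the engine (Athena, Redshift Spectrum, and so on) asks Lake Formation for the policy and applies the filtering before results reach the user.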

🤝

Cross-Account Data Sharing

Share specific tables with another AWS account
Consumer account queries via Athena or Redshift Spectrum
No data copying — shared via catalog
Revoke access instantly

🎯 Exam Insight

"Column-level security on data lake" → Lake Formation
"Row-level filtering in Athena" → Lake Formation data filters
"Centralised GRANT/REVOKE for data lake" → Lake Formation (not raw IAM)
"Share data lake tables across accounts" → Lake Formation cross-account sharing
"ACID transactions on S3" → Lake Formation governed tables (legacy answer; newer material favours Apache Iceberg tables)
"Lake Formation vs Glue" → Glue = metadata + ETL. Lake Formation = security + governance on top of Glue.

Chapter 02 — Key Takeaway

Lake Formation is the governance layer for your data lake. It adds what Glue Catalog lacks: column-level security, row-level filtering, cross-account sharing, and centralised GRANT/REVOKE — all without complex IAM policies. Use it when you need fine-grained access control over who sees what data in your lake.

Chapter Three · Analytics

Amazon OpenSearch Service — Search & Log Analytics

Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) is a managed search and analytics engine. It excels at full-text search, log analytics, real-time monitoring, and observability dashboards, all use cases where Redshift and Athena are the wrong tool.

What OpenSearch Does Best Core

🔎

Full-Text Search

Inverted index for fast text search
Fuzzy matching, autocomplete
Relevance scoring (BM25)
E-commerce product search
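To show why an inverted index makes text search fast, here is a toy version in plain Python. It is a teaching sketch, not OpenSearch's implementation: there is no relevance scoring, fuzzy matching, or stemming, and the product catalogue is invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical product catalogue.
PRODUCTS = {
    "p1": "Wireless noise-cancelling headphones",
    "p2": "Wired studio headphones",
    "p3": "Wireless charging pad",
}
INDEX = build_inverted_index(PRODUCTS)
```

A lookup touches only the posting sets for the query terms, so search cost depends on how many documents match rather than on the size of the whole catalogue.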

📋

Log Analytics

Centralised log aggregation
CloudWatch Logs → OpenSearch
VPC Flow Logs analysis
Application error investigation

📊

Observability

OpenSearch Dashboards (Kibana fork)
Real-time metrics visualisation
Alerting on patterns/thresholds
Trace analytics (distributed tracing)

OpenSearch Architecture Core

Component	What It Is	Notes
Domain	A managed OpenSearch cluster	Equivalent to a "cluster" — contains nodes
Data nodes	Store data + execute queries	Scale by adding nodes or upgrading instance type
Master nodes	Manage cluster state (3 dedicated recommended)	Don't store data; ensure cluster stability
Index	A collection of documents (like a database table)	Each index has a mapping (schema)
Shard	A partition of an index distributed across nodes	Primary shards + replica shards for HA
UltraWarm	Warm storage tier (S3-backed, read-only)	80% cheaper for infrequently queried data
Cold storage	Coldest tier (detached, S3)	Cheapest; data must be reattached before querying
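The index and shard rows above translate into a small sizing exercise. The sketch below assumes a target of about 30 GiB per primary shard, which is a common rule of thumb rather than a figure from this guide, and builds an index-creation body with hypothetical log fields.

```python
import math


def primary_shard_count(index_size_gib, target_shard_gib=30):
    """Estimate primary shards so each stays near the target size (assumed)."""
    return max(1, math.ceil(index_size_gib / target_shard_gib))


def total_shards(primary_shards, replicas=1):
    """Each primary shard has `replicas` copies on other nodes."""
    return primary_shards * (1 + replicas)


def index_body(primary_shards, replicas=1):
    """Request body for creating an index with explicit shards and a mapping."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        },
        "mappings": {
            "properties": {  # hypothetical fields for an application log index
                "timestamp": {"type": "date"},
                "level": {"type": "keyword"},
                "message": {"type": "text"},
            }
        },
    }


primaries = primary_shard_count(200)  # a 200 GiB index
body = index_body(primaries)
```

The body would be sent with `PUT /<index-name>` on the domain endpoint. The primary shard count is fixed at creation time, so it is worth estimating before the index fills up.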

Common Data Ingestion Patterns Core

Pattern	Flow	Use Case
Kinesis Firehose (now Amazon Data Firehose)	Sources → Firehose → OpenSearch	Streaming logs/events in near-real-time
CloudWatch Logs	CW Logs → Subscription filter → OpenSearch	Application logs from Lambda, ECS, EC2
Logstash	Servers → Logstash → OpenSearch	On-prem or EC2 log collection (ELK stack)
Direct API	Application → OpenSearch REST API	Custom indexing from applications
DynamoDB Streams	DynamoDB → Lambda → OpenSearch	Search layer over DynamoDB data

OpenSearch vs Redshift vs Athena Core

Dimension	OpenSearch	Redshift	Athena
Primary use	Full-text search, logs, observability	Data warehouse, BI, complex analytics	Ad-hoc SQL on S3
Query model	JSON queries, full-text, fuzzy	SQL (PostgreSQL-like)	SQL (Presto/Trino)
Data format	JSON documents (denormalized)	Structured tables (columnar)	Files in S3 (CSV, JSON, Parquet, ORC, Avro)
Real-time ingestion	✅ Near real-time (searchable within about a second of indexing)	Mostly batch (COPY); streaming ingestion from Kinesis or MSK also supported	❌ Queries existing files
Dashboards	OpenSearch Dashboards (built-in)	QuickSight / Tableau (external)	QuickSight (external)
Infrastructure	Managed cluster (instances) or Serverless	Managed cluster or Serverless	Serverless
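The "JSON queries" entry in the comparison is easier to picture with an example. The sketch below builds an OpenSearch query DSL body for a typo-tolerant product search. The field name and the search text are hypothetical; `match`, `fuzziness`, and `operator` are standard query DSL options.

```python
def fuzzy_product_query(user_text, field="name", size=10):
    """Build a search body that tolerates small typos in the user's text."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": user_text,
                    "fuzziness": "AUTO",  # allowed edits scale with term length
                    "operator": "and",    # every term must match
                }
            }
        },
    }


query = fuzzy_product_query("wireles headphones")
```

The body would be posted to `/<index-name>/_search`. The equivalent in Redshift or Athena would need SQL pattern matching, which has no built-in notion of relevance or typo tolerance.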

OpenSearch Serverless Deep

OpenSearch Serverless removes cluster management: you create collections (search, time-series, or vector search), and AWS manages capacity, scaling, and infrastructure. You pay per OCU (OpenSearch Compute Unit) consumed.

No instance types or node counts to choose
Auto-scales based on indexing and query load
Three collection types: Search (full-text), Time-series (logs/metrics), and Vector search (embeddings)
A minimum number of OCUs is billed even when idle (4 OCUs at launch, since lowered by AWS; about $0.24/OCU-hour), so it is not zero-cost at rest
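A quick calculation shows what the idle floor means in practice. It takes a 4-OCU minimum at roughly $0.24 per OCU-hour as example inputs and assumes 730 hours in an average month; both the floor and the price vary by configuration and region, so treat the result as an order-of-magnitude estimate.

```python
HOURS_PER_MONTH = 730  # average hours in a month; an approximation


def idle_monthly_cost(min_ocus=4, price_per_ocu_hour=0.24, hours=HOURS_PER_MONTH):
    """Cost of the minimum OCU allocation for a month with no traffic."""
    return round(min_ocus * price_per_ocu_hour * hours, 2)


four_ocu_floor = idle_monthly_cost()           # example floor of 4 OCUs
two_ocu_floor = idle_monthly_cost(min_ocus=2)  # a lower floor, for comparison
```

Even an unused collection therefore carries a baseline charge in the hundreds of dollars per month at these inputs, which matters when comparing against a small provisioned domain.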

🎯 Exam Insight

"Full-text search" or "log analytics" or "ELK stack on AWS" → OpenSearch
"Kibana-like dashboards" → OpenSearch Dashboards
"Search layer over DynamoDB" → DynamoDB Streams → Lambda → OpenSearch
"Centralise application logs for investigation" → CloudWatch Logs → OpenSearch
"UltraWarm" → 80% cheaper warm tier for old log data (read-only)
"OpenSearch vs CloudWatch Logs Insights" → OpenSearch = more powerful, custom dashboards. CW Insights = simpler, built-in.
"OpenSearch vs Redshift" → OpenSearch = search/logs. Redshift = structured analytics/BI.

Chapter 03 — Key Takeaway

OpenSearch is the search and observability engine — use it for full-text search, log analytics, and real-time monitoring dashboards. It excels where Redshift and Athena don't: text search, sub-second log indexing, and Kibana-style exploration. Ingest via Firehose, CloudWatch subscriptions, or direct API. Use UltraWarm for cost-effective old log retention.

Analytics Services — Complete Decision Guide Core

If You Need…	Use…
Ad-hoc SQL on S3, serverless	Athena
Metadata catalog for data lake	Glue Data Catalog
Serverless ETL (CSV→Parquet)	Glue ETL
Real-time streaming ingestion	Kinesis Data Streams
Deliver streams to S3 automatically	Kinesis Firehose
Sub-second BI, complex joins, high concurrency	Redshift
BI dashboards, visualisation	QuickSight
Fine-grained data lake security	Lake Formation
Full-text search, log analytics	OpenSearch