OpenSearch Lake Formation
LearningTree Β· AWS Β· Analytics

Other Analytics Services β€”
QuickSight Β· Lake Formation Β· OpenSearch

Three services that complete the analytics ecosystem: QuickSight visualises data, Lake Formation governs the data lake, and OpenSearch powers search and log analytics.

01
Chapter One Β· Analytics

Amazon QuickSight β€” Serverless BI Dashboards

Amazon QuickSight is a serverless, cloud-native business intelligence (BI) service that lets you create interactive dashboards, reports, and visualisations β€” embedded in applications or shared with users. It connects to virtually every AWS analytics service and scales to thousands of users without managing infrastructure.

What QuickSight Does Introductory
πŸ“Š

Interactive Dashboards

  • Drag-and-drop visual builder
  • Charts, tables, KPIs, maps, pivot tables
  • Drill-down, filter, parameter controls
  • Auto-refresh on schedule
πŸ”—

Data Sources

  • Athena, Redshift, RDS, Aurora
  • S3 (CSV, JSON, Parquet)
  • OpenSearch, Timestream
  • On-prem via Direct Connect
  • SaaS: Salesforce, Jira, etc.
πŸ€–

ML Insights (Q)

  • Natural language queries ("What were sales last month?")
  • Anomaly detection built-in
  • Forecasting with ML
  • Auto-narratives (text summaries)
SPICE β€” In-Memory Engine Core

SPICE (Super-fast, Parallel, In-memory Calculation Engine) is QuickSight's in-memory cache. When you import data into SPICE:

  • Data is loaded into QuickSight's own storage (up to 500M rows per dataset)
  • Dashboards query SPICE directly β€” not the source database
  • Sub-second response regardless of source query speed
  • Scheduled refresh keeps SPICE in sync (hourly/daily)
  • Alternative: "Direct Query" mode hits the source live (slower but real-time)
QuickSight Pricing Core
EditionCostKey Difference
Standard$9/author/monthBasic dashboards, limited sharing
Enterprise$18/author, $0.30/reader-sessionRow-level security, embedding, Q (NLQ), ML insights
SPICE capacity$0.25/GB/month (included 10GB/user)In-memory cache for fast dashboards
Reader sessions$0.30/session (30-min), max $5/reader/monthPay only when viewers open dashboards
🧠 Key Distinction

QuickSight charges per reader session (not per user license) β€” meaning a viewer who opens the dashboard once a month costs $0.30. This makes it dramatically cheaper than traditional BI tools (Tableau: $70/user/month) for large viewer audiences.

🎯 Exam Insight
  • "Serverless BI dashboards" or "visualise data from Athena/Redshift" β†’ QuickSight
  • "SPICE" = QuickSight's in-memory engine for fast dashboard rendering
  • "Embed dashboards in application" β†’ QuickSight Enterprise (embedded analytics)
  • "Pay per reader session, not license" β†’ QuickSight pricing model
  • "Natural language queries on data" β†’ QuickSight Q
  • "ML-powered anomaly detection in dashboards" β†’ QuickSight ML Insights
Chapter 01 β€” Key Takeaway

QuickSight is the serverless BI layer of the AWS analytics stack. It connects to Athena, Redshift, and S3, caches data in SPICE for sub-second dashboards, and charges per reader session ($0.30) β€” making it cost-effective for thousands of viewers. Use it as the visualisation endpoint for any analytics pipeline.

02
Chapter Two Β· Analytics

AWS Lake Formation β€” Data Lake Governance

AWS Lake Formation is a governance and security layer built on top of the Glue Data Catalog. It simplifies building, securing, and managing data lakes β€” providing fine-grained access control (column-level, row-level, cell-level) that Glue alone doesn't offer.

What Lake Formation Adds Beyond Glue Core
CapabilityGlue AloneWith Lake Formation
Table-level access controlIAM policies on Glue Catalogβœ… Centralised GRANT/REVOKE (SQL-like)
Column-level security❌ Not supportedβœ… Grant access to specific columns only
Row-level security❌ Not supportedβœ… Filter rows by user attributes
Cell-level security❌ Not supportedβœ… Column + row intersection filtering
Cross-account sharingComplex IAM + resource policiesβœ… Simple GRANT to external account
Data location registrationManual S3 permissionsβœ… Register S3 locations; LF manages access
Governed tables❌ No ACID on S3βœ… ACID transactions on S3 data lake tables
How Lake Formation Works Core
Lake Formation β€” Governance Layer Architecture
USERS Athena / EMR LAKE FORMATION Column/Row security GRANT / REVOKE Cross-account sharing GLUE CATALOG Tables Β· Schemas Metadata S3 DATA LAKE Registered locations CONSUMERS Athena Β· EMR Β· Redshift
Lake Formation sits between users and the data lake β€” enforcing column/row security on every query
Common Lake Formation Use Cases Core
πŸ”’

Fine-Grained Access

  • Marketing team sees only their columns (no PII)
  • Finance sees all columns but only their region's rows
  • Data engineers have full access
  • All enforced centrally β€” no S3 bucket policies needed
🀝

Cross-Account Data Sharing

  • Share specific tables with another AWS account
  • Consumer account queries via Athena or Redshift Spectrum
  • No data copying β€” shared via catalog
  • Revoke access instantly
🎯 Exam Insight
  • "Column-level security on data lake" β†’ Lake Formation
  • "Row-level filtering in Athena" β†’ Lake Formation data filters
  • "Centralised GRANT/REVOKE for data lake" β†’ Lake Formation (not raw IAM)
  • "Share data lake tables across accounts" β†’ Lake Formation cross-account sharing
  • "ACID transactions on S3" β†’ Lake Formation governed tables
  • "Lake Formation vs Glue" β†’ Glue = metadata + ETL. Lake Formation = security + governance on top of Glue.
Chapter 02 β€” Key Takeaway

Lake Formation is the governance layer for your data lake. It adds what Glue Catalog lacks: column-level security, row-level filtering, cross-account sharing, and centralised GRANT/REVOKE β€” all without complex IAM policies. Use it when you need fine-grained access control over who sees what data in your lake.

03
Chapter Three Β· Analytics

Amazon OpenSearch Service β€” Search & Log Analytics

Amazon OpenSearch Service (successor to Elasticsearch Service) is a managed search and analytics engine. It excels at full-text search, log analytics, real-time monitoring, and observability dashboards β€” use cases where Redshift and Athena are the wrong tool.

What OpenSearch Does Best Core
πŸ”Ž

Full-Text Search

  • Inverted index for fast text search
  • Fuzzy matching, autocomplete
  • Relevance scoring (BM25)
  • E-commerce product search
πŸ“‹

Log Analytics

  • Centralised log aggregation
  • CloudWatch Logs β†’ OpenSearch
  • VPC Flow Logs analysis
  • Application error investigation
πŸ“Š

Observability

  • OpenSearch Dashboards (Kibana fork)
  • Real-time metrics visualisation
  • Alerting on patterns/thresholds
  • Trace analytics (distributed tracing)
OpenSearch Architecture Core
ComponentWhat It IsNotes
DomainA managed OpenSearch clusterEquivalent to a "cluster" β€” contains nodes
Data nodesStore data + execute queriesScale by adding nodes or upgrading instance type
Master nodesManage cluster state (3 dedicated recommended)Don't store data; ensure cluster stability
IndexA collection of documents (like a database table)Each index has a mapping (schema)
ShardA partition of an index distributed across nodesPrimary shards + replica shards for HA
UltraWarmWarm storage tier (S3-backed, read-only)80% cheaper for infrequently queried data
Cold storageColdest tier (detached, S3)Cheapest; data must be reattached before querying
Common Data Ingestion Patterns Core
PatternFlowUse Case
Kinesis FirehoseSources β†’ Firehose β†’ OpenSearchStreaming logs/events in near-real-time
CloudWatch LogsCW Logs β†’ Subscription filter β†’ OpenSearchApplication logs from Lambda, ECS, EC2
LogstashServers β†’ Logstash β†’ OpenSearchOn-prem or EC2 log collection (ELK stack)
Direct APIApplication β†’ OpenSearch REST APICustom indexing from applications
DynamoDB StreamsDynamoDB β†’ Lambda β†’ OpenSearchSearch layer over DynamoDB data
OpenSearch vs Redshift vs Athena Core
DimensionOpenSearchRedshiftAthena
Primary useFull-text search, logs, observabilityData warehouse, BI, complex analyticsAd-hoc SQL on S3
Query modelJSON queries, full-text, fuzzySQL (PostgreSQL-like)SQL (Presto/Trino)
Data formatJSON documents (denormalized)Structured tables (columnar)Files in S3 (any format)
Real-time ingestionβœ… Sub-second indexing❌ Batch (COPY)❌ Query existing files
DashboardsOpenSearch Dashboards (built-in)QuickSight / Tableau (external)QuickSight (external)
InfrastructureManaged cluster (instances)Managed cluster or ServerlessServerless
OpenSearch Serverless Deep

OpenSearch Serverless removes cluster management β€” you create collections (search or time-series), and AWS manages capacity, scaling, and infrastructure. Pay per OCU (OpenSearch Compute Unit) consumed.

  • No instance types or node counts to choose
  • Auto-scales based on indexing and query load
  • Two collection types: Search (full-text) and Time-series (logs/metrics)
  • Minimum 4 OCU (~$0.24/OCU-hour) β€” not truly zero-cost when idle
🎯 Exam Insight
  • "Full-text search" or "log analytics" or "ELK stack on AWS" β†’ OpenSearch
  • "Kibana-like dashboards" β†’ OpenSearch Dashboards
  • "Search layer over DynamoDB" β†’ DynamoDB Streams β†’ Lambda β†’ OpenSearch
  • "Centralise application logs for investigation" β†’ CloudWatch Logs β†’ OpenSearch
  • "UltraWarm" β†’ 80% cheaper warm tier for old log data (read-only)
  • "OpenSearch vs CloudWatch Logs Insights" β†’ OpenSearch = more powerful, custom dashboards. CW Insights = simpler, built-in.
  • "OpenSearch vs Redshift" β†’ OpenSearch = search/logs. Redshift = structured analytics/BI.
Chapter 03 β€” Key Takeaway

OpenSearch is the search and observability engine β€” use it for full-text search, log analytics, and real-time monitoring dashboards. It excels where Redshift and Athena don't: text search, sub-second log indexing, and Kibana-style exploration. Ingest via Firehose, CloudWatch subscriptions, or direct API. Use UltraWarm for cost-effective old log retention.

Analytics Services β€” Complete Decision Guide Core
If You Need…Use…
Ad-hoc SQL on S3, serverlessAthena
Metadata catalog for data lakeGlue Data Catalog
Serverless ETL (CSV→Parquet)Glue ETL
Real-time streaming ingestionKinesis Data Streams
Deliver streams to S3 automaticallyKinesis Firehose
Sub-second BI, complex joins, high concurrencyRedshift
BI dashboards, visualisationQuickSight
Fine-grained data lake securityLake Formation
Full-text search, log analyticsOpenSearch