Amazon Kinesis β
Real-Time Streaming at Any Scale
Capture, process, and deliver streaming data in real time β from IoT sensors and clickstreams to application logs and financial transactions. Kinesis is the real-time counterpart to batch analytics with Athena and Glue.
What is Amazon Kinesis?
Amazon Kinesis is a managed platform for real-time streaming data. It collects, processes, and delivers data records continuously β in milliseconds to seconds β rather than waiting for batch intervals of minutes or hours. Kinesis is an umbrella brand with multiple services, each solving a different streaming problem.
| Aspect | Batch (Athena / Glue) | Streaming (Kinesis) |
|---|---|---|
| Data delivery | Collected over hours, processed together | Processed record-by-record as it arrives |
| Latency | Minutes to hours | Milliseconds to seconds |
| Use cases | Reports, historical analysis, ad-hoc queries | Real-time dashboards, alerting, event processing |
| Data model | Files in S3 (CSV, Parquet) | Continuous stream of records (ordered sequence) |
| Scaling | Query-time (Athena scales per query) | Capacity-based (shards provisioned ahead) |
| Cost model | Pay per query / per TB scanned | Pay per shard-hour + per GB ingested |
Think of Kinesis as a river system. Data flows continuously from sources (streams). You can dip a bucket in the river at any point to read data (consumers). The river doesn't stop β it just keeps flowing. Batch analytics (Athena) is like analysing a lake after the river has filled it. Streaming analytics (Kinesis) is like analysing the water as it flows past you.
Kinesis Data Streams (KDS)
- Core streaming service β capture and store data records
- Real-time ingestion with ordered shards
- YOU write consumer code (Lambda, KCL, custom)
- Retention: 24 hours β 365 days
- Most flexible β full control over processing
Kinesis Data Firehose
- Fully managed delivery to destinations
- Auto-batches and delivers to S3, Redshift, OpenSearch, Splunk
- No consumer code needed β zero administration
- Near-real-time (60-second minimum buffer)
- Built-in format conversion (JSON β Parquet)
Kinesis Data Analytics
- Run SQL or Apache Flink on streaming data
- Real-time aggregations, windowed queries
- Input from KDS or Firehose
- Output to KDS, Firehose, Lambda
- Now called "Managed Service for Apache Flink"
Kinesis Video Streams
- Capture and stream video from devices
- Cameras, drones, dashcams, IoT video
- Integrates with Rekognition for ML
- Different use case β not covered here
| Dimension | Kinesis Data Streams | Kinesis Data Firehose |
|---|---|---|
| Administration | You manage shards, consumers | Fully managed β zero administration |
| Latency | ~200ms (real-time) | 60 seconds minimum (near-real-time) |
| Consumer code | Required (Lambda, KCL, custom) | Not needed β built-in delivery |
| Destinations | Anything (you write the code) | S3, Redshift, OpenSearch, Splunk, HTTP |
| Data retention | 24h β 365 days (replay capable) | None β delivers once and forgets |
| Replay | β Re-read from any point in retention window | β No replay β fire and forget |
| Scaling | Manual shard splitting/merging (or on-demand mode) | Auto-scales automatically |
| Format conversion | Not built-in | JSON β Parquet/ORC built-in |
| Best for | Custom real-time processing, multiple consumers | Simple delivery to S3/analytics stores |
Real-Time Dashboards
- Live metrics from web applications
- Gaming leaderboards updated per second
- Trading floor price displays
- Ad impression counting
Anomaly Detection
- Fraud detection on transactions
- Security threat detection
- DDoS traffic spike alerting
- Healthcare vitals monitoring
Log & Event Pipelines
- Application log centralisation
- Clickstream collection β S3 for analytics
- IoT sensor data ingestion
- Microservice event streaming
- "Real-time data ingestion" or "streaming" β Kinesis Data Streams
- "Deliver streaming data to S3 automatically" β Kinesis Firehose
- "Near-real-time, zero administration delivery" β Firehose (not Data Streams)
- "Custom real-time processing with multiple consumers" β Data Streams + Lambda/KCL
- "Replay streaming data" β Data Streams (Firehose cannot replay)
- "Convert JSON to Parquet during delivery" β Firehose (built-in format conversion)
- Data Streams = you manage consumers. Firehose = fully managed delivery.
Kinesis is AWS's real-time streaming platform. Data Streams gives you full control with ordered shards and custom consumers (~200ms latency). Firehose is zero-admin delivery to S3/Redshift/OpenSearch (60s buffer). Use Data Streams when you need real-time processing, replay, or multiple consumers. Use Firehose when you just need to land streaming data in a destination with no code.
Kinesis Data Streams β Deep Dive
Kinesis Data Streams (KDS) is the core real-time ingestion service. It captures data records into an ordered, durable log partitioned across shards. You control capacity, retention, and processing logic. It's the foundation for any custom real-time architecture on AWS.
| Concept | What It Is | Key Detail |
|---|---|---|
| Stream | A named channel for data records | Contains 1+ shards; you choose shard count |
| Shard | A unit of capacity β ordered sequence of records | 1 MB/s write, 2 MB/s read per shard |
| Record | A single data blob (up to 1 MB) | Contains: partition key + sequence number + data blob |
| Partition Key | Determines which shard a record goes to (hashed) | Same key = same shard = guaranteed ordering |
| Sequence Number | Unique ID assigned by Kinesis per record per shard | Monotonically increasing within a shard |
| Retention | How long records stay in the stream | Default 24h, extendable to 7 days (or 365 days) |
Think of a Kinesis stream as a highway. Each shard is one lane. More lanes (shards) = more cars (records) can flow simultaneously. Each lane has a speed limit: 1 MB/s in, 2 MB/s out. If you need more throughput, add more lanes. Records with the same partition key always use the same lane (ordering preserved).
Write Capacity (per shard)
- 1 MB/second ingestion
- 1,000 records/second
- Each record max 1 MB
- Exceed β
ProvisionedThroughputExceededException - Fix: add more shards or use on-demand mode
Read Capacity (per shard)
- 2 MB/second shared across all consumers
- Or 2 MB/second per consumer with Enhanced Fan-Out
- 5 GetRecords calls/second per shard (shared mode)
- Multiple consumers compete for read bandwidth
- Fix: Enhanced Fan-Out (dedicated throughput per consumer)
| Mode | How It Works | Cost | Best For |
|---|---|---|---|
| Provisioned | You specify exact shard count; manually scale | $0.015/shard-hour + $0.014/million PUT | Predictable, steady traffic with known throughput |
| On-Demand | Auto-scales shards based on traffic (up to 200 MB/s) | $0.04/GB ingested + $0.04/GB retrieved | Unpredictable, bursty, or new workloads |
On-Demand is simpler (no shard math) but ~2Γ more expensive at steady loads. Choose Provisioned when you know your throughput and want to optimise cost. Choose On-Demand when traffic is unpredictable, you're starting a new stream, or you want zero capacity planning.
The partition key is the most important design decision in Kinesis. It determines which shard receives each record:
Good Partition Key Design
- High cardinality (many unique values)
- Evenly distributed across shards
- Example:
user_id,device_id,session_id - Same key = same shard = ordering preserved per entity
Bad Partition Key Design
- Low cardinality (few unique values)
- Creates "hot shards" β one shard overloaded
- Example:
country(only ~200 values, US gets 80% of traffic) - Hot shard β throttling even with capacity available
| Aspect | Shared (Classic) | Enhanced Fan-Out (EFO) |
|---|---|---|
| Read throughput | 2 MB/s shared across ALL consumers per shard | 2 MB/s dedicated PER consumer per shard |
| Delivery | Consumer polls (pull-based) | Push-based (SubscribeToShard API) |
| Latency | ~200ms (polling interval) | ~70ms (push, lower latency) |
| Cost | Included in base shard price | Additional $0.015/consumer-shard-hour + $0.013/GB |
| Best for | 1β2 consumers, cost-sensitive | 3+ consumers, latency-sensitive, isolation needed |
| Producer Method | When to Use | Key Detail |
|---|---|---|
| AWS SDK (PutRecord / PutRecords) | Simple applications, low-to-medium throughput | PutRecords batches up to 500 records per call |
| Kinesis Producer Library (KPL) | High-throughput production workloads | Auto-batches, retries, aggregates small records |
| Kinesis Agent | Stream log files from EC2 instances | Daemon that tails files β writes to stream |
| CloudWatch Logs subscription | Stream CloudWatch Logs to Kinesis | Built-in integration, no code needed |
| IoT Core rule action | IoT devices β Kinesis directly | Rule routes MQTT messages to stream |
- "Ordered processing per user/device" β use user_id/device_id as partition key (same key = same shard = ordered)
- "ProvisionedThroughputExceededException" β hot shard or insufficient shards. Fix: better partition key or add shards
- "1 MB/s write, 2 MB/s read per shard" β memorise these limits
- "Multiple consumers need dedicated throughput" β Enhanced Fan-Out
- "On-demand mode" β auto-scales, no shard math, ~2Γ cost of provisioned for steady traffic
- "Replay data from 3 days ago" β extend retention to 7 days (or 365 for long-term)
- "High-throughput producer" β Kinesis Producer Library (KPL) with aggregation + batching
Kinesis Data Streams is built on shards β each shard handles 1 MB/s in and 2 MB/s out. Partition keys determine which shard receives each record, guaranteeing order per key. Choose provisioned mode for cost at steady load, on-demand for simplicity with bursty traffic. Enhanced Fan-Out gives each consumer dedicated 2 MB/s per shard with push-based delivery at ~70ms latency.
Kinesis Data Firehose β Managed Delivery
Kinesis Data Firehose is the zero-administration delivery pipe. You point it at a source and a destination β it handles batching, compression, format conversion, and delivery automatically. No shards to manage, no consumer code, no capacity planning. Near-real-time with a 60-second minimum buffer.
Batching
- Buffers records by time (60β900s) or size (1β128 MB)
- Delivers one file per batch to S3
- Reduces number of S3 objects dramatically
- Configurable buffer hints
Format Conversion
- JSON β Parquet (most common)
- JSON β ORC
- Uses Glue Catalog schema for conversion
- Zero-code columnar conversion
Compression
- GZIP, Snappy, ZIP, or uncompressed
- Applied after format conversion
- Reduces storage cost in S3
- Parquet has built-in Snappy
| Destination | Use Case | Key Detail |
|---|---|---|
| Amazon S3 | Data lake landing, archive, analytics with Athena | Most common. Supports Parquet conversion. Partition by date. |
| Amazon Redshift | Load streaming data into data warehouse | Firehose first writes to S3, then COPY's into Redshift |
| Amazon OpenSearch | Real-time log search and dashboards | Delivers JSON; good for Kibana/Dashboards |
| Splunk | Enterprise log management (SIEM) | HEC (HTTP Event Collector) endpoint |
| HTTP Endpoint | Custom API, Datadog, New Relic, Sumo Logic | Any HTTPS endpoint that accepts batched POSTs |
| Kinesis Data Streams | Fan-out: KDS β Firehose β S3 (common pattern) | Firehose consumes from an existing KDS stream |
Firehose can invoke a Lambda function on each batch before delivery. The Lambda receives records, transforms them (add fields, filter, parse), and returns them to Firehose for delivery. Common uses:
- Parse raw log lines into structured JSON
- Add enrichment data (e.g. GeoIP lookup on IP addresses)
- Filter out unwanted records (reduce storage cost)
- Normalize timestamps across time zones
Configure Firehose with a Glue Catalog table schema β Firehose converts incoming JSON records to Parquet automatically. The output files land in S3 as Parquet, immediately queryable by Athena. This is the lowest-effort way to build a streaming data lake.
| Limit | Value | Note |
|---|---|---|
| Minimum buffer interval | 60 seconds | Cannot deliver faster than 1 min (not true real-time) |
| Maximum record size | 1 MB | Same as Data Streams |
| Maximum buffer size | 128 MB | Delivers when buffer fills OR interval expires (whichever first) |
| Lambda transform timeout | 5 minutes max | Lambda receives up to 6 MB of records per invocation |
| Retry on failure | Up to 24 hours (S3), varies per destination | Undeliverable records β S3 backup bucket |
- "Deliver streaming data to S3 with no code" β Kinesis Firehose
- "Convert JSON to Parquet during delivery" β Firehose + Glue Catalog schema
- "60-second minimum latency" β Firehose is NEAR-real-time, not true real-time
- "Transform records in flight before delivery" β Firehose + Lambda transformation
- "Firehose delivers to Redshift" β actually writes to S3 first, then COPY into Redshift
- "Failed delivery records" β automatically backed up to an S3 error bucket
- "No shards, no capacity planning" β Firehose auto-scales (unlike Data Streams provisioned)
- "Firehose vs Data Streams for replay?" β Only Data Streams supports replay. Firehose delivers once.
Kinesis Firehose is the fully-managed delivery service β zero shards, zero consumer code, auto-scaling. It buffers records (60sβ15min), optionally transforms via Lambda, converts to Parquet using Glue Catalog, compresses, and delivers to S3/Redshift/OpenSearch. The 60-second minimum buffer means it's near-real-time, not true real-time. Use it when you just want data to land in a destination with minimal effort.
Consumers & Processing
Consumers read records from Kinesis Data Streams and process them. AWS offers multiple consumer options β from serverless (Lambda) to fully custom (KCL applications). The choice depends on latency requirements, processing complexity, and operational preference.
| Consumer | How It Works | Best For | Scaling |
|---|---|---|---|
| AWS Lambda | Event source mapping polls shards, invokes function per batch | Serverless, simple transforms, <15 min processing | 1 Lambda invocation per shard (parallel = shard count) |
| KCL (Kinesis Client Library) | Java/Python library manages shard leases, checkpoints on EC2/ECS | Long-running, complex processing, custom checkpointing | 1 worker per shard; library handles rebalancing |
| Managed Apache Flink | Stream processing with SQL or Java/Python (ex-Kinesis Data Analytics) | Windowed aggregations, joins, complex event processing | Managed scaling based on throughput |
| Kinesis Firehose | Consume from KDS and deliver to S3/Redshift/OpenSearch | Simple delivery to storage (no custom logic) | Auto-scales, no management |
| SDK (GetRecords) | Direct API polling in custom application | Simple scripts, testing, one-off reads | Manual β you manage everything |
Advantages
- Zero infrastructure β no EC2, no containers
- Auto-polls shards, manages checkpoints
- Scales to 1 concurrent invocation per shard
- Built-in retry with bisect-on-error
- Sends failed batches to DLQ (SQS or SNS)
Limitations
- 15-minute max execution time
- One invocation per shard (parallelization factor configurable up to 10)
- Cold starts add latency
- Memory limited to 10 GB
- Not ideal for stateful processing
The Kinesis Client Library (KCL) is a Java library (with Python/Node wrappers) that handles shard management for you:
- Lease management β assigns shards to workers via DynamoDB lease table
- Checkpointing β tracks last processed sequence number per shard
- Rebalancing β when workers are added/removed, shards are redistributed
- Shard splits/merges β KCL handles resharding transparently
Number of KCL workers should equal number of shards for maximum parallelism. More workers than shards = some workers idle. Fewer workers = some workers process multiple shards (fine, just slower).
| Strategy | How It Works | Use Case |
|---|---|---|
| Retry entire batch | Kinesis retries the failed batch (default for Lambda) | Transient errors that resolve on retry |
| Bisect on error | Split failed batch in half, retry each half | Identify and isolate the poisonous record |
| Skip & continue | Move checkpoint past the bad record | Non-critical records where data loss is acceptable |
| Dead-letter queue | Send failed records to SQS DLQ after max retries | Preserve failed records for later investigation |
| Max retry attempts | Fail after N retries, move to next batch | Prevent infinite blocking on bad records |
- "Serverless consumer for Kinesis" β Lambda with event source mapping
- "1 Lambda per shard" β default parallelism (configurable up to 10 with parallelization factor)
- "Checkpoint progress" β KCL uses DynamoDB table for lease/checkpoint tracking
- "Poison pill record blocking the stream" β enable bisect-on-error + DLQ
- "Real-time SQL on streaming data" β Managed Apache Flink (formerly Kinesis Data Analytics)
- "Fan-out: KDS β multiple destinations" β KDS β Lambda + KDS β Firehose (parallel consumers)
Lambda is the simplest consumer (serverless, auto-checkpoint, DLQ support) but limited to 15 min and 1 invocation per shard. KCL gives full control with DynamoDB-based checkpointing for long-running EC2/ECS apps. Flink is for complex stream SQL (windowed aggregations, joins). Always configure error handling β bisect-on-error + DLQ prevents one bad record from blocking your entire stream.
Architecture Patterns
Kinesis rarely operates alone. These are the four most common production patterns β from simple log delivery to real-time analytics pipelines combining multiple AWS services.
The most popular Kinesis pattern: stream data, convert to Parquet, land in S3, query with Athena.
Stream to KDS for real-time processing (Lambda) while simultaneously archiving to S3 via Firehose. Best of both worlds.
Real-Time Path
- KDS β Lambda β DynamoDB / alerts / API
- Sub-second processing
- Fraud detection, live metrics, notifications
Archive Path (same stream)
- KDS β Firehose β S3 (Parquet)
- Near-real-time archival for analytics
- Athena queries on historical data
One KDS stream β multiple independent consumers, each doing different work. With Enhanced Fan-Out, each gets dedicated 2 MB/s per shard.
| Consumer | Purpose |
|---|---|
| Lambda #1 | Real-time anomaly detection β SNS alerts |
| Lambda #2 | Update DynamoDB counters (live dashboard) |
| Firehose | Archive to S3 data lake |
| Flink | Windowed aggregations β Redshift |
| Requirement | Best Service | Why |
|---|---|---|
| Real-time streaming, ordering, replay | Kinesis Data Streams | Ordered shards, retention, replay from any point |
| Simple work queue, one consumer per message | SQS | Simpler, per-message, no ordering needed |
| Event routing by content, many targets | EventBridge | Rules route events to specific targets by content |
| High-throughput ingestion (100K+ records/sec) | Kinesis Data Streams | Designed for high-volume streaming ingestion |
| Deliver raw logs to S3 without code | Kinesis Firehose | Managed delivery, format conversion, zero ops |
| React to AWS service events | EventBridge | Native integration with 180+ services |
- "Stream data to S3 as Parquet for Athena" β Firehose + Glue Catalog (Pattern 1)
- "Process in real-time AND archive" β KDS with Lambda consumer + Firehose consumer (Pattern 2)
- "Multiple consumers, independent processing" β KDS with Enhanced Fan-Out (Pattern 3)
- "Kinesis vs SQS" β Kinesis = ordered stream, replay, high throughput. SQS = per-message queue, simpler
- "Kinesis vs EventBridge" β Kinesis = high-volume ingestion. EventBridge = event routing by content
The most common Kinesis pattern is Firehose β S3 (Parquet) β Athena for streaming data lakes. For real-time + archive, use KDS with parallel consumers (Lambda for real-time, Firehose for S3). Enhanced Fan-Out enables multiple isolated consumers on the same stream. Choose Kinesis over SQS when you need ordering, replay, or high-throughput streaming. Choose SQS for simple work queues.
Cost & Best Practices
Kinesis costs depend heavily on which service you use and how you configure it. Data Streams charges per shard-hour. Firehose charges per GB ingested. Understanding the cost model is key to building affordable streaming architectures.
| Service | Pricing Dimension | Approximate Cost |
|---|---|---|
| KDS Provisioned | Per shard-hour | $0.015/shard-hour + $0.014/million PUT payloads (25KB each) |
| KDS On-Demand | Per GB ingested + retrieved | $0.04/GB in + $0.04/GB out |
| KDS Extended Retention | Per shard-hour (beyond 24h) | $0.023/shard-hour (7-day) or $0.046 (365-day) |
| KDS Enhanced Fan-Out | Per consumer-shard-hour + per GB | $0.015/consumer-shard-hour + $0.013/GB retrieved |
| Firehose | Per GB ingested | $0.029/GB (first 500 TB/month) |
| Firehose format conversion | Per GB converted | $0.018/GB (JSON β Parquet) |
| # | Optimization | Impact | How |
|---|---|---|---|
| 1 | Right-size shards | 30β70% savings | Monitor IncomingBytes metric; reduce shards if utilisation <50% |
| 2 | Use Firehose instead of KDS + Lambda for simple delivery | Simpler + cheaper for S3 archival | Firehose eliminates Lambda cost, shard management |
| 3 | Aggregate small records with KPL | Up to 25Γ fewer PUTs | KPL aggregates multiple sub-25KB records into one PUT payload |
| 4 | Use On-Demand only if truly bursty | Provisioned is 2Γ cheaper at steady load | Switch to provisioned once you know your throughput pattern |
| 5 | Minimise retention | Avoid extended retention charges | Keep 24h default unless replay is actually needed |
| 6 | Avoid Enhanced Fan-Out unless needed | $0 extra cost without it | Use shared mode for 1β2 consumers; EFO only for 3+ |
Data Streams Best Practices
- Use high-cardinality partition keys (avoid hot shards)
- Monitor
WriteProvisionedThroughputExceededalarm - Use KPL for high-throughput producers (aggregation + batching)
- Enable enhanced monitoring for per-shard metrics
- Set up Lambda DLQ for poisonous record handling
- Use on-demand mode for new/unpredictable workloads
Firehose Best Practices
- Enable Parquet conversion with Glue Catalog schema
- Set buffer interval to 60s for lowest latency delivery
- Configure S3 error output bucket for failed records
- Use Lambda transform only when needed (adds cost)
- Partition output by date:
year/month/day/hour/ - Enable CloudWatch delivery metrics for monitoring
Using KDS + Lambda When Firehose Suffices
If you just need to deliver data to S3/Redshift/OpenSearch with no custom logic, Firehose is simpler and cheaper. Adding KDS + Lambda for simple delivery adds unnecessary complexity and cost.
Low-Cardinality Partition Keys
Using country or status as partition key creates hot shards. Use user_id, device_id, or random UUIDs for even distribution.
No Error Handling on Lambda Consumer
Without bisect-on-error and DLQ, one bad record blocks the entire shard indefinitely. Always configure max retry attempts + DLQ for Lambda event source mappings.
Over-Provisioning Shards
Starting with 100 shards "just in case" wastes $108/month even with zero traffic. Start small, use CloudWatch alarms to trigger scale-out, or use on-demand mode initially.
- What: Managed real-time streaming platform β Data Streams (custom processing), Firehose (managed delivery), Flink (stream SQL).
- Data Streams: Shard-based, ordered by partition key, 1 MB/s in + 2 MB/s out per shard. Retention 24hβ365 days. Supports replay.
- Firehose: Zero-admin delivery to S3/Redshift/OpenSearch. 60-second buffer minimum. Built-in JSONβParquet conversion. No replay.
- Consumers: Lambda (serverless, 1/shard), KCL (custom, DynamoDB checkpoints), Flink (windowed SQL), Firehose (delivery).
- Key Decision: Data Streams when you need real-time processing, replay, or multiple consumers. Firehose when you just need delivery with no code.
- vs SQS: Kinesis = ordered stream, high throughput, replay. SQS = per-message queue, simpler.
- Cost: KDS = $0.015/shard-hour (provisioned). Firehose = $0.029/GB ingested. Right-size shards, use KPL aggregation, minimise retention.
Kinesis is the real-time half of the AWS analytics stack. Where Athena and Glue process data at rest in S3, Kinesis processes data in motion. Data Streams gives you full control over real-time processing with shards and custom consumers. Firehose gives you zero-ops delivery to storage. Together with Athena, they form the complete analytics pipeline: stream β store β query.