Amazon SQS
Simple Queue Service
A fully managed message queue that decouples distributed systems: producers drop work in, and consumers process it on their own terms. No direct connections, no cascading failures, no traffic spikes crashing your services.
SQS in 30 Seconds
- Managed message queue: producers enqueue, consumers dequeue and process independently
- Pull-based (polling): consumers ask for messages at their own pace, unlike SNS push
- Messages stored durably for up to 14 days, so they survive consumer downtime
- Standard: near-unlimited throughput, best-effort order · FIFO: strict order, exactly-once processing, 3,000 msg/s with batching
- Dead-Letter Queue (DLQ) catches messages that fail repeatedly; essential for production
What is Amazon SQS
In a naive microservices architecture, every service calls other services directly. The Order API calls the Email Service, the Inventory Service, and the Shipping Service, all synchronously and in real time. This causes three serious problems:
Traffic Spikes Crash Services
A flash sale sends 100× normal orders. The Order API overwhelms the Inventory Service, which can't scale fast enough. Requests fail. Orders are lost.
Failures Cascade
Email Service goes down for 2 minutes. Every order fails, even though the warehouse is running fine. One broken service breaks everything downstream.
Tight Coupling
Adding a new fraud detection service means changing the Order API and redeploying. Every new consumer makes the producer more complex.
SQS is like a warehouse receiving dock. Delivery trucks (producers) drop off packages at any time. The dock holds them safely. Warehouse workers (consumers) pick up packages when they're ready, at their own pace. If the workers are busy, packages wait. Nothing is lost. Nobody blocks waiting for each other.
More everyday analogies:
Ticket Queue
Customers take a number and wait. Service agents handle one at a time. A rush of customers doesn't overwhelm agents; it just lengthens the queue temporarily.
Mailbox
The postman drops letters in your mailbox regardless of whether you're home. You read them when you're ready. The postman doesn't wait for you.
Assembly Line Buffer
Parts accumulate on a conveyor between two stations. Station B processes at its own speed. Station A never blocks waiting for B to be free.
Amazon SQS (Simple Queue Service) is a fully managed message queue that enables asynchronous communication between distributed components. The core model:
- Producers send messages into a queue
- Messages are stored durably until a consumer processes them
- Consumers poll the queue and process messages at their own rate
- Once processed successfully, the consumer deletes the message from the queue
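The four steps above can be sketched in a few lines. This is a minimal illustration, not a production consumer: `InMemoryQueue` is a hypothetical local stand-in that mimics the keyword arguments of boto3's `send_message`, `receive_message`, and `delete_message` so the example runs without AWS credentials, and `enqueue` / `process_batch` are illustrative names.

```python
import json
from collections import deque


class InMemoryQueue:
    """Hypothetical local stand-in for an SQS client (no AWS needed)."""

    def __init__(self):
        self._visible = deque()
        self._in_flight = {}
        self._count = 0

    def send_message(self, QueueUrl, MessageBody):
        self._count += 1
        self._visible.append((str(self._count), MessageBody))
        return {"MessageId": str(self._count)}

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1):
        batch = []
        while self._visible and len(batch) < MaxNumberOfMessages:
            msg_id, body = self._visible.popleft()
            handle = "rh-" + msg_id
            self._in_flight[handle] = (msg_id, body)  # hidden, not yet deleted
            batch.append({"MessageId": msg_id, "Body": body, "ReceiptHandle": handle})
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self._in_flight.pop(ReceiptHandle, None)


def enqueue(sqs, queue_url, payload):
    """Producer: serialize the work item and hand it to the queue."""
    response = sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(payload))
    return response["MessageId"]


def process_batch(sqs, queue_url, handler):
    """Consumer: receive, process, and delete only after the handler succeeds."""
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    done = 0
    for message in response.get("Messages", []):
        handler(json.loads(message["Body"]))
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        done += 1
    return done
```

Against real SQS the same two functions should work with a boto3 client passed as `sqs`, since only the three standard calls are used.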
| Concern | Direct Synchronous Call | SQS Queue |
|---|---|---|
| Traffic spike | Service overwhelmed, requests dropped | Messages buffer in queue; consumer processes steadily |
| Consumer downtime | Calls fail, data is lost | Messages wait safely for up to 14 days |
| Consumer speed | Producer must wait for response | Producer returns as soon as the message is enqueued |
| Adding new consumer | Change producer code, redeploy | Add workers to the same queue, or give a new service its own queue (fan-out via SNS) |
| Retry on failure | Manual retry logic needed | Built-in visibility timeout + DLQ |
| Scaling | Producer and consumer must scale together | Consumer scales independently based on queue depth |
Buffering
The queue holds messages during traffic spikes, consumer slowdowns, and deployments. Work is never lost; it waits safely until capacity is available.
Retry Capability
If processing fails, the message returns to the queue automatically. Built-in retry logic means transient failures are handled without custom code.
Independent Scaling
Consumers scale based on queue depth rather than producer rate. Add more workers when the queue grows, completely independent of the producer.
When an exam question mentions "decouple services", "handle traffic spikes", "buffer requests", or "asynchronous processing", the answer is an SQS queue. SQS is the AWS answer to workload isolation and async processing, while SNS is the answer to broadcasting events to many consumers.
SQS breaks the direct dependency between producers and consumers: work is stored durably in a queue and processed reliably, regardless of traffic spikes or consumer failures.
Why Distributed Systems Need Queues
Every distributed system eventually faces the same question: what happens when two services need to communicate but operate at different speeds, different scales, or different availability levels? Synchronous direct calls work fine at small scale. They break catastrophically at large scale.
Speed Mismatch
Service A can produce 10,000 events/sec. Service B can process 500/sec. Without a queue, either 9,500 events per second are dropped or Service A must slow down. Both are unacceptable in production.
Availability Mismatch
Service B deploys every Tuesday. Service A cannot stop accepting user requests for 3 minutes while B restarts. With direct calls, A's availability is limited by B's availability.
Scale Mismatch
Service A auto-scales to 50 instances during peak. Service B can only handle 5× its normal load. Direct calls flood B during spikes; B crashes, the failure cascades back to A, and the entire system fails.
Retry Complexity
When Service B is temporarily down, Service A must implement exponential backoff, retry logic, and circuit breakers, all as custom code. Every service pair adds more complexity.
Queues solve all four problems simultaneously. They act as a shock absorber between services: absorbing speed mismatches, surviving availability gaps, smoothing scale spikes, and eliminating the need for custom retry logic.
E-Commerce Order Processing
- Orders arrive in bursts (flash sales, promotions)
- Inventory, email, shipping, fraud all need to react
- Queue absorbs the burst, so all downstream systems stay stable
- Pattern: Order Service → SQS → multiple workers
Video Processing Pipeline
- User uploads video; transcoding takes 30 seconds
- Can't make the user wait synchronously
- Upload → SQS → transcoding worker → CDN publish
- Pattern: Fan-out to multiple resolution workers
Payment Processing
- Payment accepted instantly, settlement is async
- Fraud check, bank transfer, receipt: all non-blocking
- FIFO queue ensures transaction ordering
- DLQ captures failed transactions for manual review
Email / Notification Systems
- Sending 1M emails takes minutes, so it is never done synchronously
- SQS buffers all send requests
- Email workers scale based on queue depth
- Failed sends retry automatically via visibility timeout
One of the most underrated queue properties: a queue lets a system slow down without losing work. This is not possible with synchronous calls. Without a queue, when a service is overwhelmed it drops requests. With a queue, work accumulates and drains as capacity becomes available.
| Benefit | What It Means in Practice |
|---|---|
| Workload buffering | Message queue absorbs traffic spikes. Consumer processes at its own pace. No dropped requests. |
| Decoupling | Producer doesn't know about consumer. Add, replace, or scale consumers without touching the producer. |
| Retry handling | Failed messages return to queue automatically. No custom retry code in your services. |
| Fault tolerance | Consumer downtime doesn't cause data loss. Messages wait. System resumes where it left off. |
| Independent scaling | Scale consumers based on queue depth metric. Auto Scaling reacts to backlog, not to producer rate. |
Processing in the producer
Doing heavy work (DB writes, API calls) in the producer before queuing defeats the purpose. The producer should enqueue and return immediately. Heavy lifting belongs in the consumer.
Not handling duplicates
Standard queues deliver at-least-once, meaning occasionally a message arrives twice. Consumers that don't handle this idempotently can process orders twice, send two emails, charge twice.
Visibility timeout too short
If the consumer takes 10 seconds to process but timeout is 5 seconds, the message becomes visible again while the first consumer is still working, causing duplicate processing.
No Dead-Letter Queue
Without a DLQ, a "poison" message that always fails processing is redelivered again and again until its retention period expires, wasting compute on failed retries (and, in a FIFO queue, holding up the rest of its message group).
Common exam pattern: "An application receives variable traffic: 10 requests/sec normally, 10,000/sec during promotions. The processing backend can only handle 50 req/sec max. How do you architect this?" Answer: put an SQS queue between the frontend and backend. The queue absorbs the spike, the backend processes at 50 req/sec, and requests are never dropped. Scale the backend with Auto Scaling on the CloudWatch metric ApproximateNumberOfMessagesVisible.
Queues decouple the rate of work arrival from the rate of work processing, enabling services to operate independently, survive each other's failures, and scale without coordination.
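To make the exam scenario concrete, the backlog arithmetic can be worked through in a few lines. This is a rough sketch under stated assumptions: the promotion spike is assumed to last 60 seconds (the scenario does not say), the backend stays fixed at 50 req/sec, and the function names are illustrative.

```python
def backlog_after_spike(arrival_rate, service_rate, spike_seconds):
    """Messages left in the queue when the spike ends."""
    return max(0, (arrival_rate - service_rate) * spike_seconds)


def drain_seconds(backlog, service_rate, baseline_rate):
    """Time to clear the backlog once traffic returns to its baseline."""
    spare = service_rate - baseline_rate
    if spare <= 0:
        raise ValueError("no spare capacity: the backlog would never drain")
    return backlog / spare


backlog = backlog_after_spike(10_000, 50, 60)  # assumed one-minute spike
print(backlog)                                 # 597000
print(drain_seconds(backlog, 50, 10))          # 14925.0 seconds, roughly 4 hours
```

Nothing is dropped, but a fixed backend needs hours to catch up, which is why the answer also scales consumers on queue depth.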
How SQS Works
SQS uses a polling (pull) model: the consumer actively asks "do you have messages for me?" at regular intervals. This is the opposite of SNS, which pushes messages to subscribers. The pull model gives the consumer full control over its processing rate.
Push (SNS): Producer controls pace
- SNS delivers immediately to all subscribers
- Consumer must handle any rate it receives
- Good for broadcasting events to many subscribers
- Consumer can be overwhelmed during spikes
Pull (SQS): Consumer controls pace
- Consumer decides when to ask for messages
- Consumer processes at its own maximum rate
- Good for workload processing at controlled speed
- Queue absorbs backlog when consumer is slow
When a consumer receives a message, SQS doesn't delete it immediately. Instead it makes the message invisible to all other consumers for a configurable period, the visibility timeout. This is SQS's built-in retry mechanism.
- Default: 30 seconds. Range: 0 seconds to 12 hours
- Set it longer than your worst-case processing time
- Too short → duplicates; too long → slow retries on crash
- Consumer can extend it via the ChangeMessageVisibility API while still working
- Standard queue: messages may be delivered more than once
- Consumers must be idempotent: same message twice = same result
- Use a unique message ID or database upsert to handle duplicates
- FIFO queue provides exactly-once processing within 5-minute window
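An idempotent consumer can be as simple as "check, apply, record". The sketch below is illustrative only: `handle_once` is a hypothetical helper, and the in-memory `set` stands in for a durable store (in production this would be something like a database upsert or conditional write, so that the check survives restarts and is shared between workers).

```python
def handle_once(message, processed_ids, apply_effect):
    """Apply the side effect at most once per MessageId.

    processed_ids is any set-like store; a plain set is used here only
    so the example runs locally.
    """
    message_id = message["MessageId"]
    if message_id in processed_ids:
        return False  # duplicate delivery: skip the side effect
    apply_effect(message["Body"])
    processed_ids.add(message_id)
    return True


processed = set()
sent_emails = []
delivery = {"MessageId": "abc-1", "Body": "order 12345 confirmed"}
handle_once(delivery, processed, sent_emails.append)
handle_once(delivery, processed, sent_emails.append)  # redelivery is ignored
print(sent_emails)  # ['order 12345 confirmed']
```

MessageId only catches redelivery of the same sent message. If the producer itself might send the same business event twice, key on a business identifier (such as an order ID) instead.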
By default, SQS uses short polling: it samples a subset of servers and returns immediately, even if the queue is empty. Long polling waits up to 20 seconds for a message before returning. Always use long polling in production.
| Feature | Short Polling | Long Polling (recommended) |
|---|---|---|
| Wait time | Returns immediately (0s) | Waits up to 20s for a message |
| Empty responses | Many (wasted API calls) | Minimal; returns as soon as a message arrives or the wait expires |
| Cost | Higher: many empty polls billed | Lower: fewer API requests |
| Latency | Near-zero when queue is active | Near-zero when message available; waits only when empty |
| How to enable | Default | Set WaitTimeSeconds=20 |
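As a small sketch of the long-polling settings in the table, the helper below builds the keyword arguments for a `ReceiveMessage` call and rejects values outside the documented ranges. `long_poll_kwargs` is a hypothetical name; the parameter names (`QueueUrl`, `WaitTimeSeconds`, `MaxNumberOfMessages`) are the ones the SQS API uses.

```python
def long_poll_kwargs(queue_url, wait_seconds=20, max_messages=10):
    """Build ReceiveMessage arguments for long polling."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("WaitTimeSeconds must be between 0 and 20")
    if not 1 <= max_messages <= 10:
        raise ValueError("MaxNumberOfMessages must be between 1 and 10")
    return {
        "QueueUrl": queue_url,
        "WaitTimeSeconds": wait_seconds,      # 0 would mean short polling
        "MaxNumberOfMessages": max_messages,  # fetch up to 10 per request
    }
```

With a boto3 client this would be used as `sqs.receive_message(**long_poll_kwargs(queue_url))`. Long polling can also be made the queue default through the `ReceiveMessageWaitTimeSeconds` queue attribute.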
Batching sends multiple messages in a single API request instead of one at a time. This is one of the most important cost optimization techniques: it reduces API calls by up to 90%.
| Aspect | Single Send | Batch Send |
|---|---|---|
| API calls per 10 messages | 10 calls | 1 call (90% reduction) |
| Cost per 1M messages | $0.40 | $0.04 (90% reduction) |
| Max messages per batch | 1 | 10 |
| Max payload size | 256 KB per message | 256 KB total across the batch |
SendMessageBatch
Send up to 10 messages in one request. Each can have different body, attributes, and delay.
DeleteMessageBatch
Delete up to 10 messages in one request. Pass receipt handles from processing.
ReceiveMessage
Returns up to 10 messages per poll when you set MaxNumberOfMessages=10 (the default is 1).
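Since a batch holds at most 10 entries, a producer typically chunks its messages first. The helper below is a minimal sketch with a hypothetical name (`to_batches`); it produces entry lists in the shape `SendMessageBatch` expects, where each entry needs an `Id` that is unique within its batch.

```python
def to_batches(bodies, batch_size=10):
    """Split message bodies into SendMessageBatch entry lists of up to 10."""
    entries = [{"Id": str(i), "MessageBody": body} for i, body in enumerate(bodies)]
    return [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]


batches = to_batches(["order-%d" % n for n in range(25)])
print([len(batch) for batch in batches])  # [10, 10, 5]
```

Each list would then be passed as `Entries` to `sqs.send_message_batch(QueueUrl=..., Entries=batch)` on a boto3 client. A batch call can partially fail, so the `Failed` list in the response should be checked and those entries retried.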
Cost example: with 10M messages/day sent one at a time, send requests cost $4/day; batched in groups of 10, they cost $0.40/day. That saves $3.60/day, or about $108 over a 30-day month. Always batch when possible.
What if processing time varies? Some messages take 2 seconds, one takes 30 seconds. Use ChangeMessageVisibility API to extend the timeout while processing.
The Problem
- Timeout too short → message reappears mid-processing → duplicate work
- Timeout too long → if consumer dies, message waits unnecessarily
- Variable processing time → no single timeout fits all
The Solution
- Start with timeout = expected time × 1.5
- During processing, periodically check remaining time
- If remainingTime < 30%, call ChangeMessageVisibility
- Maximum visibility timeout: 12 hours
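The two rules of thumb above (start at 1.5× the expected time, extend when less than 30% remains) reduce to a couple of small checks. This is a sketch with hypothetical helper names; the thresholds are the ones from the list, not values mandated by SQS.

```python
import math


def initial_timeout(expected_seconds, factor=1.5):
    """Starting visibility timeout: expected processing time times 1.5."""
    return math.ceil(expected_seconds * factor)


def should_extend(timeout_seconds, elapsed_seconds, threshold=0.30):
    """True when less than 30% of the visibility timeout remains."""
    remaining = timeout_seconds - elapsed_seconds
    return remaining < threshold * timeout_seconds


timeout = initial_timeout(20)      # 30 seconds
print(should_extend(timeout, 10))  # False: 20 s left
print(should_extend(timeout, 22))  # True: 8 s left, below the 9 s threshold
```

When `should_extend` returns true, the worker would call `sqs.change_message_visibility(QueueUrl=..., ReceiptHandle=..., VisibilityTimeout=timeout)` on a boto3 client. The new timeout counts from the moment of that call.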
Message attributes are key-value pairs attached to a message, separate from the body. Use them for routing, filtering, and tagging without parsing JSON.
Use Cases
- Routing: Which service should process this?
- Priority: Process high-priority first
- Source tracking: Which system sent this?
- Versioning: Schema version for consumers
Supported Types
- String: text values
- Number: integers, floats
- Binary: binary data (base64-encoded on the wire)
- Custom type labels appended to a base type (e.g., Binary.jpeg)
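In the API, each attribute is a small dictionary with a `DataType` plus a `StringValue` or `BinaryValue`. The helper below is an illustrative sketch (`build_attributes` is a hypothetical name) that maps plain Python values onto that shape; note that Number values are still sent as strings.

```python
def build_attributes(**values):
    """Map Python values onto the SQS MessageAttributes structure."""
    attributes = {}
    for name, value in values.items():
        if isinstance(value, bytes):
            attributes[name] = {"DataType": "Binary", "BinaryValue": value}
        elif isinstance(value, bool):
            # bool is a subclass of int, so handle it before the Number case
            attributes[name] = {"DataType": "String", "StringValue": str(value).lower()}
        elif isinstance(value, (int, float)):
            attributes[name] = {"DataType": "Number", "StringValue": str(value)}
        else:
            attributes[name] = {"DataType": "String", "StringValue": str(value)}
    return attributes


print(build_attributes(priority="high", schema_version=2))
```

The result would be passed as `MessageAttributes=` to `send_message`. On the receiving side, attributes are only returned if the consumer asks for them (for example `MessageAttributeNames=["All"]`).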
- Visibility timeout too short → same message processed by two different consumers simultaneously → data corruption risk. Set it to max expected processing time × 1.5.
- At-least-once delivery → consumers must be idempotent. Exam question: "how to prevent duplicate processing?" → use a DynamoDB conditional write to track processed message IDs.
- Long polling → reduces cost and cuts empty receive calls. Exam scenario: "reduce SQS API costs" → enable long polling (ReceiveMessage with WaitTimeSeconds=20).
SQS's visibility timeout makes retry automatic and safe: if a consumer crashes mid-processing, the message reappears and another consumer picks it up. A message leaves the queue only when it is deleted, moved to a DLQ, or its retention period expires.
Standard Queue vs FIFO Queue
SQS offers two fundamentally different queue types. Standard is the default and covers ~90% of use cases. FIFO adds strict ordering and exactly-once processing, but with throughput limits. Most production systems use Standard queues.
Standard Queue
- Near-unlimited throughput: no fixed cap on API calls per second
- Best-effort ordering: messages may arrive out of order
- At-least-once delivery: occasional duplicates possible
- Lower latency, higher availability
- Use when: order doesn't matter, duplicates are handled
FIFO Queue
- 3,000 msg/sec (300 without batching)
- Strict ordering: first-in-first-out guaranteed
- Exactly-once processing: no duplicates in 5-min window
- Slightly higher latency
- Use when: order matters, duplicates are unacceptable
| Feature | Standard Queue | FIFO Queue |
|---|---|---|
| Throughput | Nearly unlimited | 3,000 msg/sec (with batching) |
| Message ordering | Best-effort (not guaranteed) | Strict FIFO within message group |
| Delivery guarantee | At-least-once (can duplicate) | Exactly-once (within 5-min window) |
| Deduplication | None; consumer must handle | Content-based or ID-based |
| Queue name | Any name | Must end with .fifo |
| Message groups | Not applicable | Required; messages are ordered within each group |
| Cost | $0.40 per million requests | $0.50 per million requests |
| Use cases | Log processing, fan-out, async jobs | Financial transactions, inventory updates |
Use Standard When
- Order doesn't matter: email sending, image thumbnails
- You need massive throughput: millions of messages
- Your consumer is idempotent: same message twice = same result
- Cost matters: Standard is 20% cheaper per request
- You're doing fan-out to multiple independent workers
Use FIFO When
- Order is critical: transaction ledgers, command sequences
- Duplicates are unacceptable: payment processing
- You need exactly-once for compliance reasons
- Throughput is under 3,000 msg/sec
- You have distinct message groups (e.g., per-customer)
FIFO doesn't mean all messages are processed one at a time globally. You can have multiple message groups, and each group is ordered independently. Messages from different groups can be processed in parallel.
FIFO queues automatically discard duplicate messages within a 5-minute deduplication window. Two methods available:
| Method | How It Works | When to Use |
|---|---|---|
| Explicit deduplication ID | You provide a unique ID with each message | You control ID generation (idempotency keys, request IDs) |
| Content-based deduplication | SHA-256 hash of message body (not attributes) | Simpler setup, body uniquely identifies message |
5-Minute Window
- Same deduplication ID within 5 min → duplicate discarded
- After 5 minutes, same ID is accepted (new window)
- SQS returns success (silent deduplication, no error)
Important Gotchas
- Content-based dedup ignores message attributes; only the body is hashed
- Different attributes + same body = still duplicate
- Retry after 5 min → message accepted and delivered again (plan for this)
Best practice: For order confirmations, use deduplication ID = "order-12345-confirmation". This prevents duplicate emails within 5 minutes. If you need longer deduplication, track processed IDs in DynamoDB.
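The window behaviour described above can be modelled locally to build intuition. This is a toy model, not how SQS is implemented: `DedupWindow` is a hypothetical class, time is passed in explicitly as seconds, and a missing deduplication ID falls back to a SHA-256 hash of the body, mirroring content-based deduplication.

```python
import hashlib

DEDUP_WINDOW_SECONDS = 300  # the 5-minute deduplication window


class DedupWindow:
    """Toy model of FIFO deduplication; not the real SQS implementation."""

    def __init__(self):
        self._first_accepted = {}  # deduplication key -> time it was accepted

    def accept(self, body, now, dedup_id=None):
        """Return True if the message is enqueued, False if silently dropped."""
        if dedup_id is not None:
            key = dedup_id
        else:
            key = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first = self._first_accepted.get(key)
        if first is not None and now - first < DEDUP_WINDOW_SECONDS:
            return False
        self._first_accepted[key] = now
        return True


window = DedupWindow()
print(window.accept("confirm order 12345", now=0))    # True
print(window.accept("confirm order 12345", now=120))  # False: inside the window
print(window.accept("confirm order 12345", now=301))  # True: window has expired
```

Against a real FIFO queue, the explicit ID is supplied as `MessageDeduplicationId` alongside `MessageGroupId` in `send_message`.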
- "Strict ordering required" → FIFO queue with message group ID
- "Exactly-once processing" → FIFO queue with deduplication ID
- "High throughput + async" → Standard queue + idempotent consumer
- FIFO limitation: max 3,000 msg/sec with batching (300 without). If you need more, use Standard.
- FIFO name: must end with the .fifo suffix, e.g., orders.fifo
Standard queue for 90% of workloads: high throughput, handle duplicates in your consumer. FIFO queue when ordering or exactly-once is a hard requirement, but accept the 3,000 msg/sec limit.
SQS Architecture Patterns
The most fundamental pattern: put a queue between a variable-rate producer and a fixed-rate consumer. The queue absorbs traffic spikes so the backend processes at a steady pace. This is the solution to every "traffic spike crashes our service" problem.
Multiple consumers (workers) poll the same queue in parallel. Each message is normally processed by one worker (a Standard queue can occasionally deliver a duplicate). Scale the worker pool based on queue depth, using the ApproximateNumberOfMessagesVisible CloudWatch metric.
A DLQ catches "poison" messages that fail repeatedly. After N failed processing attempts, SQS automatically moves the message to the DLQ. This prevents a single bad message from blocking your entire queue and consuming infinite retry compute.
- Always in production, no exceptions
- Alert on DLQ message count > 0
- Inspect failed messages for debugging
- Redrive to main queue after fix
- maxReceiveCount: failures before DLQ (e.g., 3)
- DLQ must be same type (Standard → Standard, FIFO → FIFO)
- Set DLQ retention longer (14 days) for analysis
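A DLQ is attached through the source queue's `RedrivePolicy` attribute, which is a JSON string. The sketch below builds that attribute; `redrive_attributes` is a hypothetical helper and the ARN in the example is a made-up placeholder.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count=3):
    """Build the queue attribute that routes repeated failures to a DLQ."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),  # receives before the move
    }
    return {"RedrivePolicy": json.dumps(policy)}


# Placeholder ARN for illustration only
attributes = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
print(attributes["RedrivePolicy"])
```

With boto3 this would be applied via `sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=attributes)`.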
Replace direct service-to-service HTTP calls with queue-based async messaging. Services communicate through queues instead of knowing about each other. This is the foundation of event-driven microservice architecture.
Tightly Coupled (HTTP)
- Order Service calls Inventory via HTTP
- If Inventory is slow → Order is slow
- If Inventory is down → Order fails
- Scaling Inventory requires rebalancing
- Adding Shipping requires changing Order
Decoupled (SQS)
- Order publishes "order.placed" to queue
- Inventory polls queue at own pace
- If Inventory is down → messages wait
- Scale Inventory independently
- Add Shipping by giving it its own queue (for example via SNS fan-out); consumers sharing one queue compete for messages
SQS doesn't have native priority support. Implement it with multiple queues: high-priority, normal-priority, and low-priority. Configure your consumer to poll high first, then normal, then low.
High Priority
Critical alerts, VIP customers, payment failures. Consumer checks this queue first on every poll cycle.
Normal Priority
Standard workload. Consumer checks after high queue is empty or quota reached.
Low Priority
Batch jobs, reports, cleanup tasks. Processed only when higher queues are empty.
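The "poll high first, then normal, then low" rule might look like the following. This is a simplified sketch: `poll_by_priority` is a hypothetical helper, and `receive` is any callable that returns a (possibly empty) list of messages for a queue URL, so the logic can be tried without AWS.

```python
def poll_by_priority(receive, queue_urls):
    """Return (queue_url, messages) from the first non-empty queue.

    queue_urls must be ordered from highest to lowest priority.
    """
    for url in queue_urls:
        messages = receive(url)
        if messages:
            return url, messages
    return None, []


# Local stand-in: a dict lookup plays the role of a receive call
fake_queues = {"high": [], "normal": ["job-17"], "low": ["report-3"]}
print(poll_by_priority(fake_queues.get, ["high", "normal", "low"]))
# ('normal', ['job-17'])
```

With boto3, `receive` could wrap `sqs.receive_message(QueueUrl=url, WaitTimeSeconds=0).get("Messages", [])`. Short polling on the higher-priority queues is a deliberate choice here: a 20-second long poll on an empty high queue would delay lower-priority work.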
Need async processing but also need to return a response? Use two queues: a request queue and a response queue. The caller sends a request and waits on its own reply queue.
Key Components
- Correlation ID: UUID that ties request to response
- Reply-to queue: Included in request message
- Long polling: Caller waits on reply queue
- Timeout: Caller can timeout and retry
When to Use
- Async processing but caller needs result
- Long-running work (>30 seconds)
- Decouple request from response latency
- Alternative: AWS Step Functions for orchestration
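The correlation ID and reply-to queue can travel as message attributes. The sketch below shows one possible shape; the attribute names `CorrelationId` and `ReplyTo` and both helper functions are assumptions chosen for illustration, not an SQS convention.

```python
import uuid


def build_request(payload, reply_queue_url):
    """Create send_message arguments carrying a correlation ID and reply queue."""
    correlation_id = str(uuid.uuid4())
    request = {
        "MessageBody": payload,
        "MessageAttributes": {
            "CorrelationId": {"DataType": "String", "StringValue": correlation_id},
            "ReplyTo": {"DataType": "String", "StringValue": reply_queue_url},
        },
    }
    return correlation_id, request


def find_reply(messages, correlation_id):
    """Pick the reply whose CorrelationId matches the original request."""
    for message in messages:
        attributes = message.get("MessageAttributes", {})
        if attributes.get("CorrelationId", {}).get("StringValue") == correlation_id:
            return message
    return None
```

The caller would send with `sqs.send_message(QueueUrl=request_queue_url, **request)` and then long-poll its reply queue, passing `MessageAttributeNames=["All"]` so the attributes come back with each message.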
- "Scale backend based on queue" → Use Auto Scaling with the ApproximateNumberOfMessagesVisible metric
- "Messages failing repeatedly" → Configure Dead-Letter Queue with maxReceiveCount
- "Decouple microservices" → SQS between services instead of HTTP calls
- "Process orders in priority" → Multiple queues (high/normal/low) with weighted polling
SQS has a pattern for every distributed system challenge: load leveling for spikes, worker pools for throughput, DLQs for resilience, and multiple queues for priority. Master these and you can architect any async system.
SQS + SNS β The Fan-Out Pattern
SNS and SQS are not competitors; they're complementary. SNS broadcasts (one event → many subscribers). SQS buffers (store and process at own pace). Combined, you get the best of both: reliable fan-out to multiple independent consumers, each with their own buffer and retry capability.
SNS Alone
- Broadcasts to multiple subscribers
- Push-based: immediate delivery
- If subscriber is down → message can be lost once delivery retries are exhausted
- If subscriber is slow → deliveries back up
- Good for: real-time alerts, Lambda triggers
SQS Alone
- Single queue → single consumer (or pool)
- Pull-based: consumer controls pace
- Messages survive consumer downtime
- Consumer processes at own rate
- Good for: workload processing, jobs
SNS + SQS = Fan-out with durability. SNS broadcasts to multiple SQS queues. Each queue buffers independently. Each consumer processes at its own pace. One slow consumer doesn't affect others. One down consumer catches up when it recovers.
| Benefit | How Fan-Out Delivers It |
|---|---|
| Isolation | One slow or failed consumer doesn't affect others. Email being slow doesn't delay Inventory updates. |
| Independent scaling | Scale each consumer based on its own queue depth. Email might have 2 workers, Analytics 10. |
| No message loss | If a consumer is down, its queue buffers messages. Catches up when recovered. |
| Add consumers easily | Subscribe a new SQS queue to the SNS topic. No changes to the producer. |
| Different processing speeds | Email (fast, 100/sec) and Video Transcode (slow, 2/sec) work from the same event stream. |
Event Published
Order Service publishes order.placed to SNS topic order-events. Returns immediately; doesn't know or care who subscribes.
Email Queue
Receives event → Lambda sends confirmation email. Fast: 500 emails/sec. Small queue, clears quickly.
Inventory Queue
Receives event → EC2 worker updates stock DB. Medium speed, complex logic. 5 workers in ASG.
Analytics Queue
Receives event → Lambda writes to data lake. Batch processing, runs hourly. Queue grows, drains in batches.
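One practical detail for consumers in this flow: unless raw message delivery is enabled on the subscription, the SQS body is an SNS envelope (a JSON document with fields such as `Type` and `Message`) rather than the original event. A minimal sketch of unwrapping it, assuming the publisher sent JSON and using a hypothetical helper name:

```python
import json


def unwrap_sns(sqs_body):
    """Return the published event, whether or not SNS wrapped it."""
    envelope = json.loads(sqs_body)
    if isinstance(envelope, dict) and envelope.get("Type") == "Notification":
        return json.loads(envelope["Message"])  # the original payload
    return envelope  # raw delivery: the body already is the event


# Example body as an SQS consumer might see it (values are made up)
body = json.dumps({
    "Type": "Notification",
    "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-events",
    "Message": json.dumps({"event": "order.placed", "orderId": "12345"}),
})
print(unwrap_sns(body)["orderId"])  # 12345
```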
- "One event, multiple consumers, each at own pace" → SNS + SQS fan-out
- "Decouple event producer from consumers" → SNS topic, consumers subscribe queues
- "Consumer failures shouldn't affect others" → Each consumer has its own SQS queue
- This is the #1 integration pattern for AWS microservices; expect it on every exam
SNS + SQS fan-out is the gold-standard architecture for event-driven systems: SNS broadcasts, SQS buffers, consumers stay isolated and independently scalable.
SQS vs SNS vs EventBridge vs Kafka
AWS has multiple messaging services because they solve different problems. They're not competitors; they complement each other. Understanding when to use which is a core architecture skill.
SQS: Queue
- Job: Buffer and decouple workloads
- Model: Pull (consumer polls)
- Consumers: One queue → one consumer (or pool)
- Use: Async processing, load leveling
SNS: Broadcast
- Job: Fan-out events to many subscribers
- Model: Push (SNS delivers)
- Consumers: One topic → many subscribers
- Use: Notifications, alerts, pub-sub
EventBridge: Router
- Job: Route events with complex rules
- Model: Push (EventBridge delivers)
- Consumers: Rule-based routing to targets
- Use: Event-driven architecture, SaaS integrations
Kafka (MSK): Stream
- Job: High-throughput event streaming
- Model: Pull (consumer reads from log)
- Consumers: Multiple consumer groups, replay
- Use: Real-time analytics, log aggregation
| Feature | SQS | SNS | EventBridge | Kafka (MSK) |
|---|---|---|---|---|
| Primary use | Buffer workloads | Broadcast events | Route events | Stream events |
| Delivery model | Pull (poll) | Push | Push | Pull (read log) |
| Message retention | Up to 14 days | None (immediate) | None (immediate) | Configurable (days to forever) |
| Message replay | No | No | Via archive and replay | Yes (offset-based) |
| Throughput | Near-unlimited | Near-unlimited | 10K events/sec (soft limit) | Millions/sec |
| Ordering | FIFO queue option | FIFO topic option | No guarantee | Per-partition ordering |
| Content filtering | No | Subscription filters | Rich rule patterns | Consumer logic |
| Management | Fully managed | Fully managed | Fully managed | Managed (MSK) or self-managed |
| Cost model | Per request + data | Per request + data | Per event | Per broker-hour + storage |
Use SQS When
- You need to buffer workloads (jobs, tasks)
- Consumer needs to process at its own pace
- You need retry + DLQ for failed messages
- Single consumer (or competing consumer pool)
- Messages can be deleted after processing
Use SNS When
- One event needs to reach multiple subscribers
- You want push delivery (immediate)
- Subscribers are Lambda, HTTP, Email, SMS
- Simple pub-sub pattern
- Combined with SQS for durable fan-out
Use EventBridge When
- You need content-based routing rules
- You're integrating with SaaS (Zendesk, Datadog)
- You want schema registry + discovery
- You're building event-driven architecture
- You need to archive and replay events
Use Kafka (MSK) When
- You need millions of events per second
- Multiple consumers need to read same stream
- You need message replay / reprocessing
- You're doing real-time analytics / ML
- You already have Kafka expertise
| Pattern | Services Used | Why |
|---|---|---|
| Durable fan-out | SNS + SQS | SNS broadcasts, SQS buffers per-consumer |
| Event-driven microservices | EventBridge + SQS | EventBridge routes, SQS buffers processing |
| Real-time + batch | Kafka + S3 + Athena | Kafka streams, S3 stores, Athena queries |
| SaaS integration | EventBridge + Lambda | EventBridge receives SaaS events, Lambda processes |
| Transactional + analytics | SQS + Kinesis | SQS for transactions, Kinesis for analytics stream |
Both are pull-based, but they serve fundamentally different purposes. This is a common source of confusion:
| Feature | SQS | Kinesis Data Streams |
|---|---|---|
| Primary use | Work queue, task processing | Real-time streaming analytics |
| Data model | Messages (deleted after processing) | Persistent log (retention 1-365 days) |
| Replay capability | No; message gone after delete | Yes; re-read from any position within retention |
| Multiple consumers | Competing consumers (one gets each message) | Multiple independent consumers (each reads all data) |
| Message size | 256 KB | 1 MB |
| Ordering | FIFO queue option | Per-shard ordering |
| Throughput scaling | Auto-scales | Shard scaling (manual in provisioned mode) |
| Retention | Up to 14 days | Up to 365 days |
| Cost model | Per request | Per shard-hour + data |
Use SQS When
- You have a queue of work/tasks to process
- Message can be deleted after successful processing
- One consumer (or competing pool) per message
- You need retry + DLQ for failures
Use Kinesis When
- Multiple consumers need to read the same stream
- You need to replay / reprocess historical events
- Real-time analytics, ML, dashboards
- Audit logs requiring long retention
They work together: Kinesis for ingestion + SQS for work distribution. Pattern: Kinesis → Lambda → SQS → worker pool. Kinesis handles high-throughput ingestion, SQS provides reliable per-item processing.
- Need to buffer work for later processing? → SQS
- One event, many consumers immediately? → SNS (or SNS + SQS for durability)
- Complex event routing rules? → EventBridge
- SaaS integrations? → EventBridge (has native connectors)
- Real-time streaming at massive scale? → Kafka (MSK) or Kinesis
- Need to replay events? → Kafka (permanent log) or EventBridge Archive
- "Decouple services, buffer requests" → SQS
- "Fan-out to multiple consumers" → SNS (or SNS + SQS)
- "Route events based on content" → EventBridge
- "Real-time analytics, millions/sec" → Kinesis Data Streams or MSK (Kafka)
- "Integrate with third-party SaaS" → EventBridge (has partner sources)
- These services complement each other; combinations are common and expected
SQS = buffer, SNS = broadcast, EventBridge = route, Kafka = stream. They solve different problems and often work together, so choose based on your specific pattern rather than treating them as competitors.
Security, Reliability & Scaling
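SQS access is controlled by two mechanisms: IAM policies (attached to users/roles) and SQS queue policies (attached to queues). Within a single account, an allow in either policy is enough as long as nothing explicitly denies the action; for cross-account access, both the caller's IAM policy and the queue policy must allow it.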
IAM Policy
- Attached to IAM user, role, or group
- Controls what that identity can do
- "Role X can send to any queue in account"
- Use for: same-account access, EC2/Lambda roles
Queue Policy
- Attached to the queue itself
- Controls who can access this queue
- "Allow Account B to send to this queue"
- Use for: cross-account access, AWS service access
| Scenario | Use IAM Policy | Use Queue Policy |
|---|---|---|
| Lambda in same account sends to queue | Yes; attach to Lambda role | Not required |
| Another AWS account sends to your queue | Not sufficient alone | Yes; must allow principal |
| SNS topic sends to queue | Not required | Yes; allow SNS service |
| S3 event sends to queue | Not required | Yes; allow S3 service |
| Restrict which queues a role can access | Yes; specify queue ARN | Not the right tool |
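For the "SNS topic sends to queue" row, the queue policy might look like the document built below. This is a sketch under assumptions: `sns_to_sqs_policy` is a hypothetical helper, the ARNs are placeholders, and the `aws:SourceArn` condition is included so that only the named topic can send.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Queue policy letting one specific SNS topic send to this queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Without this condition, any SNS topic could send to the queue
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


# Placeholder ARNs for illustration only
policy = sns_to_sqs_policy(
    "arn:aws:sqs:us-east-1:123456789012:email-queue",
    "arn:aws:sns:us-east-1:123456789012:order-events",
)
```

The resulting string would be set as the queue's `Policy` attribute, for example `sqs.set_queue_attributes(QueueUrl=..., Attributes={"Policy": policy})` with boto3.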
Encryption at Rest (SSE)
- Enable Server-Side Encryption (SSE)
- SQS-managed key (SSE-SQS): free, automatic
- KMS key (SSE-KMS): more control, KMS charges apply
- Messages encrypted when stored in SQS
- Decrypted transparently when received
Encryption in Transit
- SQS API uses HTTPS by default
- TLS 1.2+ for all connections
- No configuration required
- For extra security: add an IAM condition on aws:SecureTransport
SQS message size limit is 256 KB. For larger payloads, use the SQS Extended Client (AWS SDK library). It automatically stores large payloads in S3 and puts only a reference in SQS.
How It Works
- Large payload (>256 KB) → stored in S3
- SQS message contains S3 reference (s3://bucket/key)
- Consumer client retrieves from S3 automatically
- Payloads up to 2 GB (the Extended Client's limit)
Limitations
- Not supported for FIFO queues
- Additional S3 cost (storage + GET/PUT)
- AWS publishes the library for Java and Python; other languages rely on community ports
- Alternative: store in S3 manually, send URI in message
By default, SQS API calls go over the public internet. For EC2/Lambda in private subnets with no NAT, create a VPC Interface Endpoint for SQS. Traffic stays within AWS network.
Without VPC Endpoint
- Traffic goes via Internet Gateway or NAT
- Private subnet workloads need NAT Gateway
- NAT adds cost and is a throughput bottleneck
With VPC Endpoint
- Traffic stays within AWS network
- No Internet Gateway or NAT required
- Lower latency, higher security
- Cost: ~$0.01/hr per AZ + data fees
| Setting | Default | Range | Notes |
|---|---|---|---|
| Message retention | 4 days | 1 minute to 14 days | Messages deleted after this period if not processed |
| Visibility timeout | 30 seconds | 0 seconds to 12 hours | How long message is hidden during processing |
| Message size | n/a | 1 byte to 256 KB | For larger payloads, store in S3 and send pointer |
| Delay queue | 0 seconds | 0 to 15 minutes | Messages invisible for this period after send |
| Receive wait time | 0 seconds | 0 to 20 seconds | Long polling wait time (set to 20 for efficiency) |
The best way to scale SQS consumers is based on queue depth, the number of messages waiting. Use the CloudWatch metric ApproximateNumberOfMessagesVisible to trigger Auto Scaling.
Key CloudWatch Metrics
- ApproximateNumberOfMessagesVisible: messages waiting
- ApproximateNumberOfMessagesNotVisible: in-flight
- ApproximateAgeOfOldestMessage: queue lag
- NumberOfMessagesReceived: throughput
- NumberOfMessagesSent: producer rate
Auto Scaling Strategy
- Target tracking: "keep backlog per instance at 1000"
- Scale out when: ApproximateNumberOfMessagesVisible / DesiredCapacity > 1000
- Scale in when: backlog cleared
- Use ApproximateAgeOfOldestMessage for SLA alarms
Best practice formula: Target = (Acceptable latency in seconds) × (Messages processed per second per instance). If each instance processes 10 msg/sec and you want max 60s latency, target = 600 messages per instance.
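The formula above can be written out as a small calculation. The function names and the `min_instances` / `max_instances` bounds are illustrative assumptions; in practice a target tracking policy does this arithmetic for you once it is given the backlog-per-instance target.

```python
import math


def backlog_target(acceptable_latency_s: float, msgs_per_sec_per_instance: float) -> float:
    """Messages each instance may have queued while still meeting the latency goal."""
    return acceptable_latency_s * msgs_per_sec_per_instance


def desired_instances(visible_messages: int, target_per_instance: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Instances needed to keep backlog per instance at or below the target."""
    needed = math.ceil(visible_messages / target_per_instance)
    return max(min_instances, min(max_instances, needed))


target = backlog_target(60, 10)            # 600, as in the worked example above
print(desired_instances(6000, target))     # 6000 waiting messages -> 10 instances
```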
Lambda can poll SQS automatically via Event Source Mapping. No need for EC2 workers. Lambda scales automatically based on queue depth.
Lambda + SQS Benefits
- No infrastructure to manage
- Auto-scales with queue depth
- Pay only for invocations
- Built-in retry + DLQ support
- Processes 10 messages per batch by default (standard queues allow up to 10,000 with a batching window)
Lambda + SQS Limits
- Max 15 min execution time (per invocation, i.e. per batch)
- 1000 concurrent executions default (can increase)
- FIFO queue: one batch in flight per message group, so concurrency is bounded by the number of active groups
- Cold starts add latency on scale-out
- Not ideal for very long-running jobs
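A minimal sketch of a Lambda handler for an SQS event source mapping is shown below. It uses partial batch responses so that only the failed messages return to the queue; this assumes the mapping is configured with `FunctionResponseTypes=["ReportBatchItemFailures"]`. The `process` function and the `order_id` field are hypothetical stand-ins for real business logic.

```python
import json


def process(order: dict) -> None:
    """Hypothetical business logic; raises if the message is unusable."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event, context):
    """Process each SQS record and report only the failures back to Lambda."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message becomes visible again; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, one bad message causes the whole batch to be retried, so successfully processed messages may be handled twice. Messages that keep failing still end up in the DLQ once they exceed the queue's `maxReceiveCount`.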
| Alarm | Metric | Threshold Example | Why |
|---|---|---|---|
| Queue backlog | ApproximateNumberOfMessagesVisible | > 10,000 for 5 min | Consumers falling behind |
| Processing lag | ApproximateAgeOfOldestMessage | > 300 sec | SLA violation risk |
| DLQ messages | ApproximateNumberOfMessagesVisible (DLQ) | > 0 | Messages failing repeatedly |
| Empty receives | NumberOfEmptyReceives | > 1000/min | Enable long polling |
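The alarms in the table could be defined with CloudWatch's `put_metric_alarm`. The helper below is a sketch that only builds the request arguments; the alarm naming scheme and the default period and statistic are assumptions. Note that `ApproximateNumberOfMessagesVisible` is the CloudWatch name of the queue-depth metric.

```python
def queue_alarm(queue_name: str, metric: str, threshold: float,
                period_s: int = 60, periods: int = 5, statistic: str = "Maximum") -> dict:
    """Build keyword arguments for cloudwatch.put_metric_alarm on an SQS queue metric."""
    return {
        "AlarmName": f"{queue_name}-{metric}",   # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": metric,
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": statistic,
        "Period": period_s,
        "EvaluationPeriods": periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Placeholder queue names; thresholds follow the table above.
alarms = [
    queue_alarm("orders", "ApproximateNumberOfMessagesVisible", 10_000),
    queue_alarm("orders", "ApproximateAgeOfOldestMessage", 300, periods=1),
    queue_alarm("orders-dlq", "ApproximateNumberOfMessagesVisible", 0, periods=1),
]
```

Each entry can then be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`, usually together with an `AlarmActions` list pointing at an SNS topic.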
SQS pricing is simple: $0.40 per 1M requests for Standard, $0.50 per 1M for FIFO. Batching and long polling are the main optimization levers.
| Workload | Messages/Day | Without Batching | With Batching |
|---|---|---|---|
| Small e-commerce | 1,000 orders | $0.0008/day | $0.00008/day |
| Medium app | 10,000 msg | $0.008/day | $0.0008/day |
| Large scale | 10M msg | $8/day ($240/mo) | $0.80/day ($24/mo) |
| FIFO | 100K msg | $0.10/day | $0.01/day |
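The figures in the table are consistent with two billed requests per message (one send, one receive) and a batch size of 10. The sketch below reproduces that arithmetic. The two-requests assumption is inferred from the numbers, not stated in the table; counting `DeleteMessage` as well would make it three requests per message, and the free tier is ignored.

```python
import math


def daily_cost(messages: int, price_per_million: float,
               requests_per_message: int = 2, batch_size: int = 1) -> float:
    """Estimated request cost in USD for one day of traffic."""
    requests = math.ceil(messages / batch_size) * requests_per_message
    return requests * price_per_million / 1_000_000


standard_unbatched = daily_cost(10_000_000, 0.40)               # about $8 per day
standard_batched = daily_cost(10_000_000, 0.40, batch_size=10)  # about $0.80 per day
fifo_unbatched = daily_cost(100_000, 0.50)                      # about $0.10 per day
```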
Cost Optimization Checklist
- ✅ Use batch APIs → 10x cost reduction
- ✅ Enable long polling (WaitTimeSeconds=20)
- ✅ Same region for sender and consumer
- ✅ Delete unused queues
- ✅ Monitor with Cost Explorer (filter: "Requests")
Data Transfer Notes
- SQS → Lambda (same region): free
- SQS → EC2 (same region): free
- Cross-region: ~$0.02/GB (avoid if possible)
- SQS → Internet: ~$0.09/GB
Enable SSE
Always enable encryption at rest. Use SSE-SQS (free) or SSE-KMS (for compliance). No excuse for unencrypted queues.
Least Privilege
Grant only needed actions: sqs:SendMessage for producers, sqs:ReceiveMessage + sqs:DeleteMessage for consumers.
Enable Logging
Use CloudTrail to log all SQS API calls. Monitor for unexpected access patterns or unauthorized attempts.
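As an illustration of the least-privilege split described above, the sketch below builds separate identity policies for a producer and a consumer. The queue ARN is a placeholder and the helper name `sqs_policy` is made up for this example.

```python
import json


def sqs_policy(queue_arn: str, actions) -> dict:
    """Build an IAM identity policy that allows only the given SQS actions on one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": queue_arn}
        ],
    }


QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:orders"  # placeholder ARN

producer_policy = sqs_policy(QUEUE_ARN, ["sqs:SendMessage"])
consumer_policy = sqs_policy(QUEUE_ARN, ["sqs:ReceiveMessage", "sqs:DeleteMessage"])

print(json.dumps(consumer_policy, indent=2))
```

Scoping `Resource` to a single queue ARN rather than `*` keeps a compromised producer from writing to other queues. Consumers that extend the visibility timeout would additionally need `sqs:ChangeMessageVisibility`.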
- "Cross-account queue access" → Requires queue policy (not just IAM)
- "SNS publishes to SQS" → Queue policy must allow sns.amazonaws.com
- "Encrypt messages at rest" → Enable SSE-SQS or SSE-KMS
- "Private subnet access to SQS" → VPC Interface Endpoint
- "Scale consumers on queue size" → Auto Scaling on ApproximateNumberOfMessagesVisible
- "Reduce SQS costs" → Enable long polling (WaitTimeSeconds=20)
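For the "SNS publishes to SQS" case, the queue policy could look roughly like the sketch below. Both ARNs in the usage note are placeholders. Restricting the statement with an `aws:SourceArn` condition is a common hardening step that is assumed here rather than taken from the list above.

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> dict:
    """Queue (resource) policy letting one SNS topic send messages to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only accept messages that originate from this specific topic.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


def policy_attribute(queue_arn: str, topic_arn: str) -> dict:
    """Attributes payload for sqs.set_queue_attributes (the policy must be a JSON string)."""
    return {"Policy": json.dumps(sns_to_sqs_policy(queue_arn, topic_arn))}
```

It would be applied with `boto3.client("sqs").set_queue_attributes(QueueUrl=queue_url, Attributes=policy_attribute(queue_arn, topic_arn))`. The same pattern with an account principal instead of a service principal covers cross-account access.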
SQS is designed for production: enable SSE for encryption, use queue policies for cross-account/service access, scale consumers on queue depth, and always configure a DLQ. Monitor ApproximateAgeOfOldestMessage for SLA compliance.
- SQS is a managed message queue: producers enqueue, consumers poll and process at their own pace
- Core model: Send → Store (up to 14 days) → Poll → Process → Delete
- Visibility timeout: message hidden during processing; returns to queue if not deleted
- Standard queue: near-unlimited throughput, best-effort order, at-least-once delivery
- FIFO queue: strict ordering, exactly-once, 3,000 msg/sec limit (with batching)
- Dead-Letter Queue: catches messages that fail repeatedly; essential in production
- SNS + SQS fan-out: SNS broadcasts, each consumer has its own SQS buffer
- Security: IAM + queue policies, SSE encryption, VPC endpoints for private access
- Scaling: Auto Scale on ApproximateNumberOfMessagesVisible; Lambda event source mapping for serverless
- vs other services: SQS = buffer, SNS = broadcast, EventBridge = route, Kafka = stream