
Amazon SQS: Simple Queue Service

A fully managed message queue that decouples distributed systems — producers drop work in, consumers process it on their own terms. No direct connections, no cascading failures, no traffic spikes crashing your services.

🗂️ SQS in 30 Seconds

Managed message queue — producers enqueue, consumers dequeue and process independently
Pull-based (polling) — consumers ask for messages at their own pace, unlike SNS push
Messages stored durably for up to 14 days — survive consumer downtime
Standard: near-unlimited throughput, best-effort order, at-least-once delivery · FIFO: strict order, exactly-once processing, up to 3,000 msg/s with batching
Dead-Letter Queue (DLQ) catches messages that fail repeatedly — essential for production

Chapter One

What is Amazon SQS

The Problem: Systems That Talk Directly Introductory

In a naive microservices architecture, every service calls other services directly. The Order API calls the Email Service, the Inventory Service, and the Shipping Service — all synchronously, all in real time. This causes three serious problems:

💥

Traffic Spikes Crash Services

A flash sale sends 100× normal orders. The Order API overwhelms the Inventory Service — which can't scale fast enough. Requests fail. Orders are lost.

🌊

Failures Cascade

Email Service goes down for 2 minutes. Every order fails — even though the warehouse is running fine. One broken service breaks everything downstream.

🔗

Tight Coupling

Adding a new fraud detection service means changing the Order API and redeploying. Every new consumer makes the producer more complex.

The Mental Model: A Warehouse Receiving Area Introductory

👉 SQS is like a warehouse receiving dock. Delivery trucks (producers) drop off packages at any time. The dock holds them safely. Warehouse workers (consumers) pick up packages when they're ready — at their own pace. If the workers are busy, packages wait. Nothing is lost. Nobody blocks waiting for each other.

More everyday analogies:

🎫

Ticket Queue

Customers take a number and wait. Service agents handle one at a time. A rush of customers doesn't overwhelm agents — it just lengthens the queue temporarily.

📬

Mailbox

The postman drops letters in your mailbox regardless of whether you're home. You read them when you're ready. The postman doesn't wait for you.

🏭

Assembly Line Buffer

Parts accumulate on a conveyor between two stations. Station B processes at its own speed. Station A never blocks waiting for B to be free.

What SQS Actually Is Introductory

Amazon SQS (Simple Queue Service) is a fully managed message queue that enables asynchronous communication between distributed components. The core model:

Producers send messages into a queue
Messages are stored durably until a consumer processes them
Consumers poll the queue and process messages at their own rate
Once processed successfully, the consumer deletes the message from the queue
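The four steps above can be sketched as follows. This is a minimal sketch, assuming a boto3-style SQS client is passed in as `sqs` (for example `boto3.client("sqs")`); `queue_url` and the `handle` callback are illustrative placeholders, not names from the original text.

```python
import json


def enqueue_order(sqs, queue_url, order):
    """Producer: send one message and return without waiting for processing."""
    response = sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(order))
    return response["MessageId"]


def process_once(sqs, queue_url, handle):
    """Consumer: poll once, process each message, delete only on success."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handle(json.loads(message["Body"]))
        except Exception:
            # No delete: the message becomes visible again after the
            # visibility timeout and is retried.
            continue
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        processed += 1
    return processed
```

Deleting only after `handle` returns is what turns a consumer crash into a retry instead of a lost message.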

Core Concept Diagram Introductory

SQS — Producer → Queue → Consumer

① PRODUCER

Publishes a message to the queue and returns immediately — no waiting

② QUEUE

Stores message durably (up to 14 days). Survives consumer downtime

③ CONSUMER

Polls queue at own pace, processes one message, then deletes it

④ DELETE

Message deleted only after successful processing — if it fails, the message returns

SQS vs Direct Service Calls Core

Concern	Direct Synchronous Call	SQS Queue
Traffic spike	Service overwhelmed, requests dropped	Messages buffer in queue — consumer processes steadily
Consumer downtime	Calls fail, data is lost	Messages wait safely for up to 14 days
Consumer speed	Producer must wait for response	Producer returns instantly — no waiting
Adding new consumer	Change producer code, redeploy	Add workers to the queue, or subscribe a new queue via SNS fan-out; producer code stays the same
Retry on failure	Manual retry logic needed	Built-in visibility timeout + DLQ
Scaling	Producer and consumer must scale together	Consumer scales independently based on queue depth

Queue Processing Flow Core

Queue-Based Processing — Smoothing a Traffic Spike

✗ WITHOUT SQS

Traffic spike hits the service directly — it has no buffer, gets overwhelmed, requests are dropped

✓ WITH SQS

Spike enters the queue. Worker processes at a steady, sustainable rate. Queue drains over time — no data lost

Why This Matters — The Three Superpowers of Queues Core

🧱

Buffering

The queue holds messages during traffic spikes, consumer slowdowns, and deployments. Work is never lost — it waits safely until capacity is available.

🔄

Retry Capability

If processing fails, the message returns to the queue automatically. Built-in retry logic means transient failures are handled without custom code.

📐

Independent Scaling

Consumers scale based on queue depth rather than producer rate. Add more workers when the queue grows — completely independent of the producer.

🎓 Exam Insight

When an exam question mentions "decouple services", "handle traffic spikes", "buffer requests", or "asynchronous processing" — the answer is an SQS queue. SQS is the AWS answer to workload isolation and async processing, while SNS is the answer to broadcasting events to many consumers.

👉 Key Takeaway

SQS breaks the direct dependency between producers and consumers — work is stored durably in a queue and processed reliably, regardless of traffic spikes or consumer failures

Chapter Two

Why Distributed Systems Need Queues

The Fundamental Problem: Synchronous Systems Don't Scale Introductory

Every distributed system eventually faces the same question: what happens when two services need to communicate but operate at different speeds, different scales, or different availability levels? Synchronous direct calls work fine at small scale. They break catastrophically at large scale.

⚡

Speed Mismatch

Service A can produce 10,000 events/sec. Service B can process 500/sec. Without a queue, 9,500 events per second are either dropped or Service A must slow down — both unacceptable in production.

🕐

Availability Mismatch

Service B deploys every Tuesday. Service A cannot stop accepting user requests for 3 minutes while B restarts. With direct calls, A's availability is limited by B's availability.

📈

Scale Mismatch

Service A auto-scales to 50 instances during peak. Service B can only handle 5x load. Direct calls flood B during spikes — B crashes, which cascades back to A and the entire system fails.

🔁

Retry Complexity

When Service B is temporarily down, Service A must implement exponential backoff, retry logic, circuit breakers — all custom code. Every service pair adds more complexity.

👉 Queues solve all four problems simultaneously. They act as a shock absorber between services — absorbing speed mismatches, surviving availability gaps, smoothing scale spikes, and eliminating the need for custom retry logic.

Real-World Workloads That Require Queues Core

🛒

E-Commerce Order Processing

Orders arrive in bursts (flash sales, promotions)
Inventory, email, shipping, fraud all need to react
Queue absorbs the burst — all downstream systems stay stable
Pattern: Order Service → SQS → multiple workers

🎬

Video Processing Pipeline

User uploads video — transcoding takes 30 seconds
Can't make the user wait synchronously
Upload → SQS → transcoding worker → CDN publish
Pattern: Fan-out to multiple resolution workers

💳

Payment Processing

Payment accepted instantly, settlement is async
Fraud check, bank transfer, receipt — all non-blocking
FIFO queue ensures transaction ordering
DLQ captures failed transactions for manual review

📧

Email / Notification Systems

Sending 1M emails takes minutes — never synchronous
SQS buffers all send requests
Email workers scale based on queue depth
Failed sends retry automatically via visibility timeout

How Queues Help Systems "Slow Down Safely" Core

One of the most underrated queue properties: a queue lets a system slow down without losing work. This is not possible with synchronous calls. Without a queue, when a service is overwhelmed it drops requests. With a queue, work accumulates and drains as capacity becomes available.

Workload Buffering — Queue Absorbs Spikes and Drains Smoothly

INCOMING WORK

Arrives in bursts — spiky, unpredictable, can peak at 50–100× baseline during sales

QUEUE DEPTH

Grows during the spike, then gradually drains as the consumer catches up. Nothing is lost.

CONSUMER

Processes at a steady, predictable rate. Can scale out if queue depth grows too large.
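The buffering behaviour described above can be illustrated with a small simulation. This is a toy model under stated assumptions (discrete ticks, a fixed consumer rate, made-up arrival numbers); it does not call SQS.

```python
def simulate_queue_depth(arrivals_per_tick, consumer_rate):
    """Return the queue depth after each tick for a fixed-rate consumer."""
    depth = 0
    history = []
    for arrived in arrivals_per_tick:
        depth += arrived                    # the burst lands in the queue
        depth -= min(depth, consumer_rate)  # the consumer drains at a steady rate
        history.append(depth)
    return history


# Baseline of 10 messages per tick, a 3-tick spike of 500, then baseline again.
arrivals = [10, 10, 500, 500, 500] + [10] * 20
depths = simulate_queue_depth(arrivals, consumer_rate=100)
print("peak backlog:", max(depths))
print("backlog at the end:", depths[-1])
```

The consumer never sees more than its own rate; the spike only shows up as a temporary backlog that drains once arrivals fall back to baseline.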

Five Benefits Every Architect Knows Core

Benefit	What It Means in Practice
Workload buffering	Message queue absorbs traffic spikes. Consumer processes at its own pace. No dropped requests.
Decoupling	Producer doesn't know about consumer. Add, replace, or scale consumers without touching the producer.
Retry handling	Failed messages return to queue automatically. No custom retry code in your services.
Fault tolerance	Consumer downtime doesn't cause data loss. Messages wait. System resumes where it left off.
Independent scaling	Scale consumers based on queue depth metric. Auto Scaling reacts to backlog, not to producer rate.

Common Architectural Mistakes Core

❌

Processing in the producer

Doing heavy work (DB writes, API calls) in the producer before queuing — defeats the purpose. The producer should enqueue and return immediately. Heavy lifting belongs in the consumer.

❌

Not handling duplicates

Standard queues deliver at-least-once, meaning occasionally a message arrives twice. Consumers that don't handle this idempotently can process orders twice, send two emails, charge twice.

❌

Visibility timeout too short

If the consumer takes 10 seconds to process but timeout is 5 seconds, the message becomes visible again while the first consumer is still working — causing duplicate processing.

❌

No Dead-Letter Queue

Without a DLQ, a "poison" message that always fails processing keeps reappearing until its retention period expires (up to 14 days), burning compute on failed retries. In a FIFO queue it also blocks the messages behind it in the same message group.

🎓 Exam Insight

Common exam pattern: "An application receives variable traffic: 10 requests/sec normally, 10,000/sec during promotions. The processing backend can only handle 50 req/sec max. How do you architect this?" Answer: an SQS queue between the frontend and backend. The queue absorbs the spike, the backend processes at 50 req/sec, and requests are not dropped as long as the backlog drains within the retention period. Scale the backend with Auto Scaling on queue depth, which CloudWatch reports as the ApproximateNumberOfMessagesVisible metric.

👉 Key Takeaway

Queues decouple the rate of work arrival from the rate of work processing — enabling services to operate independently, survive each other's failures, and scale without coordination

Chapter Three

How SQS Works

Pull vs Push — Why SQS is Pull-Based Introductory

SQS uses a polling (pull) model: the consumer actively asks "do you have messages for me?" whenever it is ready for more work. This is the opposite of SNS, which pushes messages to subscribers. The pull model gives the consumer full control over its processing rate.

📤

Push (SNS) — Producer controls pace

SNS delivers immediately to all subscribers
Consumer must handle any rate it receives
Good for broadcasting events to many subscribers
Consumer can be overwhelmed during spikes

📥

Pull (SQS) — Consumer controls pace

Consumer decides when to ask for messages
Consumer processes at its own maximum rate
Good for workload processing at controlled speed
Queue absorbs backlog when consumer is slow

Message Lifecycle — 5 Stages Core

SQS Message Lifecycle

① SEND

Producer puts message in queue. Returns immediately. Max 256 KB.

② STORE

Message is visible and waiting. Any consumer can pick it up.

③ POLL

Consumer requests messages. SQS returns up to 10 at a time.

④ PROCESS

Message becomes invisible to others during processing (visibility timeout).

⑤ DELETE

Consumer must explicitly delete. If not deleted → message reappears for retry.

Visibility Timeout — The Key Safety Mechanism Core

When a consumer receives a message, SQS doesn't delete it immediately. Instead it makes the message invisible to all other consumers for a configurable period — the visibility timeout. This is SQS's built-in retry mechanism.

Visibility Timeout — Success Path vs Failure Path

VISIBILITY TIMEOUT

Default: 30 seconds. Range: 0 – 12 hours
Set it longer than your worst-case processing time
Too short → duplicates; Too long → slow retries on crash
Consumer can extend it via ChangeMessageVisibility API while still working
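The success and failure paths can be modelled in a few lines. This is a toy in-memory model for illustration only, with time passed in explicitly as `now`; it is not how SQS is implemented, and the class and method names are hypothetical.

```python
class InMemoryQueue:
    """Toy model of visibility timeout. Not the SQS implementation."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time it becomes visible again]

    def send(self, message_id, body, now):
        self._messages[message_id] = [body, now]

    def receive(self, now):
        """Return one visible message and hide it for the timeout period."""
        for message_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        self._messages.pop(message_id, None)


queue = InMemoryQueue(visibility_timeout=30)
queue.send("m1", "resize image", now=0)
first = queue.receive(now=0)    # consumer A receives the message
hidden = queue.receive(now=10)  # still invisible: nobody else can receive it
retry = queue.receive(now=31)   # A never deleted it, so it is visible again
queue.delete("m1")              # success path: delete after processing
```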

AT-LEAST-ONCE DELIVERY

Standard queue: messages may be delivered more than once
Consumers must be idempotent — same message twice = same result
Use a unique message ID or database upsert to handle duplicates
FIFO queue provides exactly-once processing within 5-minute window
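One way to make a consumer idempotent is to claim each message ID before acting on it. The sketch below uses an in-memory set as a stand-in for a durable store with conditional writes (such as a DynamoDB conditional put); `ProcessedStore` and `handle_once` are hypothetical names for illustration.

```python
class ProcessedStore:
    """In-memory stand-in for a durable store with conditional writes."""

    def __init__(self):
        self._seen = set()

    def claim(self, message_id):
        """Return True only the first time a message ID is claimed."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


def handle_once(store, message_id, body, side_effect):
    """Run side_effect at most once per message ID, even on redelivery."""
    if not store.claim(message_id):
        return False  # duplicate delivery: skip the work, then delete as usual
    side_effect(body)
    return True
```

Claiming before the side effect favours "at most once" for the side effect itself: if the process dies between the claim and the work, the work is skipped on redelivery. A production version would likely also record completion status so that half-finished work can be resumed.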

Long Polling vs Short Polling Core

By default, SQS uses short polling — it samples a subset of servers and returns immediately, even if the queue is empty. Long polling waits up to 20 seconds for a message before returning. Always use long polling in production.

Feature	Short Polling	Long Polling (recommended)
Wait time	Returns immediately (0s)	Waits up to 20s for a message
Empty responses	Many: wastes API calls	Minimal: returns as soon as a message arrives, or empty once the wait time expires
Cost	Higher — many empty polls billed	Lower — fewer API requests
Latency	Near-zero when queue is active	Near-zero when message available; waits only when empty
How to enable	Default	Set `WaitTimeSeconds=20`
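Long polling can be requested per call (`WaitTimeSeconds` on `ReceiveMessage`) or set as the queue default through the `ReceiveMessageWaitTimeSeconds` queue attribute. The sketch below shows the queue-level setting plus a rough count of receive calls against an idle queue; it assumes a boto3-style client passed in as `sqs`, and the helper names are illustrative.

```python
def enable_long_polling(sqs, queue_url, wait_seconds=20):
    """Set the queue's default receive wait time (0 means short polling)."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("wait_seconds must be between 0 and 20")
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"ReceiveMessageWaitTimeSeconds": str(wait_seconds)},
    )


def idle_requests_per_day(seconds_per_request):
    """Receive calls made in 24 hours by one consumer polling an empty queue."""
    return 86_400 // seconds_per_request


# One short poll per second versus one 20-second long poll at a time.
print(idle_requests_per_day(1), idle_requests_per_day(20))
```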

Message Batching — Cost & Throughput Optimization In-Depth

Batching sends multiple messages in a single API request instead of one at a time. This is one of the most important cost optimization techniques — it reduces API calls by up to 90%.

Aspect	Single Send	Batch Send
API calls per 10 messages	10 calls	1 call (90% reduction)
Cost per 1M messages	$0.40	$0.04 (90% reduction)
Max messages per batch	1	10
Max batch size	256 KB per message	256 KB total across batch

📤

SendMessageBatch

Send up to 10 messages in one request. Each can have different body, attributes, and delay.

🗑️

DeleteMessageBatch

Delete up to 10 messages in one request. Pass receipt handles from processing.

📥

ReceiveMessage

Already returns up to 10 messages per poll. Set MaxNumberOfMessages=10.

👉 Cost example: With 10M messages/day, sending one message per request costs $4/day; batching 10 per request costs $0.40/day. That saves $3.60/day, or about $108 over a 30-day month. Always batch when possible.
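A minimal batching sketch, assuming a boto3-style client passed in as `sqs` and the $0.40 per million requests price quoted above. The helper names are illustrative, and the sketch does not check the 256 KB total batch size limit.

```python
def chunk_into_batches(bodies, batch_size=10):
    """Split message bodies into SendMessageBatch entry lists of up to 10."""
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        batches.append(
            [{"Id": str(start + offset), "MessageBody": body}
             for offset, body in enumerate(chunk)]
        )
    return batches


def send_all(sqs, queue_url, bodies):
    """Send every body in batches and return the IDs of failed entries."""
    failed = []
    for entries in chunk_into_batches(bodies):
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        # A batch call can partly fail, so inspect the Failed list every time.
        failed.extend(item["Id"] for item in response.get("Failed", []))
    return failed


def daily_request_cost(messages_per_day, messages_per_request,
                       price_per_million=0.40):
    """Request cost per day, counting send requests only."""
    requests = -(-messages_per_day // messages_per_request)  # ceiling division
    return requests * price_per_million / 1_000_000
```

Checking `Failed` matters because `SendMessageBatch` can succeed as a call while individual entries are rejected.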

ChangeMessageVisibility — Extending Processing Time In-Depth

What if processing time varies? Some messages take 2 seconds, others take 30. Use the ChangeMessageVisibility API to extend the timeout while processing is still under way.

⏱️

The Problem

Timeout too short → message reappears mid-processing → duplicate work
Timeout too long → if consumer dies, message waits unnecessarily
Variable processing time → no single timeout fits all

✅

The Solution

Start with timeout = expected time × 1.5
During processing, periodically check remaining time
If remainingTime < 30%, call ChangeMessageVisibility
Maximum visibility timeout: 12 hours
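The steps above can be sketched as a simple "heartbeat" between units of work. This assumes a boto3-style client passed in as `sqs`, work that can be split into `steps`, and an injectable `clock` for testing; all three are illustrative choices rather than anything prescribed by SQS.

```python
import time


def process_with_heartbeat(sqs, queue_url, receipt_handle, steps,
                           timeout=60, clock=time.monotonic):
    """Run steps in order, extending visibility when under 30% of it remains."""
    deadline = clock() + timeout
    extensions = 0
    for step in steps:
        if deadline - clock() < 0.3 * timeout:
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                # The new timeout counts from this call; it is not added
                # to whatever time was left.
                VisibilityTimeout=timeout,
            )
            deadline = clock() + timeout
            extensions += 1
        step()
    return extensions
```

The check only runs between steps, so a single step that outlasts the remaining timeout can still let the message reappear. Keep individual steps short relative to `timeout`.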

Message Attributes — Metadata Without Touching the Body In-Depth

Message attributes are key-value pairs attached to a message, separate from the body. Use them for routing, filtering, and tagging without parsing JSON.

📋

Use Cases

Routing: Which service should process this?
Priority: Process high-priority first
Source tracking: Which system sent this?
Versioning: Schema version for consumers

📝

Supported Types

String — text values
Number — integers, floats
Binary — base64-encoded data
Custom type labels appended to a base type (e.g., Binary.jpg, Number.float)
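In the API, each attribute is a small mapping with a `DataType` and a value field. A minimal sketch, assuming a boto3-style client passed in as `sqs`; the attribute names (`route`, `priority`, `schemaVersion`) are made up for illustration.

```python
def build_attributes(route, priority, schema_version):
    """Build the MessageAttributes mapping for SendMessage."""
    return {
        "route": {"DataType": "String", "StringValue": route},
        # Number values are still passed as strings in StringValue.
        "priority": {"DataType": "Number", "StringValue": str(priority)},
        "schemaVersion": {"DataType": "Number", "StringValue": str(schema_version)},
    }


def send_with_attributes(sqs, queue_url, body, attributes):
    """Send a message whose metadata travels outside the body."""
    return sqs.send_message(
        QueueUrl=queue_url, MessageBody=body, MessageAttributes=attributes
    )
```

On the consumer side, attributes are only returned when requested, for example by passing `MessageAttributeNames=["All"]` to `receive_message`.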

🎓 Exam Insight

Visibility timeout too short → same message processed by two different consumers simultaneously → data corruption risk. Set it to max expected processing time × 1.5.
At-least-once delivery → consumers must be idempotent. Exam question: "how to prevent duplicate processing?" → use a DynamoDB conditional write to track processed message IDs.
Long polling → reduces cost and eliminates empty receive calls. Exam scenario: "reduce SQS API costs" → enable long polling (ReceiveMessage WaitTimeSeconds=20).

👉 Key Takeaway

SQS's visibility timeout makes retry automatic and safe — if a consumer crashes mid-processing, the message reappears and another consumer picks it up. No message is ever silently lost.

Chapter Four

Standard Queue vs FIFO Queue

Two Queue Types — Choose Based on Your Needs Introductory

SQS offers two fundamentally different queue types. Standard is the default and covers ~90% of use cases. FIFO adds strict ordering and exactly-once delivery but with throughput limits. Most production systems use Standard queues.

🚀

Standard Queue

Nearly unlimited throughput (no fixed cap on API calls per second)
Best-effort ordering — messages may arrive out of order
At-least-once delivery — occasional duplicates possible
Lower latency, higher availability
Use when: order doesn't matter, duplicates are handled

📋

FIFO Queue

3,000 msg/sec (300 without batching)
Strict ordering — first-in-first-out guaranteed
Exactly-once processing — no duplicates in 5-min window
Slightly higher latency
Use when: order matters, duplicates are unacceptable

Detailed Comparison Core

Feature	Standard Queue	FIFO Queue
Throughput	Nearly unlimited	3,000 msg/sec (with batching)
Message ordering	Best-effort (not guaranteed)	Strict FIFO within message group
Delivery guarantee	At-least-once (can duplicate)	Exactly-once (within 5-min window)
Deduplication	None — consumer must handle	Content-based or ID-based
Queue name	Any name	Must end with `.fifo`
Message groups	Not applicable	Required — orders messages within group
Cost	$0.40 per million requests	$0.50 per million requests
Use cases	Log processing, fan-out, async jobs	Financial transactions, inventory updates

When to Use Standard vs FIFO Core

✅

Use Standard When

Order doesn't matter — email sending, image thumbnails
You need massive throughput — millions of messages
Your consumer is idempotent — same message twice = same result
Cost matters — Standard is 20% cheaper
You're doing fan-out to multiple independent workers

✅

Use FIFO When

Order is critical — transaction ledgers, command sequences
Duplicates are unacceptable — payment processing
You need exactly-once for compliance reasons
Throughput is under 3,000 msg/sec
You have distinct message groups (e.g., per-customer)

FIFO Message Groups — Parallelism Within Order In-Depth

FIFO doesn't mean all messages are processed one at a time globally. You can have multiple message groups, and each group is ordered independently. Messages from different groups can be processed in parallel.

FIFO Message Groups — Parallel Processing with Per-Group Order

MESSAGE GROUP ID

Required on every FIFO message. Use customer ID, order ID, or any partition key. Messages with the same group ID are strictly ordered.

DEDUPLICATION ID

Either provide explicitly or enable content-based deduplication. Within 5-minute window, same ID = same message → discarded.
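A FIFO send carries both IDs as parameters on the same `SendMessage` call. The sketch below only builds the parameters, so it runs without AWS access; grouping by customer and the `customer-` prefix are illustrative assumptions.

```python
def fifo_send_params(queue_url, customer_id, event_id, body):
    """Build SendMessage parameters for a FIFO queue."""
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names must end with .fifo")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages sharing a group ID are delivered in order;
        # different groups can be processed in parallel.
        "MessageGroupId": f"customer-{customer_id}",
        # The same deduplication ID within 5 minutes is accepted but not redelivered.
        "MessageDeduplicationId": event_id,
    }


params = fifo_send_params(
    "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo",
    customer_id=42,
    event_id="order-12345-created",
    body='{"orderId": 12345}',
)
# With a boto3-style client this would be sent as: sqs.send_message(**params)
```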

FIFO Deduplication — Exactly-Once Mechanics In-Depth

FIFO queues automatically discard duplicate messages within a 5-minute deduplication window. Two methods available:

Method	How It Works	When to Use
Explicit deduplication ID	You provide a unique ID with each message	You control ID generation (idempotency keys, request IDs)
Content-based deduplication	SHA-256 hash of message body (not attributes)	Simpler setup, body uniquely identifies message

⏱️

5-Minute Window

Same deduplication ID within 5 min → duplicate discarded
After 5 minutes, same ID is accepted (new window)
SQS returns success (silent deduplication — no error)

⚠️

Important Gotchas

Content-based dedup ignores message attributes — only body
Different attributes + same body = still duplicate
Retry after 5 min → message re-delivered (plan for this)

👉 Best practice: For order confirmations, use deduplication ID = "order-12345-confirmation". This prevents duplicate emails within 5 minutes. If you need longer deduplication, track processed IDs in DynamoDB.
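The window behaviour can be modelled in a few lines. This is a toy model for illustration, with time passed in as `now` in seconds; the exact boundary handling in SQS may differ, and `DedupWindow` is a hypothetical name.

```python
import hashlib


class DedupWindow:
    """Toy model of a 5-minute deduplication window."""

    WINDOW_SECONDS = 300

    def __init__(self):
        self._first_seen = {}  # deduplication id -> time it was first accepted

    def accept(self, dedup_id, now):
        """Return True if the message is kept, False if treated as a duplicate."""
        first = self._first_seen.get(dedup_id)
        if first is not None and now - first < self.WINDOW_SECONDS:
            return False
        self._first_seen[dedup_id] = now
        return True


def content_dedup_id(body):
    """Content-based ID: SHA-256 of the body only, attributes are ignored."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Because the hash covers only the body, two messages with identical bodies and different attributes map to the same ID.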

🎓 Exam Insight

"Strict ordering required" → FIFO queue with message group ID
"Exactly-once processing" → FIFO queue with deduplication ID
"High throughput + async" → Standard queue + idempotent consumer
FIFO limitation: max 3,000 msg/sec with batching (300 without) by default. If you need more, enable high-throughput FIFO mode or use Standard.
FIFO name: must end with .fifo suffix — e.g., orders.fifo

👉 Key Takeaway

Standard queue for 90% of workloads — high throughput, handle duplicates in your consumer. FIFO queue when ordering or exactly-once is a hard requirement — but accept the 3,000 msg/sec limit.

Chapter Five

SQS Architecture Patterns

Pattern 1: Queue-Based Load Leveling Core

The most fundamental pattern: put a queue between a variable-rate producer and a fixed-rate consumer. The queue absorbs traffic spikes so the backend processes at a steady pace. This is the solution to every "traffic spike crashes our service" problem.

Queue-Based Load Leveling — Variable In, Steady Out

Pattern 2: Worker Pool Pattern Core

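Multiple consumers (workers) poll the same queue in parallel. While a message is in flight it is hidden from the other workers, so each message is normally handled by one worker (a Standard queue can still deliver an occasional duplicate). Scale the worker pool based on queue depth, which CloudWatch reports as the ApproximateNumberOfMessagesVisible metric.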

Worker Pool — Multiple Consumers, One Queue
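One way to turn queue depth into a worker count is to size the pool so the current backlog drains within a target time. The formula and the default limits below are illustrative assumptions, not an AWS-defined algorithm; the input would typically come from the visible-message count that CloudWatch publishes as `ApproximateNumberOfMessagesVisible`.

```python
import math


def desired_workers(visible_messages, msgs_per_worker_per_sec,
                    target_drain_seconds, min_workers=1, max_workers=50):
    """Workers needed to drain the backlog within the target time, clamped."""
    capacity_per_worker = msgs_per_worker_per_sec * target_drain_seconds
    needed = math.ceil(visible_messages / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


# 6,000 visible messages, 10 msg/s per worker, drain within 60 s.
print(desired_workers(6000, 10, 60))
```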

Pattern 3: Dead-Letter Queue (DLQ) Core

A DLQ catches "poison" messages that fail repeatedly. After N failed processing attempts, SQS automatically moves the message to the DLQ. This prevents a single bad message from blocking your entire queue and consuming infinite retry compute.

Dead-Letter Queue — Isolating Failed Messages

WHEN TO USE DLQ

Always in production — no exceptions
Alert on DLQ message count > 0
Inspect failed messages for debugging
Redrive to main queue after fix

CONFIGURATION

maxReceiveCount: failures before DLQ (e.g., 3)
DLQ must be same type (Standard→Standard, FIFO→FIFO)
Set DLQ retention longer (14 days) for analysis
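The configuration above maps to two queue attributes: `RedrivePolicy` on the source queue and `MessageRetentionPeriod` on the DLQ. A minimal sketch that only builds the attribute maps; the ARN is a placeholder and the helper names are illustrative.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count=3):
    """Attributes for the source queue: move messages to the DLQ after N receives."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # RedrivePolicy is passed as a JSON string, not a nested mapping.
    return {"RedrivePolicy": json.dumps(policy)}


def dlq_attributes(retention_days=14):
    """Attributes for the DLQ itself: keep failed messages for analysis."""
    return {"MessageRetentionPeriod": str(retention_days * 24 * 60 * 60)}


source_attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
# With a boto3-style client:
# sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=source_attrs)
```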

Pattern 4: Microservice Decoupling Core

Replace direct service-to-service HTTP calls with queue-based async messaging. Services communicate through queues instead of knowing about each other. This is the foundation of event-driven microservice architecture.

❌

Tightly Coupled (HTTP)

Order Service calls Inventory via HTTP
If Inventory is slow → Order is slow
If Inventory is down → Order fails
Scaling Inventory requires rebalancing
Adding Shipping requires changing Order

✅

Decoupled (SQS)

Order publishes "order.placed" to queue
Inventory polls queue at own pace
If Inventory is down → messages wait
Scale Inventory independently
Add Shipping by giving it its own queue (e.g., via SNS fan-out)

Pattern 5: Priority Queue In-Depth

SQS doesn't have native priority support. Implement it with multiple queues — high-priority, normal-priority, low-priority. Configure your consumer to poll high first, then normal, then low.

🔴

High Priority

Critical alerts, VIP customers, payment failures. Consumer checks this queue first on every poll cycle.

🟡

Normal Priority

Standard workload. Consumer checks after high queue is empty or quota reached.

🔵

Low Priority

Batch jobs, reports, cleanup tasks. Processed only when higher queues are empty.
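A strict-priority poll can be sketched as a loop over queue URLs in priority order. This assumes a boto3-style client passed in as `sqs`; the one-second wait is an illustrative choice, since pure short polling samples only a subset of servers and can report a non-empty queue as empty.

```python
def receive_by_priority(sqs, queue_urls, max_messages=10, wait_seconds=1):
    """Poll queues from highest to lowest priority; return the first hit."""
    for queue_url in queue_urls:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=max_messages,
            WaitTimeSeconds=wait_seconds,
        )
        messages = response.get("Messages", [])
        if messages:
            return queue_url, messages
    return None, []
```

Strict ordering like this can starve the low-priority queue under sustained load. A weighted scheme, where lower queues are checked every few cycles regardless, is a common alternative.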

Pattern 6: Request-Reply In-Depth

Need async processing but also need to return a response? Use two queues — request queue + response queue. The caller sends a request and waits on its own reply queue.

Request-Reply Pattern — Async with Response

🔑

Key Components

Correlation ID: UUID that ties request to response
Reply-to queue: Included in request message
Long polling: Caller waits on reply queue
Timeout: Caller can timeout and retry

📋

When to Use

Async processing but caller needs result
Long-running work (>30 seconds)
Decouple request from response latency
Alternative: AWS Step Functions for orchestration
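The correlation ID and reply-to queue can simply travel in the message body. A minimal sketch of the message shapes only; the field names (`correlationId`, `replyTo`, `payload`) are assumptions for illustration, and the same data could equally be carried in message attributes.

```python
import json
import uuid


def build_request(payload, reply_queue_url):
    """Create a request body carrying a fresh correlation ID and reply address."""
    correlation_id = str(uuid.uuid4())
    body = json.dumps(
        {"correlationId": correlation_id, "replyTo": reply_queue_url, "payload": payload}
    )
    return correlation_id, body


def find_reply(messages, correlation_id):
    """Return the decoded reply matching the correlation ID, or None."""
    for message in messages:
        reply = json.loads(message["Body"])
        if reply.get("correlationId") == correlation_id:
            return reply
    return None
```

The caller sends `body` to the request queue, long-polls its reply queue, and passes each batch of received messages to `find_reply` until a match arrives or its own timeout expires.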

🎓 Exam Insight

"Scale backend based on queue" → Use Auto Scaling with the ApproximateNumberOfMessagesVisible CloudWatch metric
"Messages failing repeatedly" → Configure Dead-Letter Queue with maxReceiveCount
"Decouple microservices" → SQS between services instead of HTTP calls
"Process orders in priority" → Multiple queues (high/normal/low) with weighted polling

👉 Key Takeaway

SQS has a pattern for every distributed system challenge: load leveling for spikes, worker pools for throughput, DLQs for resilience, and multiple queues for priority — master these and you can architect any async system

Chapter Six

SQS + SNS — The Fan-Out Pattern

Why Combine SNS and SQS? Introductory

SNS and SQS are not competitors — they're complementary. SNS broadcasts (one event → many subscribers). SQS buffers (store and process at own pace). Combined, you get the best of both: reliable fan-out to multiple independent consumers, each with their own buffer and retry capability.

📢

SNS Alone

Broadcasts to multiple subscribers
Push-based — immediate delivery
If subscriber is down → message can be lost once SNS delivery retries run out
If subscriber is slow → events arrive faster than it can process, with no buffer
Good for: real-time alerts, Lambda triggers

🗂️

SQS Alone

Single queue → single consumer (or pool)
Pull-based — consumer controls pace
Messages survive consumer downtime
Consumer processes at own rate
Good for: workload processing, jobs

👉 SNS + SQS = Fan-out with durability. SNS broadcasts to multiple SQS queues. Each queue buffers independently. Each consumer processes at its own pace. One slow consumer doesn't affect others. One down consumer catches up when it recovers.

The Fan-Out Architecture Core

SNS + SQS Fan-Out — One Event, Multiple Independent Consumers

SNS → BROADCAST

One publish to SNS fans out instantly to all subscribed queues. Producer doesn't know how many consumers exist.

SQS → BUFFER

Each queue buffers independently. If Analytics is down for an hour, its messages wait. Email and Inventory are unaffected.

CONSUMERS → INDEPENDENT

Each consumer scales separately. Email might need 2 workers, Inventory needs 10, Analytics needs 1. No coordination.
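Wiring the fan-out is one `Subscribe` call per queue. A minimal sketch, assuming a boto3-style SNS client passed in as `sns`; the helper name is illustrative, and raw message delivery is an optional choice made here so consumers receive the published body without the SNS JSON envelope.

```python
def subscribe_queues(sns, topic_arn, queue_arns):
    """Subscribe each SQS queue to the topic and return the subscription ARNs."""
    subscription_arns = []
    for queue_arn in queue_arns:
        response = sns.subscribe(
            TopicArn=topic_arn,
            Protocol="sqs",
            Endpoint=queue_arn,
            Attributes={"RawMessageDelivery": "true"},
            ReturnSubscriptionArn=True,
        )
        subscription_arns.append(response["SubscriptionArn"])
    return subscription_arns
```

Each queue also needs a queue policy that allows the topic to send to it; without that, the subscription exists but deliveries are rejected.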

Fan-Out Benefits Core

Benefit	How Fan-Out Delivers It
Isolation	One slow or failed consumer doesn't affect others. Email being slow doesn't delay Inventory updates.
Independent scaling	Scale each consumer based on its own queue depth. Email might have 2 workers, Inventory 10.
No message loss	If a consumer is down, its queue buffers messages. Catches up when recovered.
Add consumers easily	Subscribe a new SQS queue to the SNS topic. No changes to the producer.
Different processing speeds	Email (fast, 100/sec) and Video Transcode (slow, 2/sec) work from the same event stream.

Real-World Example: E-Commerce Order Event Core

🔔

Event Published

Order Service publishes order.placed to SNS topic order-events. Returns immediately — doesn't know or care who subscribes.

📧

Email Queue

Receives event → Lambda sends confirmation email. Fast — 500 emails/sec. Small queue, clears quickly.

📦

Inventory Queue

Receives event → EC2 worker updates stock DB. Medium speed, complex logic. 5 workers in ASG.

📊

Analytics Queue

Receives event → Lambda writes to data lake. Batch processing — runs hourly. Queue grows, drains in batches.

🎓 Exam Insight

"One event, multiple consumers, each at own pace" → SNS + SQS fan-out
"Decouple event producer from consumers" → SNS topic, consumers subscribe queues
"Consumer failures shouldn't affect others" → Each consumer has its own SQS queue
This is one of the most common integration patterns for AWS microservices, so expect it to appear on the exam

👉 Key Takeaway

SNS + SQS fan-out is the gold-standard architecture for event-driven systems — SNS broadcasts, SQS buffers, consumers stay isolated and independently scalable

Chapter Seven

SQS vs SNS vs EventBridge vs Kafka

Four Messaging Services — Different Jobs Introductory

AWS has multiple messaging services because they solve different problems. They're not competitors — they complement each other. Understanding when to use which is a core architecture skill.

🗂️

SQS — Queue

Job: Buffer and decouple workloads
Model: Pull (consumer polls)
Consumers: One queue → one consumer (or pool)
Use: Async processing, load leveling

📢

SNS — Broadcast

Job: Fan-out events to many subscribers
Model: Push (SNS delivers)
Consumers: One topic → many subscribers
Use: Notifications, alerts, pub-sub

🎯

EventBridge — Router

Job: Route events with complex rules
Model: Push (EventBridge delivers)
Consumers: Rule-based routing to targets
Use: Event-driven architecture, SaaS integrations

🚀

Kafka (MSK) — Stream

Job: High-throughput event streaming
Model: Pull (consumer reads from log)
Consumers: Multiple consumer groups, replay
Use: Real-time analytics, log aggregation

Detailed Comparison Core

Feature	SQS	SNS	EventBridge	Kafka (MSK)
Primary use	Buffer workloads	Broadcast events	Route events	Stream events
Delivery model	Pull (poll)	Push	Push	Pull (read log)
Message retention	Up to 14 days	None (immediate)	None (immediate)	Configurable (days–forever)
Message replay	No	No	Archive → replay	Yes (offset-based)
Throughput	Near-unlimited	Near-unlimited	10K events/sec (soft limit)	Millions/sec
Ordering	FIFO queue option	FIFO topic option	No guarantee	Per-partition ordering
Content filtering	No	Subscription filters	Rich rule patterns	Consumer logic
Management	Fully managed	Fully managed	Fully managed	Managed (MSK) or self-managed
Cost model	Per request + data	Per request + data	Per event	Per broker-hour + storage

When to Use What — Decision Guide Core

✅

Use SQS When

You need to buffer workloads (jobs, tasks)
Consumer needs to process at its own pace
You need retry + DLQ for failed messages
Single consumer (or competing consumer pool)
Messages can be deleted after processing

✅

Use SNS When

One event needs to reach multiple subscribers
You want push delivery (immediate)
Subscribers are Lambda, HTTP, Email, SMS
Simple pub-sub pattern
Combined with SQS for durable fan-out

✅

Use EventBridge When

You need content-based routing rules
You're integrating with SaaS (Zendesk, Datadog)
You want schema registry + discovery
You're building event-driven architecture
You need to archive and replay events

✅

Use Kafka (MSK) When

You need millions of events per second
Multiple consumers need to read same stream
You need message replay / reprocessing
You're doing real-time analytics / ML
You already have Kafka expertise

Common Combinations Core

Pattern	Services Used	Why
Durable fan-out	SNS + SQS	SNS broadcasts, SQS buffers per-consumer
Event-driven microservices	EventBridge + SQS	EventBridge routes, SQS buffers processing
Real-time + batch	Kafka + S3 + Athena	Kafka streams, S3 stores, Athena queries
SaaS integration	EventBridge + Lambda	EventBridge receives SaaS events, Lambda processes
Transactional + analytics	SQS + Kinesis	SQS for transactions, Kinesis for analytics stream

SQS vs Kinesis — Queue vs Stream Core

Both are pull-based, but they serve fundamentally different purposes. This is a common source of confusion:

Feature	SQS	Kinesis Data Streams
Primary use	Work queue, task processing	Real-time streaming analytics
Data model	Messages (deleted after processing)	Persistent log (retention 1-365 days)
Replay capability	No: message is gone after delete	Yes: replay from any retained position (sequence number or timestamp)
Multiple consumers	Competing consumers (one gets each message)	Multiple consumer applications (each reads all data)
Message size	256 KB	1 MB
Ordering	FIFO queue option	Per-shard ordering
Throughput scaling	Auto-scales	Shard-based (provisioned mode) or automatic (on-demand mode)
Retention	Up to 14 days	Up to 365 days
Cost model	Per request	Per shard-hour + data

🗂️

Use SQS When

You have a queue of work/tasks to process
Message can be deleted after successful processing
One consumer (or competing pool) per message
You need retry + DLQ for failures

📊

Use Kinesis When

Multiple consumers need to read the same stream
You need to replay / reprocess historical events
Real-time analytics, ML, dashboards
Audit logs requiring long retention

👉 They work together: Kinesis for ingestion + SQS for work distribution. Pattern: Kinesis → Lambda → SQS → worker pool. Kinesis handles high-throughput ingestion, SQS provides reliable per-item processing.

Quick Decision Flowchart Core

Need to buffer work for later processing? → SQS
One event, many consumers immediately? → SNS (or SNS + SQS for durability)
Complex event routing rules? → EventBridge
SaaS integrations? → EventBridge (has native connectors)
Real-time streaming at massive scale? → Kafka (MSK) or Kinesis
Need to replay events? → Kafka or Kinesis (retained log) or EventBridge Archive
 

🎓 Exam Insight

"Decouple services, buffer requests" → SQS
"Fan-out to multiple consumers" → SNS (or SNS + SQS)
"Route events based on content" → EventBridge
"Real-time analytics, millions/sec" → Kinesis Data Streams or MSK (Kafka)
"Integrate with third-party SaaS" → EventBridge (has partner sources)
These services complement each other — combinations are common and expected

👉 Key Takeaway

SQS = buffer, SNS = broadcast, EventBridge = route, Kafka = stream. They solve different problems and often work together, so choose based on the pattern you need rather than treating them as competitors.

Chapter Eight

Security, Reliability & Scaling

Access Control — IAM vs Queue Policy Core

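SQS access is controlled by two mechanisms: IAM policies (attached to users/roles) and SQS queue policies (attached to queues). Within the same account, an Allow in either policy is enough, provided nothing explicitly denies the action. For cross-account access, both the caller's IAM policy and the queue policy must allow it.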

👤

IAM Policy

Attached to IAM user, role, or group
Controls what that identity can do
"Role X can send to any queue in account"
Use for: same-account access, EC2/Lambda roles

🗂️

Queue Policy

Attached to the queue itself
Controls who can access this queue
"Allow Account B to send to this queue"
Use for: cross-account access, AWS service access

Scenario	Use IAM Policy	Use Queue Policy
Lambda in same account sends to queue	Yes — attach to Lambda role	Not required
Another AWS account sends to your queue	Not sufficient alone	Yes — must allow principal
SNS topic sends to queue	Not required	Yes — allow SNS service
S3 event sends to queue	Not required	Yes — allow S3 service
Restrict which queues a role can access	Yes — specify queue ARN	Not the right tool
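
As a minimal sketch of the "SNS topic sends to queue" row, the function below builds a queue policy that lets the SNS service send to one queue, restricted to one topic. The function name and the ARNs are placeholders chosen for illustration.

```python
import json


def sns_to_sqs_queue_policy(queue_arn: str, topic_arn: str) -> str:
    """Return a queue policy (JSON string) allowing one SNS topic to send to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSnsTopicToSend",
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Without this condition, any SNS topic could send to the queue.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }
    return json.dumps(policy)


# Placeholder ARNs for illustration only.
policy_json = sns_to_sqs_queue_policy(
    "arn:aws:sqs:us-east-1:123456789012:orders",
    "arn:aws:sns:us-east-1:123456789012:order-events",
)

# Applying it with boto3 would look roughly like:
#   boto3.client("sqs").set_queue_attributes(
#       QueueUrl=queue_url, Attributes={"Policy": policy_json})
```

The same shape works for S3 event notifications by swapping the service principal to `s3.amazonaws.com` and the source ARN to the bucket ARN.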

Encryption — At Rest and In Transit Core

🔐

Encryption at Rest (SSE)

Enable Server-Side Encryption (SSE)
SQS-managed keys (SSE-SQS) — free, automatic
KMS keys (SSE-KMS) — AWS managed or customer managed key, more control, KMS charges apply
Messages encrypted when stored in SQS
Decrypted transparently when received

🔒

Encryption in Transit

SQS API uses HTTPS by default
TLS 1.2+ for all connections
No configuration required
To enforce HTTPS: add a Deny statement to the queue policy for requests where aws:SecureTransport is false
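
The two encryption settings above can be sketched as plain data: a queue-policy statement that rejects non-TLS requests, and the queue attributes that select SSE-SQS or SSE-KMS. The helper names and the example ARN are assumptions made for illustration; the attribute and condition key names are the ones SQS and IAM use.

```python
from typing import Optional


def deny_insecure_transport(queue_arn: str) -> dict:
    """Queue-policy statement denying every SQS action when the request is not over TLS."""
    return {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "sqs:*",
        "Resource": queue_arn,
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }


def sse_attributes(kms_key_id: Optional[str] = None) -> dict:
    """Queue attributes for SSE-SQS (no key given) or SSE-KMS (key given)."""
    if kms_key_id is None:
        return {"SqsManagedSseEnabled": "true"}
    return {"KmsMasterKeyId": kms_key_id}


statement = deny_insecure_transport("arn:aws:sqs:us-east-1:123456789012:orders")
default_sse = sse_attributes()
kms_sse = sse_attributes("alias/my-queue-key")  # hypothetical key alias
```

The statement goes inside the `Statement` list of the queue policy; the attribute dict can be passed as `Attributes` when creating or updating the queue.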

SQS Extended Client — Messages Larger Than 256 KB In-Depth

The SQS message size limit is 256 KB. For larger payloads, use the SQS Extended Client, a client library provided by AWS. It automatically stores large payloads in S3 and puts only a reference in the SQS message.

📦

How It Works

Large payload (>256 KB) → stored in S3
SQS message contains S3 reference (s3://bucket/key)
Consumer client retrieves from S3 automatically
Messages up to 2 GB (Extended Client limit)

⚠️

Limitations

Not supported for FIFO queues
Additional S3 cost (storage + GET/PUT)
Java, Python, Node.js SDKs supported
Alternative: store in S3 manually, send URI in message
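
The manual alternative (store in S3, send the URI) might look like the sketch below. It is not the Extended Client's wire format: the JSON envelope, the `sqs-payloads/` key prefix, and the injected `upload` callable are all assumptions made so the example runs without AWS access.

```python
import json
import uuid

SQS_MAX_BYTES = 256 * 1024  # size limit quoted in this section


def build_message_body(payload: str, bucket: str, upload) -> str:
    """Return the SQS message body: the payload inline, or a pointer to S3."""
    inline_body = json.dumps({"inline": payload})
    if len(inline_body.encode("utf-8")) <= SQS_MAX_BYTES:
        return inline_body
    key = f"sqs-payloads/{uuid.uuid4()}"
    # In real code `upload` could wrap s3.put_object(Bucket=..., Key=..., Body=...).
    upload(bucket, key, payload.encode("utf-8"))
    return json.dumps({"s3_uri": f"s3://{bucket}/{key}"})


# In-memory stand-in for S3 so the sketch is runnable.
fake_s3 = {}


def fake_upload(bucket, key, data):
    fake_s3[(bucket, key)] = data


small_body = build_message_body("hello", "my-bucket", fake_upload)
large_body = build_message_body("x" * 300_000, "my-bucket", fake_upload)
```

The consumer does the reverse: if the body contains `s3_uri`, it fetches the object before processing, and it should delete the object once the message is deleted.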

VPC Endpoint — Private Access In-Depth

By default, SQS API calls go to the public SQS endpoint, which requires an Internet Gateway or NAT. For EC2/Lambda in private subnets with no NAT, create a VPC Interface Endpoint for SQS. Traffic then stays within the AWS network.

🌐

Without VPC Endpoint

Traffic goes via Internet Gateway or NAT
Private subnet workloads need NAT Gateway
NAT adds cost and is a throughput bottleneck

🔏

With VPC Endpoint

Traffic stays within AWS network
No Internet Gateway or NAT required
Lower latency, higher security
Cost: ~$0.01/hr per AZ + data fees

Message Retention & Durability Core

Setting	Default	Range	Notes
Message retention	4 days	1 minute – 14 days	Messages deleted after this period if not processed
Visibility timeout	30 seconds	0 – 12 hours	How long message is hidden during processing
Message size	—	1 byte – 256 KB	For larger payloads, store in S3 and send pointer
Delay queue	0 seconds	0 – 15 minutes	Messages invisible for this period after send
Receive wait time	0 seconds	0 – 20 seconds	Long polling wait time (set to 20 for efficiency)
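
The ranges in the table map onto SQS queue attributes, which take values in seconds as strings. The helper below is a small sketch that validates settings against those ranges before they are sent to the API; `queue_attributes` and `LIMITS` are illustrative names, while the attribute names are the real SQS ones.

```python
# (min, max) in seconds, taken from the table above.
LIMITS = {
    "MessageRetentionPeriod": (60, 14 * 24 * 3600),
    "VisibilityTimeout": (0, 12 * 3600),
    "DelaySeconds": (0, 15 * 60),
    "ReceiveMessageWaitTimeSeconds": (0, 20),
}


def queue_attributes(**settings: int) -> dict:
    """Validate settings and return them in the string form the SQS API expects."""
    attrs = {}
    for name, value in settings.items():
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high} seconds")
        attrs[name] = str(value)
    return attrs


attrs = queue_attributes(
    MessageRetentionPeriod=14 * 24 * 3600,  # keep messages for the full 14 days
    VisibilityTimeout=120,
    ReceiveMessageWaitTimeSeconds=20,       # long polling
)
# With boto3 this dict could be passed as:
#   boto3.client("sqs").create_queue(QueueName="orders", Attributes=attrs)
```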

Scaling Consumers — Auto Scaling on Queue Depth Core

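The best way to scale SQS consumers is based on queue depth — the number of messages waiting. Use the CloudWatch metric ApproximateNumberOfMessagesVisible to trigger Auto Scaling. (The matching queue attribute returned by GetQueueAttributes is named ApproximateNumberOfMessages.)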

📊

Key CloudWatch Metrics

ApproximateNumberOfMessagesVisible — messages waiting
ApproximateNumberOfMessagesNotVisible — in-flight
ApproximateAgeOfOldestMessage — queue lag
NumberOfMessagesReceived — throughput
NumberOfMessagesSent — producer rate

⚡

Auto Scaling Strategy

Target tracking: "keep backlog per instance at 1000"
Scale out when: ApproximateNumberOfMessagesVisible / DesiredCapacity > 1000
Scale in when: backlog cleared
Use ApproximateAgeOfOldestMessage for SLA alarms

👉 Best practice formula: Target = (Acceptable latency in seconds) × (Messages processed per second per instance). If each instance processes 10 msg/sec and you want max 60s latency, target = 600 messages per instance.
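
The formula above can be worked through in a few lines. This is a sketch only: the function names are made up, and the backlog of 6,000 messages is an illustrative figure.

```python
import math


def backlog_target(acceptable_latency_s: float, msgs_per_sec_per_instance: float) -> float:
    """Messages one instance can have queued and still meet the latency goal."""
    return acceptable_latency_s * msgs_per_sec_per_instance


def desired_instances(visible_messages: int, target_per_instance: float, minimum: int = 1) -> int:
    """Instances needed so that backlog per instance stays at or below the target."""
    return max(minimum, math.ceil(visible_messages / target_per_instance))


target = backlog_target(60, 10)           # 60 s latency, 10 msg/s per instance
needed = desired_instances(6000, target)  # illustrative backlog of 6,000 messages
idle = desired_instances(0, target)       # empty queue falls back to the minimum
```

Here `target` is 600 messages per instance, so a backlog of 6,000 calls for 10 instances.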

Lambda as Consumer — Event Source Mapping Core

Lambda can poll SQS automatically via Event Source Mapping. No need for EC2 workers. Lambda scales automatically based on queue depth.

✅

Lambda + SQS Benefits

No infrastructure to manage
Auto-scales with queue depth
Pay only for invocations
Built-in retry + DLQ support
Default batch size 10 (up to 10,000 for Standard queues with a batching window; FIFO max 10)

⚠️

Lambda + SQS Limits

Max 15 min execution time (per invocation, i.e. per batch)
1000 concurrent executions account default (can increase)
FIFO queue: one batch at a time per message group, so concurrency is capped by the number of active message groups
Cold starts add latency on scale-out
Not ideal for very long-running jobs
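
A minimal sketch of a Lambda handler for an SQS event source mapping follows. It reports partial batch failures so that only the failed messages return to the queue, which assumes `ReportBatchItemFailures` is enabled on the mapping. The `process` function and the order fields are hypothetical stand-ins for real business logic.

```python
import json


def process(order: dict) -> None:
    """Hypothetical business logic; raises if the message is unusable."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event, context):
    """Process each SQS record and tell Lambda which ones failed."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message becomes visible again; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {
    "Records": [
        {"messageId": "1", "body": json.dumps({"order_id": 42})},
        {"messageId": "2", "body": json.dumps({"note": "no id"})},
    ]
}
result = handler(sample_event, None)
```

Without partial batch responses, one exception would make the whole batch visible again, so handlers should be idempotent either way.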

Monitoring & Alarms Core

Alarm	Metric	Threshold Example	Why
Queue backlog	`ApproximateNumberOfMessagesVisible`	> 10,000 for 5 min	Consumers falling behind
Processing lag	`ApproximateAgeOfOldestMessage`	> 300 sec	SLA violation risk
DLQ messages	`ApproximateNumberOfMessagesVisible` (DLQ)	> 0	Messages failing repeatedly
Empty receives	`NumberOfEmptyReceives`	> 1000/min	Enable long polling
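
The alarm rows can be expressed as parameter sets for CloudWatch's `put_metric_alarm`. The sketch below only builds the dictionaries. The queue names, alarm names, and the 5-minute period are assumptions, and the backlog metric uses its CloudWatch name, `ApproximateNumberOfMessagesVisible`.

```python
def sqs_alarm(queue_name: str, label: str, metric: str, threshold: float,
              statistic: str = "Maximum", period_s: int = 300) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm on an SQS queue metric."""
    return {
        "AlarmName": f"{queue_name}-{label}",
        "Namespace": "AWS/SQS",
        "MetricName": metric,
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": statistic,
        "Period": period_s,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


alarms = [
    sqs_alarm("orders", "backlog", "ApproximateNumberOfMessagesVisible", 10_000),
    sqs_alarm("orders", "lag", "ApproximateAgeOfOldestMessage", 300),
    sqs_alarm("orders-dlq", "dlq-not-empty", "ApproximateNumberOfMessagesVisible", 0),
]
# Each dict could be passed as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```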

Cost Calculation Examples In-Depth

SQS pricing is simple: $0.40 per 1M requests for Standard, $0.50 per 1M for FIFO. Batching and long polling are the main optimization levers.

Workload	Messages/Day	Without Batching	With Batching
Small e-commerce	1,000 orders	$0.0008/day	$0.00008/day
Medium app	10,000 msg	$0.008/day	$0.0008/day
Large scale	10M msg	$8/day ($240/mo)	$0.80/day ($24/mo)
FIFO	100K msg	$0.10/day	$0.01/day
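
The arithmetic behind the table can be reproduced with a short calculation. The figures are consistent with two billed requests per message and a batch size of 10; that is an inference from the numbers, not something the table states, and a full send, receive, delete cycle would be three requests. The free tier and data transfer are ignored.

```python
import math

STANDARD_PER_MILLION = 0.40
FIFO_PER_MILLION = 0.50


def daily_cost(messages_per_day: int, price_per_million: float,
               requests_per_message: int = 2, batch_size: int = 1) -> float:
    """Estimated daily request cost in USD (assumed request counts, no free tier)."""
    requests = math.ceil(messages_per_day / batch_size) * requests_per_message
    return requests / 1_000_000 * price_per_million


large_unbatched = daily_cost(10_000_000, STANDARD_PER_MILLION)
large_batched = daily_cost(10_000_000, STANDARD_PER_MILLION, batch_size=10)
fifo_unbatched = daily_cost(100_000, FIFO_PER_MILLION)
```

With these assumptions the "Large scale" row comes out at about $8 per day unbatched and $0.80 per day batched, matching the table.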

💰

Cost Optimization Checklist

✅ Use batch APIs — up to 10x fewer requests (10 messages per call)
✅ Enable long polling (WaitTimeSeconds=20)
✅ Same region for sender and consumer
✅ Delete unused queues
✅ Monitor with Cost Explorer (filter: "Requests")

💡

Data Transfer Notes

SQS → Lambda (same region): free
SQS → EC2 (same region): free
Cross-region: ~$0.02/GB (avoid if possible)
SQS → Internet: ~$0.09/GB

Security Best Practices Core

🔐

Enable SSE

Always enable encryption at rest. Use SSE-SQS (free) or SSE-KMS (for compliance). No excuse for unencrypted queues.

🔒

Least Privilege

Grant only needed actions: sqs:SendMessage for producers, sqs:ReceiveMessage + sqs:DeleteMessage for consumers.
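
As a sketch of the least-privilege split described above, the function below builds an IAM policy document scoped to one queue for either role. The function name and queue ARN are placeholders. Real consumers often need a few more actions (for example `sqs:ChangeMessageVisibility`, or `sqs:GetQueueAttributes` for Lambda), which are left out here to mirror the text.

```python
import json

ROLE_ACTIONS = {
    "producer": ["sqs:SendMessage"],
    "consumer": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
}


def least_privilege_policy(queue_arn: str, role: str) -> str:
    """Return an IAM policy (JSON string) granting only the actions the role needs."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ROLE_ACTIONS[role],
                "Resource": queue_arn,  # one queue, not "*"
            }
        ],
    }
    return json.dumps(policy)


queue_arn = "arn:aws:sqs:us-east-1:123456789012:orders"  # placeholder ARN
producer_policy = least_privilege_policy(queue_arn, "producer")
consumer_policy = least_privilege_policy(queue_arn, "consumer")
```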

📝

Enable Logging

Use CloudTrail to log all SQS API calls. Monitor for unexpected access patterns or unauthorized attempts.

🎓 Exam Insight

"Cross-account queue access" → Requires queue policy (not just IAM)
"SNS publishes to SQS" → Queue policy must allow sns.amazonaws.com
"Encrypt messages at rest" → Enable SSE-SQS or SSE-KMS
"Private subnet access to SQS" → VPC Interface Endpoint
"Scale consumers on queue size" → Auto Scaling on ApproximateNumberOfMessagesVisible
"Reduce SQS costs" → Enable long polling (WaitTimeSeconds=20)

👉 Key Takeaway

SQS is designed for production: enable SSE for encryption, use queue policies for cross-account/service access, scale consumers on queue depth, and always configure a DLQ. Monitor ApproximateAgeOfOldestMessage for SLA compliance.

Amazon SQS · Complete
SQS is a managed message queue — producers enqueue, consumers poll and process at their own pace
Core model: Send → Store (up to 14 days) → Poll → Process → Delete
Visibility timeout — message hidden during processing; returns to queue if not deleted
Standard queue — near-unlimited throughput, best-effort order, at-least-once delivery
FIFO queue — strict ordering per message group, exactly-once processing, 3,000 msg/sec limit with batching
Dead-Letter Queue — catches messages that fail repeatedly; essential in production
SNS + SQS fan-out — SNS broadcasts, each consumer has its own SQS buffer
Security — IAM + queue policies, SSE encryption, VPC endpoints for private access
Scaling — Auto Scale on ApproximateNumberOfMessagesVisible; Lambda event source mapping for serverless
vs other services: SQS = buffer, SNS = broadcast, EventBridge = route, Kafka = stream