Amazon S3

Amazon S3 β€”
Simple Storage Service

Unlimited object storage in the cloud. The backbone of data lakes, backups, static websites, and CDN origins β€” infinitely scalable, 11 nines durable.

⚑ S3 in 30 Seconds

  • Object storage β€” store any file of any size, retrieved by a unique key (URL)
  • Unlimited capacity β€” no pre-provisioning, no disk management
  • 99.999999999% (11 nines) durability β€” data replicated across β‰₯3 AZs automatically
  • Multiple storage classes β€” optimize cost from millisecond access to archival
  • Integrated with almost every AWS service β€” the default data layer for AWS
01
Chapter One

What is S3

Introduction Introductory

Amazon S3 (Simple Storage Service) is AWS's object storage service. It lets you store and retrieve any amount of data β€” files, images, videos, backups, logs, ML datasets β€” from anywhere on the internet. Unlike a hard drive with folders and files, S3 stores data as objects inside buckets, each identified by a unique key.

πŸ‘‰ Think of S3 as: An infinite hard drive in the cloud β€” pay only for what you store, access from anywhere

S3 was one of the first AWS services, launched in 2006. Today it stores trillions of objects and handles millions of requests per second across AWS customers. It is the most-used AWS storage service and the foundation of most data architectures on AWS.

Why S3 Exists Introductory
⚠️

Traditional File Storage Problems

  • Fixed disk capacity β€” buy hardware before you need it
  • Disks fail β€” complex RAID and backup setups required
  • Not globally accessible β€” VPN or network share required
  • Scaling is slow β€” days/weeks to add capacity
  • High upfront capital cost
βœ…

S3 Solves

  • Unlimited capacity β€” grows automatically with your data
  • AWS manages replication and durability β€” 11 nines
  • Accessible over HTTPS from anywhere, any device
  • Add storage instantly β€” no provisioning required
  • Pay per GB stored + requests β€” no upfront cost
Object Storage vs Block vs File Core

Understanding the storage type is critical for choosing the right service:

Type | How It Works | AWS Service | Best For
Object Storage | Flat namespace — key → object. No folders. Access via HTTP. | S3 | Files, images, backups, data lakes, logs
Block Storage | Raw disk blocks. OS mounts it like a hard drive. Low latency. | EBS | Databases, boot volumes, OS-level read/write
File Storage | Shared filesystem with directories. NFS protocol. | EFS | Shared access across multiple EC2 instances

πŸ‘‰ S3 is not a filesystem. You cannot "mount" S3 like a drive or run a database on it. It is optimized for storing and retrieving whole objects via HTTP β€” not for random read/write of small byte ranges.

Where S3 Fits in AWS Introductory

S3 is referenced by almost every AWS service:

πŸ’Ύ

Data & Analytics

Data lakes (Athena, Glue, Redshift Spectrum). S3 is the raw storage layer β€” query data in-place without loading into a database.

🌐

Web & Applications

Static website hosting (HTML/CSS/JS), user uploads, media assets, and application configuration files stored in S3.

πŸ”§

DevOps & Infrastructure

CloudFormation templates, Lambda deployment packages, CodePipeline artifacts, EC2 AMI snapshots β€” all stored in S3.

πŸ›‘οΈ

Backup & Compliance

AWS Backup destinations, CloudTrail audit logs, VPC flow logs, config history, and compliance archives all land in S3.

πŸ€–

Machine Learning

SageMaker training datasets, model artifacts, and inference results. S3 is the default ML data store on AWS.

πŸ“‘

CDN Origin

CloudFront uses S3 as an origin to cache and serve content globally with low latency β€” the standard pattern for static assets.

Mental Model Core

Think of S3 like a post office with infinite numbered mailboxes:

πŸ“«

The Post Office = Bucket

  • A named container for objects
  • Name must be globally unique across all AWS accounts
  • Lives in one AWS region β€” data does not leave unless you replicate
  • You own and control the bucket policies and access
  • Up to 100 buckets per account (soft limit, can be raised)
πŸ“¦

The Package = Object

  • Any file β€” image, video, CSV, zip, binary, JSON
  • Up to 5 TB per object (use Multipart Upload above 100 MB)
  • Identified by a unique key (like a full file path)
  • Includes metadata: content-type, custom tags, system attributes
  • Immutable β€” to update, you replace the entire object
Durability vs Availability Core

Two different guarantees β€” both important, often confused on the exam:

πŸ”’

Durability β€” 99.999999999%

  • Will your data survive? β€” yes, 11 nines
  • AWS stores multiple copies across β‰₯3 AZs automatically
  • Designed to tolerate concurrent loss of data in 2 facilities
  • Losing stored data in S3 Standard is essentially impossible
  • Same for all storage classes except S3 One Zone-IA (single AZ)
⚑

Availability β€” 99.99%

  • Can you access it right now? β€” 99.99% of the time
  • ~52 minutes downtime per year on S3 Standard
  • Varies by storage class — Standard-IA = 99.9%, One Zone-IA = 99.5%
  • Glacier availability is lower β€” retrieval takes minutes to hours
Concept Diagram Introductory
S3 β€” User uploads and retrieves objects from a bucket
πŸ‘€ USER / APP PUT GET AWS CLOUD β€” REGION S3 BUCKET (my-app-bucket) πŸ–ΌοΈ images/logo.png key = object path πŸ“„ data/report.csv 4.2 MB πŸŽ₯ videos/intro.mp4 1.2 GB Automatically replicated across β‰₯3 Availability Zones 11 nines durability Β· 99.99% availability (Standard)
Core Use Cases Introductory
Use Case | How S3 Is Used | Why It Works
Static Website Hosting | Serve HTML/CSS/JS from a bucket with public access | No server needed — scales to any traffic automatically
Database Backups | Dump files pushed to S3 on a schedule | Cheap, durable, cross-region replication available
User Uploads | Presigned URLs let users upload directly to S3 | Bypass your app server for large files
Data Lake | Raw data (JSON, Parquet, CSV) stored in S3, queried with Athena | Decouple storage from compute — pay per query
Log Archive | CloudTrail, ALB access logs, VPC flow logs → S3 | Long-term storage, lifecycle to Glacier after 90 days
CDN Origin | CloudFront serves from S3 origin globally | Edge caching + S3 durability = best of both worlds
Strong Read-After-Write Consistency In-Depth

Since December 2020, Amazon S3 provides strong read-after-write consistency for all operations β€” at no additional cost and with no performance impact. This was a major change from S3's original eventual consistency model.

βœ…

Current Behavior (Strong Consistency)

  • PUT a new object β†’ immediately readable by all subsequent GETs
  • Overwrite an existing object β†’ next GET returns the new version
  • DELETE an object β†’ next GET returns 404
  • LIST operations reflect the latest state
  • Applies to all storage classes, all regions
⚠️

Old Behavior (Pre-2020 β€” No Longer Applies)

  • New objects: read-after-write consistent (same as now)
  • Overwrites and deletes: eventually consistent β€” you might read stale data
  • LIST after PUT: object might not appear immediately
  • This is in many older study guides β€” it is outdated

πŸ‘‰ Exam note: S3 is now strongly consistent for all operations. If a question references eventual consistency for S3, the correct answer is strong read-after-write consistency. Older materials mentioning eventual consistency for overwrites are outdated.

πŸ‘‰ Key Takeaway

S3 is unlimited, durable object storage β€” the default data layer for AWS. If you need to store a file in AWS, S3 is the answer 90% of the time.

πŸ“‹ Chapter 1 β€” Summary
  • Object storage β€” files stored as objects with a unique key, retrieved via HTTP. Not a filesystem or database.
  • Unlimited capacity β€” no pre-provisioning. Pay per GB stored + per request made.
  • 11 nines durability β€” data replicated across β‰₯3 AZs automatically. AWS manages it.
  • Object vs Block vs File: S3 = objects (HTTP). EBS = block (disk). EFS = file (NFS mount).
  • Strong consistency: all operations (PUT, DELETE, LIST) are strongly consistent since 2020. No eventual consistency.
  • Used everywhere: backups, data lakes, static websites, ML datasets, CDN origin, DevOps artifacts.
  • Durability β‰  Availability: 11 nines = data won't disappear. 99.99% = you can access it almost always.
02
Chapter Two

Core Concepts & Storage Model

Buckets Core

A bucket is the top-level container for objects in S3. Every object lives inside a bucket. Buckets are created in a specific AWS region and data does not leave that region unless you explicitly configure replication.

🌍

Globally Unique Name

Bucket names must be unique across all AWS accounts globally β€” not just your account. If my-company-data is taken by anyone in the world, you cannot use it.

πŸ“

Regional Resource

A bucket is created in one region (e.g., us-east-1). Choose the region closest to your users or compute workload to minimize latency and data transfer costs.

πŸ“‹

Naming Rules

  • 3–63 characters long
  • Lowercase letters, numbers, hyphens only
  • Cannot start or end with a hyphen
  • Cannot be formatted as an IP address
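
To make the rules above concrete, here is a minimal boto3 sketch of creating a bucket (the bucket name and region are hypothetical; boto3 must be installed and AWS credentials configured):

  import boto3

  s3 = boto3.client("s3", region_name="eu-west-1")

  # Bucket names are globally unique: this raises BucketAlreadyExists if anyone,
  # in any AWS account, already owns the name. Outside us-east-1 the region must
  # be stated explicitly via CreateBucketConfiguration.
  s3.create_bucket(
      Bucket="my-company-data-2026",  # hypothetical: 3-63 chars, lowercase, no leading/trailing hyphen
      CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
  )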
Objects & Keys Core

An object is the fundamental unit of data in S3. It consists of the data itself plus metadata. Every object is identified by a key β€” a string that uniquely identifies the object within its bucket.

Component | What It Is | Example
Key | The full "path" of the object within the bucket | images/2026/logo.png
Value | The actual data — any bytes, any format | Binary PNG file data
Version ID | Unique ID per version (when versioning is enabled) | ab3c4de5fg6h
Metadata | Key-value pairs describing the object | Content-Type: image/png
Tags | User-defined labels for cost allocation or access control | env=prod, team=frontend
ETag | Integrity check value — an MD5 hash for single-part uploads (multipart objects get a composite ETag) | d41d8cd98f00b204e9800998ecf8427e

πŸ‘‰ S3 has no real folders β€” the key images/2026/logo.png is just a string. The AWS console displays the slash as a folder, but it is purely cosmetic. This matters for prefix-based performance optimization.

Object Size Limits Core
πŸ“

Single PUT Upload

Max 5 GB per PUT request. For anything larger, use Multipart Upload. AWS recommends Multipart for objects above 100 MB.

πŸ”€

Multipart Upload

Split large files into parts (min 5 MB, max 10,000 parts). Upload parts in parallel. Combine on S3. Required for objects above 5 GB.

πŸ“¦

Maximum Object Size

A single object can be up to 5 TB. No limit on bucket total size β€” store petabytes in one bucket if needed.

Storage Classes In-Depth

S3 offers multiple storage classes, each optimized for different access frequency and cost profiles. You pay less per GB for classes you access less frequently β€” but you pay a retrieval fee when you do access them.

Storage Class | Access Pattern | Availability | Retrieval Fee | Best For
S3 Standard | Frequent access | 99.99% | None | Active data, websites, apps
S3 Intelligent-Tiering | Unknown / changing | 99.9% | None | Data with unpredictable patterns
S3 Standard-IA | Infrequent (monthly) | 99.9% | Per GB retrieved | Backups, disaster recovery
S3 One Zone-IA | Infrequent, single AZ | 99.5% | Per GB retrieved | Re-creatable data, secondary backups
S3 Glacier Instant | Rare (quarterly) | 99.9% | Per GB retrieved | Archive with instant access
S3 Glacier Flexible | Rare — minutes to hours | 99.99% | Per GB + request | Compliance archives, tape replacement
S3 Glacier Deep Archive | Very rare — 12h retrieval | 99.99% | Per GB + request | 7–10 year regulatory archives
Storage Class Cost vs Access Frequency β€” The Trade-off
[Diagram] Storage cost per GB falls as access frequency falls: Standard → Intelligent-Tiering → Standard-IA → One Zone-IA → Glacier Instant → Deep Archive. Lower storage cost = higher retrieval fee. Choose based on how often you access the data.
Versioning In-Depth

Versioning keeps multiple versions of an object in the same bucket. Every time you overwrite or delete an object, S3 creates a new version instead of destroying the old one.

βœ…

Why Enable Versioning

  • Recover from accidental overwrites and deletes
  • Required prerequisite for S3 Replication
  • Required for S3 Object Lock (compliance)
  • Enables audit trail β€” who changed what, when
  • Deletes create a "delete marker" β€” data is still there
⚠️

Versioning Trade-offs

  • Storage cost grows β€” every version is billed separately
  • Once enabled, cannot be fully disabled β€” only suspended
  • Need lifecycle rules to expire old versions automatically
  • Deleting a versioned object requires deleting ALL versions
Versioning β€” How Overwrites and Deletes Work
[Diagram] PUT v1 stores report.csv as version v1. PUT v2 (overwrite) makes v2 the current version and keeps v1. DELETE adds a delete marker on top — v1 and v2 still exist and remain recoverable. Delete adds a marker; it does not erase data. Restore by deleting the marker or retrieving a specific version.
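
Enabling versioning is a one-call operation; a minimal boto3 sketch (bucket name hypothetical):

  import boto3

  # Once enabled, versioning can only be suspended later, never fully disabled.
  boto3.client("s3").put_bucket_versioning(
      Bucket="my-app-bucket",  # hypothetical
      VersioningConfiguration={"Status": "Enabled"},
  )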
Metadata & Tags Core
πŸ“‹

System Metadata

  • Set by AWS β€” Content-Type, Content-Length, Last-Modified
  • Content-Type is critical β€” browsers use it to render objects correctly
  • Set at upload time — changing system metadata later requires copying the object over itself
🏷️

User-Defined Tags

  • Up to 10 key-value pairs per object
  • Used for cost allocation reports (group by team, env, project)
  • Used in lifecycle rules β€” apply rules to tagged objects only
  • Used in IAM/bucket policies β€” grant access based on tags
S3 Request Types Core

Understanding request types matters for cost calculation β€” you pay per request:

Request Type | Operation | Relative Cost
PUT / COPY / POST / LIST | Write or list operations | Higher (~$0.005 per 1,000)
GET / SELECT | Read object data | Lower (~$0.0004 per 1,000)
DELETE | Delete object | Free
Lifecycle transitions | Move object between storage classes | Per-transition fee
πŸ‘‰ Key Takeaway

S3's storage model is simple: buckets hold objects, objects have keys and metadata. The storage class you choose determines cost and access speed β€” match it to how frequently you access the data.

πŸ“‹ Chapter 2 β€” Summary
  • Buckets β€” globally unique named containers, tied to one region. Up to 100 per account (soft limit).
  • Objects β€” data + metadata + tags. Max 5 TB. Use Multipart Upload above 100 MB.
  • Keys β€” the full "path" string identifying an object. No real folders β€” slashes are cosmetic.
  • Storage classes β€” Standard (frequent) β†’ IA (monthly) β†’ Glacier (rare) β†’ Deep Archive (years). Lower cost = retrieval fee.
  • Versioning β€” keeps all versions on overwrite/delete. Enables recovery. Required for replication and Object Lock.
  • Metadata & Tags β€” Content-Type is critical. Tags drive cost allocation, lifecycle rules, and access control.
03
Chapter Three

Security & Access Control

S3 Security Model Core

By default, all S3 buckets and objects are private. Nothing is publicly accessible unless you explicitly allow it. Access to S3 is controlled through multiple overlapping layers β€” understanding which layer applies when is the key to both security and the SAA-C03 exam.

πŸ‘€

IAM Policies

Attached to users, groups, or roles. Define what AWS identities can do to S3. Evaluated by IAM before the request even reaches S3.

πŸͺ£

Bucket Policies

Attached to the bucket itself. Resource-based policy in JSON. Can grant access to other AWS accounts, services, and the public. Most powerful S3 access tool.

πŸ”‘

ACLs (Legacy)

Object or bucket-level access control lists. Predates IAM. AWS recommends disabling ACLs and using bucket policies instead. Still appears on exams.

IAM Policies for S3 Core

IAM policies grant S3 permissions to AWS identities. For same-account access, a request succeeds if either the IAM policy or the bucket policy allows it and nothing explicitly denies it; for cross-account access, both the identity's IAM policy and the bucket policy must allow the request.

IAM Action | What It Allows
s3:GetObject | Download / read an object
s3:PutObject | Upload / write an object
s3:DeleteObject | Delete an object
s3:ListBucket | List objects in a bucket
s3:GetBucketPolicy | Read the bucket policy
s3:PutBucketPolicy | Write / replace the bucket policy
s3:* | Full access to all S3 actions (admin)
Bucket Policies In-Depth

Bucket policies are JSON documents attached directly to a bucket. They can grant or deny access to specific AWS accounts, IAM users/roles, services, or the public. They are the primary mechanism for cross-account access and public access.

βœ…

Common Bucket Policy Use Cases

  • Grant another AWS account read access to a bucket
  • Force all uploads to use HTTPS (deny HTTP)
  • Allow CloudFront OAC to read from a private bucket
  • Restrict access to specific IP address ranges
  • Require server-side encryption on all PUT requests
  • Make a bucket publicly readable for static website hosting
πŸ“‹

Policy Structure

  • Effect β€” Allow or Deny
  • Principal β€” who (IAM user, account, * for public)
  • Action β€” what S3 operations (s3:GetObject)
  • Resource β€” which bucket/object (arn:aws:s3:::my-bucket/*)
  • Condition β€” optional constraints (IP, MFA, HTTPS)
S3 Access Decision β€” IAM + Bucket Policy Evaluation
[Diagram] Request evaluation flow: explicit Deny in any IAM or bucket policy → DENIED. Otherwise, no Allow in either the IAM policy or the bucket policy → DENIED. Otherwise, public request with Block Public Access on → DENIED. Only then → ALLOW.
Block Public Access Core

Block Public Access is a safety switch that sits above bucket policies and ACLs. Even if your bucket policy grants public access, Block Public Access will override and deny it.

πŸ›‘οΈ

What It Does

  • 4 independent settings that can be toggled on/off
  • Can be set at account level (all buckets) or per bucket
  • Account-level setting overrides bucket-level
  • Enabled by default on all new buckets since 2023
  • Protects against misconfigured bucket policies accidentally exposing data
⚠️

When to Disable

  • Static website hosting that needs to be publicly readable
  • Public software distribution buckets
  • Any intentional public access scenario
  • Must be explicitly and deliberately turned off β€” never by accident
S3 Access Points In-Depth

S3 Access Points simplify managing access to shared datasets in S3. Instead of one complex bucket policy that handles every application, each application gets its own named endpoint with its own access policy β€” scoped to exactly what it needs.

πŸ”Œ

How It Works

  • Each access point has a unique DNS name (endpoint)
  • Each has its own IAM-style policy for permissions
  • Multiple access points on one bucket β€” one per app/team
  • Access point ARN used in place of bucket ARN
πŸ”’

VPC-Restricted Access Points

  • Access point can be restricted to a specific VPC
  • Requests from outside the VPC are automatically denied
  • No need for complex bucket policy VPC conditions
  • Combines with VPC Endpoints for fully private access
πŸ—οΈ

When to Use

  • Data lake: different teams query different prefixes
  • Multi-tenant: each tenant's app gets scoped access
  • Compliance: audit access per application
  • At scale: 10,000 access points per bucket supported
VPC Endpoint for S3 (Gateway) Core

A VPC Gateway Endpoint allows EC2 instances and other resources in a private subnet to access S3 without going through the internet β€” no NAT Gateway, no Internet Gateway, no public IP required.

πŸ”—

How It Works

  • Create a Gateway Endpoint for S3 in your VPC
  • Attach route table entries directing S3 traffic to the endpoint
  • Traffic to S3 stays on the AWS private network β€” never touches the internet
  • Free β€” no hourly charge, no data processing charge
  • Works with bucket policies: add aws:sourceVpce condition to restrict access to endpoint only
βœ…

Benefits

  • Security: data never traverses the public internet
  • Cost: no NAT Gateway data processing fees (saves $0.045/GB)
  • Performance: lower latency, higher throughput within AWS
  • Exam: "How to access S3 from a private subnet securely" β†’ Gateway Endpoint

πŸ‘‰ Exam tip: S3 and DynamoDB use Gateway Endpoints (free, route table-based). Most other AWS services use Interface Endpoints (ENI-based, hourly charge). "Private S3 access from a private subnet" β†’ VPC Gateway Endpoint β€” this appears on nearly every AWS exam.

Presigned URLs In-Depth

A presigned URL grants temporary access to a private S3 object without making the bucket public. Any identity with the right IAM permissions can generate one.

πŸ”—

How It Works

The URL carries the signer's access key ID, an expiry timestamp, and a cryptographic signature computed with the creator's AWS credentials — never the secret key itself. Anyone with the URL can access the object until it expires.

βœ…

Use Cases

  • User downloads a private file from your app
  • User uploads directly to S3 without credentials
  • Sharing a large file temporarily
  • Email attachment links that expire
⏱️

Expiry

  • Default: 1 hour
  • Max: 7 days (SigV4 limit, requires long-term IAM credentials; URLs signed with temporary STS credentials stop working when the session expires)
  • URL becomes invalid after expiry β€” no revocation needed
  • Revoke early by invalidating the signing credentials
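
A minimal boto3 sketch of generating a presigned download URL (bucket and key hypothetical); the URL inherits the permissions of the credentials that sign it:

  import boto3

  url = boto3.client("s3").generate_presigned_url(
      "get_object",
      Params={"Bucket": "my-app-bucket", "Key": "reports/q3.pdf"},  # hypothetical
      ExpiresIn=3600,  # seconds: valid for 1 hour, then rejected by S3
  )
  print(url)  # shareable link to a private object; no bucket policy change needed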
Encryption In-Depth

S3 supports encryption at rest and in transit. Since January 2023, all new objects are encrypted by default with SSE-S3.

Type | Key Management | Use Case | Exam Note
SSE-S3 | AWS manages keys completely | Default — zero management overhead | Header: x-amz-server-side-encryption: AES256
SSE-KMS | AWS KMS — you control the key policy | Compliance, audit trail, cross-account control | CloudTrail logs every key usage. Adds KMS API call cost.
SSE-C | You provide the key on every request | You manage keys outside AWS completely | AWS never stores your key — it must be sent with every PUT/GET.
Client-side | You encrypt before upload | Zero trust — AWS never sees plaintext | Application owns the full encryption lifecycle.
πŸš€

Encryption in Transit

  • All S3 endpoints support HTTPS (TLS 1.2+)
  • HTTP requests are also accepted by default β€” unless you deny them
  • Force HTTPS with a bucket policy condition: aws:SecureTransport: false β†’ Deny
  • HTTPS is always recommended β€” required for compliance workloads
πŸ”‘

SSE-KMS Considerations

  • Every S3 GET/PUT = a KMS API call (GenerateDataKey / Decrypt)
  • KMS has request rate limits β€” heavy S3 workloads can hit KMS throttling
  • Use KMS key policies to restrict who can use the key
  • Audit all data access via CloudTrail β€” every decrypt is logged
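
For illustration, a sketch of uploading one object under SSE-KMS (bucket, key, and KMS alias are hypothetical); omitting the two encryption parameters falls back to the SSE-S3 default:

  import boto3

  boto3.client("s3").put_object(
      Bucket="my-app-bucket",           # hypothetical
      Key="secure/payroll.csv",
      Body=b"employee,salary\n...",
      ServerSideEncryption="aws:kms",   # request SSE-KMS instead of the SSE-S3 default
      SSEKMSKeyId="alias/payroll-key",  # hypothetical key alias; each use is logged in CloudTrail
  )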
ACLs (Legacy) Core

Access Control Lists (ACLs) are the original S3 access mechanism. AWS now recommends disabling ACLs and using bucket policies instead. However, ACLs still appear on certifications.

ACL Permission | What It Allows
READ | List objects (bucket) or download object
WRITE | Upload/delete objects in bucket
READ_ACP | Read the ACL itself
WRITE_ACP | Modify the ACL
FULL_CONTROL | All of the above

πŸ‘‰ New accounts have ACLs disabled by default. Use bucket policies for access control β€” they are more expressive, easier to audit, and don't require understanding legacy ACL semantics.

Security Best Practices Core
πŸ”’

Bucket Hardening

  • Enable Block Public Access at account level
  • Enable versioning β€” protects against ransomware and accidental deletes
  • Enable Object Lock for compliance data (WORM)
  • Require SSE-KMS for sensitive data via bucket policy
  • Enable S3 access logging β€” record all requests to the bucket
πŸ›‘οΈ

IAM & Network

  • Use IAM roles β€” never hardcode credentials in apps
  • Apply least-privilege policies β€” grant only the actions needed
  • Use VPC Endpoints (Gateway type) for private access from EC2
  • Force HTTPS with aws:SecureTransport deny condition
  • Enable AWS Macie for sensitive data discovery (PII detection)
πŸ‘‰ Key Takeaway

S3 security has three layers: IAM (who can act), Bucket Policy (what the bucket allows), and Block Public Access (safety override). All three must align for access to succeed. When in doubt, Block Public Access wins.

πŸ“‹ Chapter 3 β€” Summary
  • Default private β€” all buckets and objects are private. Nothing is public unless you explicitly allow it.
  • IAM Policies β€” attached to identities. Control what users/roles can do in S3.
  • Bucket Policies β€” attached to the bucket. JSON resource policy. Best for cross-account and public access.
  • Block Public Access β€” account or bucket-level override. Enabled by default. Always overrides bucket policy.
  • Access Points β€” per-application named endpoints with individual policies. VPC-restricted for private access. Scale to 10,000 per bucket.
  • VPC Gateway Endpoint β€” private S3 access from VPC without internet. Free. Route table-based. Common exam topic.
  • Presigned URLs β€” temporary signed URLs for private object access. Max 7 days. No bucket policy change needed.
  • Encryption: SSE-S3 (default, AWS manages), SSE-KMS (audit trail, your key policy), SSE-C (you manage key).
  • Force HTTPS via bucket policy. Disable legacy ACLs. Use VPC endpoints for private access.
04
Chapter Four

Data Management & Lifecycle

Lifecycle Rules In-Depth

Lifecycle rules automate the movement and deletion of objects over time. They eliminate the need to manually manage aging data β€” define the rules once, and S3 handles the transitions and expirations automatically.

πŸ”„

Transition Actions

  • Move objects to a cheaper storage class after N days
  • Example: Standard β†’ Standard-IA after 30 days
  • Example: Standard-IA β†’ Glacier after 90 days
  • Example: Glacier β†’ Deep Archive after 365 days
  • Can be scoped to a prefix or object tags
πŸ—‘οΈ

Expiration Actions

  • Delete objects after N days β€” automatic cleanup
  • Delete expired delete markers (versioned buckets)
  • Delete non-current versions after N days
  • Abort incomplete multipart uploads after N days
  • Prevents unbounded storage cost growth
Lifecycle Transitions β€” Object Aging Through Storage Classes
[Diagram] S3 Standard (day 0–29, frequent access) → 30d → Standard-IA (day 30–89, monthly access) → 90d → Glacier Flexible (day 90–364, rare access) → 365d → Deep Archive (annual/compliance) → 7yr → DELETE (expire). Minimum storage duration: Standard-IA = 30 days · Glacier Flexible = 90 days · Deep Archive = 180 days.
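
A sketch of the transition chain in the diagram as a boto3 lifecycle configuration (bucket name and day counts are illustrative):

  import boto3

  boto3.client("s3").put_bucket_lifecycle_configuration(
      Bucket="my-log-bucket",  # hypothetical
      LifecycleConfiguration={"Rules": [{
          "ID": "archive-then-expire-logs",
          "Status": "Enabled",
          "Filter": {"Prefix": "logs/"},  # scope the rule to one prefix (tags also work)
          "Transitions": [
              {"Days": 30, "StorageClass": "STANDARD_IA"},
              {"Days": 90, "StorageClass": "GLACIER"},
              {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
          ],
          "Expiration": {"Days": 2555},                                  # ~7 years, then delete
          "NoncurrentVersionExpiration": {"NoncurrentDays": 90},         # expire old versions
          "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},  # stop silent billing
      }]},
  )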
Replication In-Depth

S3 Replication automatically and asynchronously copies objects from one bucket to another. Versioning must be enabled on both source and destination buckets.

🌍

CRR β€” Cross-Region Replication

  • Source and destination in different AWS regions
  • Use case: disaster recovery across regions
  • Use case: low-latency access from another geography
  • Use case: compliance (data residency requirements)
  • Incurs inter-region data transfer cost
πŸ“

SRR β€” Same-Region Replication

  • Source and destination in the same AWS region
  • Use case: copy data between accounts in the same region
  • Use case: log aggregation from multiple source buckets
  • Use case: test environment with live data copy
  • No inter-region transfer cost
Replication Behaviour | Detail
What replicates | New objects after replication is enabled. Existing objects need S3 Batch Replication.
Delete behaviour | Delete markers are NOT replicated by default (can be enabled). Permanent deletes never replicate.
Storage class | Destination uses the same class by default. Can override to a cheaper class.
Ownership | Replicated objects are owned by the source account by default. Use the Object Ownership setting to change this.
Chaining | Replication is not transitive — A→B→C does NOT automatically replicate A to C.
Encryption | SSE-S3 and SSE-KMS objects can be replicated. SSE-C objects cannot.
CRR vs SRR β€” Replication Topology
[Diagram] CRR: source bucket (us-east-1, versioning ON) asynchronously replicates to a destination bucket (eu-west-1, versioning ON) — DR, geo-compliance, latency. SRR: source bucket (us-east-1, Account A) asynchronously replicates to a destination bucket (us-east-1, Account B) — log aggregation, cross-account copy, test data. Both require versioning enabled · replication is asynchronous · only NEW objects replicate by default. Use S3 Batch Replication to replicate existing objects.
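
A sketch of enabling CRR with boto3 (bucket names, account ID, and role are hypothetical; both buckets must already have versioning enabled and the role needs S3 replication permissions):

  import boto3

  boto3.client("s3").put_bucket_replication(
      Bucket="source-bucket",  # hypothetical, in us-east-1
      ReplicationConfiguration={
          "Role": "arn:aws:iam::111122223333:role/s3-replication",  # hypothetical
          "Rules": [{
              "ID": "crr-dr",
              "Status": "Enabled",
              "Priority": 1,
              "Filter": {},  # empty filter = replicate the whole bucket
              "DeleteMarkerReplication": {"Status": "Disabled"},  # the default behaviour
              "Destination": {
                  "Bucket": "arn:aws:s3:::dest-bucket-eu-west-1",  # hypothetical
                  "StorageClass": "STANDARD_IA",  # optional: land replicas in a cheaper class
              },
          }],
      },
  )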

⏱️ Replication Time Control (RTC)

Standard replication is asynchronous with no SLA on timing β€” most objects replicate in seconds, but some may take hours. S3 Replication Time Control (RTC) guarantees that 99.99% of objects replicate within 15 minutes, with S3 metrics to track replication lag. Use RTC when you have compliance or disaster recovery requirements that demand a guaranteed replication SLA. RTC adds cost β€” enable it only for buckets where the timing guarantee matters.

Object Lock In-Depth

Object Lock prevents objects from being deleted or overwritten for a defined period. It implements WORM (Write Once Read Many) storage β€” required for SEC 17a-4, HIPAA, and financial compliance workloads.

πŸ”’

Retention Modes

  • Compliance mode β€” nobody can delete or change the object, including the root user. Period cannot be shortened. Used for strict regulatory requirements.
  • Governance mode β€” only users with s3:BypassGovernanceRetention permission can override. Lighter enforcement for internal policies.
🏦

Legal Hold

  • Prevents deletion independent of any retention period
  • No expiry date β€” stays locked until explicitly removed
  • Requires s3:PutObjectLegalHold permission to apply/remove
  • Used during litigation β€” preserve evidence without a known end date

πŸ‘‰ Object Lock must be enabled when the bucket is created β€” it cannot be added to an existing bucket. Compliance mode retention periods cannot be shortened even by AWS Support.

S3 Event Notifications In-Depth

S3 can publish events when objects are created, deleted, restored, or replicated. This enables event-driven architectures where downstream systems react to data changes automatically.

πŸ“¬

SNS

Fan out notification to multiple subscribers. Email alerts, SMS, or trigger multiple SQS queues from one S3 event.

πŸ“©

SQS

Decouple processing from uploads. Workers poll SQS and process each uploaded object independently. Handles volume spikes gracefully.

⚑

Lambda

Trigger serverless processing immediately on upload. Image resizing, virus scanning, data validation, format conversion β€” all without a server.

Event Type | Triggered When | Common Use
s3:ObjectCreated:* | Any object is uploaded (PUT, POST, COPY, multipart) | Trigger processing pipeline on upload
s3:ObjectRemoved:* | Object is deleted | Audit deletion, update downstream index
s3:ObjectRestore:* | Glacier object restore initiated/completed | Notify when archive is available
s3:Replication:* | Replication failure or missed threshold | Alert on replication health issues
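
A sketch of wiring the most common case, Lambda on ObjectCreated, with boto3 (bucket, function ARN, and filters are hypothetical; the Lambda function must first grant S3 permission to invoke it):

  import boto3

  boto3.client("s3").put_bucket_notification_configuration(
      Bucket="my-upload-bucket",  # hypothetical
      NotificationConfiguration={"LambdaFunctionConfigurations": [{
          "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-upload",
          "Events": ["s3:ObjectCreated:*"],  # fire on PUT, POST, COPY, and multipart completion
          "Filter": {"Key": {"FilterRules": [
              {"Name": "prefix", "Value": "uploads/"},  # only this "folder"
              {"Name": "suffix", "Value": ".jpg"},      # only JPEG uploads
          ]}},
      }]},
  )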
S3 Batch Operations In-Depth

S3 Batch Operations runs large-scale jobs across billions of objects with a single API call. Instead of writing scripts to iterate through objects, you describe the operation and S3 runs it at scale.

βš™οΈ

Supported Operations

  • Copy objects between buckets
  • Replace object tags or ACLs
  • Restore objects from Glacier
  • Invoke Lambda on every object
  • Replicate existing objects (Batch Replication)
  • Set Object Lock retention on existing objects
πŸ“Š

How It Works

  • Provide an object manifest (S3 Inventory report or CSV)
  • Define the operation and parameters
  • S3 processes all listed objects β€” tracks progress and errors
  • Generates a completion report to S3
  • Full audit trail in CloudTrail
πŸ‘‰ Key Takeaway

Lifecycle rules + Replication + Object Lock form your data governance foundation. Automate transitions to save cost, replicate for resilience, and lock for compliance. Event notifications turn S3 into a trigger for your entire data pipeline.

πŸ“‹ Chapter 4 β€” Summary
  • Lifecycle rules β€” automate transitions (Standard β†’ IA β†’ Glacier) and expirations. Scope by prefix or tag.
  • CRR β€” cross-region replication for DR, compliance, and latency. Adds inter-region transfer cost.
  • SRR β€” same-region replication for cross-account copy, log aggregation. No transfer cost.
  • Replication nuances: versioning required, new objects only, delete markers not replicated by default, not transitive.
  • Object Lock β€” WORM storage. Compliance mode = nobody can delete. Governance mode = privileged users can override. Must enable at bucket creation.
  • Event notifications β€” S3 β†’ SNS / SQS / Lambda on create/delete/restore. Foundation of event-driven data pipelines.
  • Batch Operations β€” run jobs on billions of objects. Copy, tag, restore, invoke Lambda at scale.
05
Chapter Five

Performance & Scaling

S3 Scalability Model Core

S3 scales automatically β€” there are no capacity limits to configure, no partitions to manage, and no pre-warming required. AWS manages the infrastructure horizontally behind the scenes. However, understanding S3's performance characteristics helps you avoid hitting rate limits on high-throughput workloads.

πŸ“€

PUT / COPY / DELETE

3,500 requests/sec per prefix. Writing 100K objects/sec requires ~29 prefixes with evenly distributed keys.

πŸ“₯

GET / HEAD

5,500 requests/sec per prefix. A single prefix can serve ~5,500 reads per second before S3 automatically scales further.

♾️

No Hard Limits

These are baseline per-prefix rates. S3 will scale beyond these automatically as traffic increases β€” no pre-warming needed.

Prefix Partitioning In-Depth

A prefix is the part of an object key before the final filename β€” essentially the "path". S3 uses prefixes to distribute requests across its internal infrastructure. More distinct prefixes = more parallelism = higher throughput.

❌

Bad Pattern β€” Single Prefix

  • All objects under uploads/2026/
  • All requests go to the same partition
  • Hits 3,500 PUT/sec limit quickly
  • No horizontal scaling benefit
βœ…

Good Pattern β€” Multiple Prefixes

  • Distribute across a/uploads/ b/uploads/ c/uploads/
  • Or use hash prefixes: a3f/ 7b2/ 9d1/
  • Each prefix gets its own 3,500/5,500 rate budget
  • 10 prefixes = 35,000 PUT/sec, 55,000 GET/sec
Prefix Partitioning β€” Spreading Load for High Throughput
[Diagram] A high-write-rate app spreads keys across hash prefixes a3f/uploads/, 7b2/uploads/, 9d1/uploads/ — each prefix gets its own 3,500 PUT/s budget, so 3 prefixes × 3,500 = 10,500 PUT/s. Add more prefixes to scale further.
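
A small illustrative helper for the hash-prefix pattern (the 3-character prefix width and key layout are arbitrary choices):

  import hashlib

  def hashed_key(filename: str) -> str:
      """Spread writes across S3 partitions by prefixing keys with a short hash."""
      prefix = hashlib.md5(filename.encode()).hexdigest()[:3]  # e.g. "a3f"
      return f"{prefix}/uploads/{filename}"

  print(hashed_key("img.jpg"))  # something like "a3f/uploads/img.jpg"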
Multipart Upload Core

Multipart Upload splits large objects into parts, uploads them in parallel, and reassembles them on S3. It is the correct mechanism for any object above 100 MB.

Feature | Detail
Minimum part size | 5 MB (except the last part)
Maximum parts | 10,000 parts per object
Maximum object size | 5 TB (requires multipart)
Parallel uploads | Upload all parts simultaneously — dramatically faster on high-bandwidth connections
Resume on failure | Only the failed part needs to be retried — not the entire object
Incomplete uploads | Parts are billed even if the upload never completes — use a lifecycle rule to abort after N days
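
In practice you rarely drive the part APIs by hand; boto3's transfer manager switches to multipart automatically above a threshold. A minimal sketch (file, bucket, and sizes are illustrative):

  import boto3
  from boto3.s3.transfer import TransferConfig

  config = TransferConfig(
      multipart_threshold=100 * 1024 * 1024,  # use multipart above 100 MB
      multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts (minimum is 5 MB)
      max_concurrency=10,                     # upload 10 parts in parallel
  )
  boto3.client("s3").upload_file(
      "backup.tar.gz",      # local file, hypothetical
      "my-backup-bucket",   # hypothetical
      "dumps/backup.tar.gz",
      Config=config,        # failed parts are retried individually
  )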
S3 Transfer Acceleration In-Depth

Transfer Acceleration routes uploads through AWS CloudFront edge locations instead of going directly to the S3 regional endpoint. Data enters the AWS backbone at the nearest edge location, then travels on AWS's private network to S3 β€” which is faster and more reliable than routing over the public internet for long distances.

⚑

When Transfer Acceleration Helps

  • Users uploading from distant geographies (EU β†’ us-east-1)
  • Large file uploads over high-latency internet connections
  • Consistent performance from multiple global locations to one bucket
  • Can provide 50–500% speed improvement over direct upload
⚠️

When It Does NOT Help

  • Uploads from within the same region as the bucket
  • Small files β€” overhead of edge routing is not worth it
  • Adds per-GB transfer cost on top of standard S3 pricing
  • Test with the S3 Transfer Acceleration Speed Comparison tool first
Transfer Acceleration vs Direct Upload β€” Global Routing
πŸ‘€ USER (EU) Direct upload β€” long public internet hop β€” slower + unreliable CloudFront Edge (EU PoP) fast AWS private backbone β€” optimized routing S3 us-east-1
S3 Select & Glacier Select In-Depth

S3 Select allows you to retrieve only the subset of data you need from an object using SQL expressions β€” without downloading the entire file. Instead of downloading a 5 GB CSV and filtering locally, S3 filters on the server and returns only matching rows.

πŸ”

How It Works

  • Supported formats: CSV, JSON, Parquet
  • Optional compression: GZIP, BZIP2
  • Run SQL SELECT and WHERE against the object server-side
  • S3 returns only matching rows β€” not the full file
  • Reduces data transfer cost and client-side processing time
πŸ’°

Why It Matters

  • A 5 GB CSV with 10 matching rows β†’ transfer 10 rows, not 5 GB
  • Faster for Lambda functions operating on large S3 files
  • Glacier Select brings the same capability to archived data
  • Not a replacement for Athena β€” no joins, no aggregations
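
A minimal sketch of a server-side filter with boto3 (bucket, key, and the SQL column names are hypothetical):

  import boto3

  resp = boto3.client("s3").select_object_content(
      Bucket="my-data-bucket",  # hypothetical
      Key="sales/2026.csv",
      ExpressionType="SQL",
      Expression="SELECT s.order_id, s.total FROM S3Object s "
                 "WHERE CAST(s.total AS FLOAT) > 1000",
      InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},  # first row = column names
      OutputSerialization={"CSV": {}},
  )

  # The response is an event stream: only the matching rows cross the wire.
  for event in resp["Payload"]:
      if "Records" in event:
          print(event["Records"]["Payload"].decode(), end="")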
Byte-Range Fetches In-Depth

You can retrieve specific byte ranges of an object using the HTTP Range header. This enables parallel downloads and efficient partial reads without fetching the entire object.

⬇️

Parallel Download

Split a 10 GB object into 10 Γ— 1 GB ranges. Download all 10 in parallel. Combine client-side. Significantly faster than a single sequential download.

πŸ“‹

Read Header Only

Fetch just the first few KB of a file to read its header metadata (e.g., Parquet footer, image EXIF). Avoid downloading 500 MB to read 4 KB of metadata.

πŸ”„

Resume Downloads

If a download fails mid-way, resume from the last successful byte. No need to restart from zero for large objects.
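
A sketch of a partial read with the Range header (bucket and key hypothetical):

  import boto3

  # Fetch only the first 1 MB of a large object, e.g. to inspect a file header.
  resp = boto3.client("s3").get_object(
      Bucket="my-data-bucket",   # hypothetical
      Key="datasets/events.bin",
      Range="bytes=0-1048575",   # inclusive byte range, standard HTTP semantics
  )
  header = resp["Body"].read()   # 1 MB downloaded, regardless of total object size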

πŸ‘‰ Key Takeaway

S3 scales to any throughput automatically β€” but you must spread load across prefixes to use it. Use Multipart Upload for anything above 100 MB. Use Transfer Acceleration for global users. Use S3 Select to minimize data transfer on large objects.

πŸ“‹ Chapter 5 β€” Summary
  • Baseline rates: 3,500 PUT/sec and 5,500 GET/sec per prefix. Spread load across prefixes to scale linearly.
  • Prefix partitioning: hash-based prefixes distribute requests across S3 partitions. 10 prefixes = 10Γ— throughput.
  • Multipart Upload: required above 5 GB, recommended above 100 MB. Parallel parts + resume on failure. Set lifecycle rule to abort incomplete uploads.
  • Transfer Acceleration: edge location β†’ AWS backbone β†’ S3. 50–500% faster for distant geographies. Adds per-GB cost.
  • S3 Select: server-side SQL filter on CSV/JSON/Parquet. Transfer only matching rows. Not a query engine β€” no joins.
  • Byte-range fetches: parallel downloads, header-only reads, and resume support via HTTP Range header.
06
Chapter Six

Cost Optimization

What You Pay For Core

S3 has no up-front cost and no minimum fee. You pay only for what you use across four dimensions:

πŸ’Ύ

Storage Cost

  • Per GB stored per month
  • Varies by storage class β€” Standard is most expensive, Deep Archive cheapest
  • Billed by actual bytes β€” fractional GBs charged proportionally
  • Versioned objects: every version is billed separately
πŸ”

Request & Retrieval Cost

  • PUT/COPY/POST/LIST: ~$0.005 per 1,000 requests
  • GET/SELECT: ~$0.0004 per 1,000 requests
  • Retrieval fee for IA, Glacier classes (per GB retrieved)
  • Lifecycle transition requests: small per-object fee
🌐

Data Transfer Cost

  • Inbound (upload to S3): free
  • S3 β†’ internet: ~$0.09/GB (first 10 TB/month)
  • S3 β†’ same-region EC2: free
  • S3 β†’ different region (CRR): ~$0.02/GB
  • S3 β†’ CloudFront: free (use CF to avoid egress)
βš™οΈ

Management & Features

  • S3 Inventory reports: per million objects listed
  • S3 Analytics (Storage Class Analysis): per million objects
  • Replication: per-GB data transfer + request fees
  • Transfer Acceleration: additional per-GB fee
Storage Class Cost Comparison In-Depth

These are approximate US East (N. Virginia) prices to illustrate relative costs. Always check the AWS pricing page for current rates in your region.

Storage Class | Storage ($/GB/month) | Retrieval ($/GB) | Min Storage Duration | Min Object Size
S3 Standard | ~$0.023 | Free | None | None
S3 Intelligent-Tiering | ~$0.023 (frequent tier) | Free | None | 128 KB (smaller = Standard)
S3 Standard-IA | ~$0.0125 | ~$0.01/GB | 30 days | 128 KB billed minimum
S3 One Zone-IA | ~$0.01 | ~$0.01/GB | 30 days | 128 KB billed minimum
S3 Glacier Instant | ~$0.004 | ~$0.03/GB | 90 days | 128 KB billed minimum
S3 Glacier Flexible | ~$0.0036 | ~$0.01–0.03/GB | 90 days | 40 KB billed minimum
S3 Glacier Deep Archive | ~$0.00099 | ~$0.02/GB | 180 days | 40 KB billed minimum

πŸ‘‰ Minimum storage duration traps are real. If you store a 1 GB file in Standard-IA for only 10 days and delete it, you are still billed for 30 days. Do not use IA classes for short-lived or frequently changed objects.

S3 Intelligent-Tiering In-Depth

Intelligent-Tiering (INT) automatically moves objects between access tiers based on actual usage β€” no retrieval fees, no lifecycle rules to manage. It is the right choice when access patterns are unknown or unpredictable.

πŸ€–

How It Works

  • Objects start in the Frequent Access tier (same cost as Standard)
  • Move to Infrequent Access tier after 30 days of no access
  • Move to Archive Instant Access tier after 90 days of no access (automatic)
  • Optional opt-in tiers: Archive Access after 90+ days, Deep Archive Access after 180+ days
  • Object accessed β†’ immediately moved back to Frequent Access tier
πŸ’°

Cost Considerations

  • Small monitoring fee per object per month (~$0.0025/1,000 objects)
  • Objects smaller than 128 KB are billed as Standard β€” not worth INT
  • No retrieval fees within Frequent and Infrequent tiers
  • Archive tiers have retrieval fees (like Glacier)
  • No minimum storage duration β€” no early deletion penalty
Cost Optimization Strategies Core
πŸ“‰

Reduce Storage Cost

  • Set lifecycle rules to transition to cheaper classes automatically
  • Enable S3 Analytics to identify infrequently accessed data
  • Use Intelligent-Tiering for data with unknown access patterns
  • Expire old object versions automatically with lifecycle rules
  • Abort incomplete multipart uploads (lifecycle rule after 7 days)
  • Compress files before upload (GZIP, Snappy, ZSTD)
πŸ“‘

Reduce Transfer Cost

  • Serve S3 content via CloudFront β€” S3β†’CF is free, CFβ†’internet is cheaper
  • Keep compute (EC2/Lambda) in the same region as S3 β€” free transfer
  • Use VPC Gateway Endpoints β€” free S3 access from within VPC
  • Use S3 Select to transfer only needed rows, not full objects
  • Enable Requester Pays for public datasets β€” consumer pays retrieval
S3 Storage Lens In-Depth

S3 Storage Lens provides org-wide visibility into S3 usage, activity trends, and cost optimization recommendations across all buckets and accounts in your AWS Organization.

πŸ“Š

Usage Metrics

Total storage bytes, object count, average object size, incomplete multipart uploads β€” aggregated across your entire organization.

πŸ“ˆ

Activity Metrics

GET/PUT/DELETE request counts, bytes downloaded. Identify hot buckets and cold buckets that should be transitioned to cheaper storage classes.

πŸ’‘

Recommendations

S3 Storage Lens surfaces cost optimization tips: objects that qualify for lifecycle transitions, buckets with no lifecycle rules, and incomplete multipart upload accumulation.

Cost Decision Framework Core
Scenario | Right Choice | Reason
Frequently accessed app data | S3 Standard | No retrieval fee, no min duration
Access pattern is unknown | S3 Intelligent-Tiering | Auto-optimizes without lifecycle rules
Backup accessed once/month | S3 Standard-IA | ~50% cheaper storage, low retrieval frequency
Replicated data (can re-create) | S3 One Zone-IA | 20% cheaper than Standard-IA, acceptable single-AZ risk
Compliance archive, instant access | S3 Glacier Instant | ~83% cheaper than Standard, ms retrieval
7+ year regulatory archive | S3 Glacier Deep Archive | ~96% cheaper than Standard, 12h retrieval acceptable
Short-lived temp files (<30 days) | S3 Standard | IA min-duration billing makes IA more expensive
πŸ‘‰ Key Takeaway

S3 cost optimization is primarily about storage class selection and lifecycle automation. Serve via CloudFront to eliminate egress. Set lifecycle rules on day one β€” retroactively optimizing storage is expensive and slow. Use Storage Lens to find what you missed.

πŸ“‹ Chapter 6 β€” Summary
  • Four cost dimensions: storage ($/GB/month), requests (per 1,000), retrieval ($/GB for IA/Glacier), data transfer (free inbound, ~$0.09/GB egress).
  • S3β†’CloudFront is free. Use CloudFront for public content β€” eliminates S3 egress cost entirely.
  • Minimum duration traps: Standard-IA = 30 days, Glacier = 90 days, Deep Archive = 180 days. Don't use IA for short-lived objects.
  • Intelligent-Tiering: auto-moves objects based on actual access. No retrieval fee. Best for unknown access patterns. Objects <128 KB billed as Standard.
  • Lifecycle rules: set on day one. Expire old versions. Abort incomplete multipart uploads. Transition logs to Glacier after 90 days.
  • VPC Gateway Endpoint: free S3 access from within a VPC. Eliminates NAT Gateway data processing costs for S3 traffic.
  • Storage Lens: org-wide dashboard for usage, activity, and automatic cost optimization recommendations.
07
Chapter Seven

Architecture Patterns

Pattern 1 β€” Static Website Hosting Introductory

S3 can serve HTML, CSS, JavaScript, and image files directly as a website β€” no web server, no EC2, no maintenance. For read-heavy static content, this is the simplest and cheapest architecture on AWS.

πŸ—οΈ

Architecture

  • Enable static website hosting on the S3 bucket
  • Set index document (index.html) and error document (404.html)
  • Bucket policy grants s3:GetObject to * (public read)
  • Disable Block Public Access to allow the public policy
  • Use custom domain via Route 53 CNAME or alias
βœ…

With CloudFront (Recommended)

  • CloudFront distribution in front of S3 origin
  • S3 bucket stays private β€” CloudFront uses OAC to access it
  • HTTPS via ACM certificate on CloudFront (S3 website endpoint is HTTP only)
  • Global edge caching β€” serves from PoP nearest to user
  • Eliminates S3 egress cost β€” S3β†’CF transfer is free
Static Website β€” S3 + CloudFront Architecture
πŸ‘€ USER Route 53 DNS + alias CloudFront HTTPS + ACM OAC S3 Bucket Private Β· HTML/CSS/JS βœ“ HTTPS enforced βœ“ Global edge cache βœ“ S3β†’CF free βœ“ Bucket private Scales to any traffic Β· No server to manage Β· ~$0/month for low-traffic sites
Pattern 2 β€” User Upload with Presigned URL In-Depth

Users upload files directly to S3, bypassing your application server entirely. Your backend generates a short-lived presigned URL and returns it to the client. The client uploads directly to S3 β€” your server never touches the bytes.

πŸ—οΈ

Flow

  • Client requests upload permission from your API
  • Your API generates a presigned PUT URL (e.g., 15 minutes)
  • API returns the presigned URL to the client
  • Client uploads the file directly to S3 using the URL
  • S3 sends an event notification to Lambda on completion
  • Lambda processes the uploaded file (resize, scan, index)
βœ…

Benefits

  • Your servers handle zero upload bandwidth
  • Files go directly to S3 β€” faster for users on high-bandwidth connections
  • Bucket stays private β€” presigned URL grants temporary access only
  • Lambda event trigger enables automatic downstream processing
  • Scales to thousands of concurrent uploads without bottleneck
Presigned URL Upload Pattern β€” Client β†’ API β†’ S3 Direct
πŸ‘€ CLIENT β‘  GET /upload-url Your API Presigned URL (15 min TTL) β‘‘ presigned URL β‘’ Direct PUT to S3 (file bytes β€” bypasses API) S3 Bucket Private β‘£ Lambda Resize / Scan API server handles zero upload bytes Β· S3 scales to any number of concurrent uploads
Pattern 3 β€” Data Lake Architecture In-Depth

S3 is the standard storage layer for data lakes on AWS. Raw data lands in S3, is catalogued with Glue, and queried in-place with Athena β€” no database to provision, no ETL until you need it.

πŸ—οΈ

Architecture Layers

  • Landing zone: raw data as-is β€” JSON, CSV, logs, API dumps
  • Processed zone: cleaned, partitioned Parquet files (columnar format)
  • Curated zone: aggregated, business-ready datasets
  • Each zone is a separate S3 prefix or bucket
  • AWS Glue Crawlers auto-discover schema and update the Glue Catalog
  • Athena queries directly against Parquet files using SQL
πŸ’‘

Why This Pattern Works

  • Storage is decoupled from compute β€” scale each independently
  • Pay per query with Athena β€” no always-on database cluster
  • Parquet columnar format reduces Athena scan cost by 10–100Γ—
  • Partition by date/region β€” Athena skips irrelevant partitions entirely
  • Lake Formation adds fine-grained table/column access control
Pattern 4 β€” Backup & Disaster Recovery Core
πŸ’Ύ

Database Backups

  • RDS automated backups export to S3
  • DynamoDB exports to S3 (point-in-time)
  • EC2 instance backups via EBS snapshots, which AWS stores in S3 behind the scenes
  • Lifecycle: transition to Glacier after 30 days
🌍

Cross-Region DR

  • Enable CRR to a secondary region bucket
  • RPO: near-zero (async replication, seconds lag)
  • RTO: immediate β€” data already in secondary region
  • S3 Object Lock protects against ransomware
πŸ”„

Versioning for Recovery

  • Versioning = built-in point-in-time recovery
  • Restore any object to any previous state
  • Lifecycle rules expire old versions to control cost
  • MFA Delete for extra protection on versioned buckets
Pattern 5 β€” Event-Driven Processing Pipeline In-Depth

S3 events drive serverless processing pipelines β€” no polling, no scheduler, no idle workers. Every object upload automatically triggers the next stage of processing.

Event-Driven Pipeline β€” S3 Upload Triggers Processing Chain
[Diagram] Raw upload lands in S3 (data/raw/) → ObjectCreated event → Lambda validates and transforms → writes to processed S3 (data/parquet/) → ObjectCreated event → SQS queue buffers and decouples → downstream consumers (Athena, RDS, API, search). No polling · No idle workers · Each stage scales independently · SQS buffers volume spikes.
Pattern 6 β€” Common Mistakes Introductory
Mistake | Why It's Bad | Fix
Making bucket public for all content | Exposes all objects — including future uploads | Keep bucket private, use CloudFront OAC + presigned URLs
No lifecycle rules | Storage cost grows unbounded over months/years | Set lifecycle rules on day one for every bucket
Using S3 as a database | No query capability, no indexing — extremely slow lookups | Store metadata in DynamoDB/RDS, store files in S3
Ignoring incomplete multipart uploads | Parts accumulate silently and are billed indefinitely | Lifecycle rule: abort incomplete multipart after 7 days
Moving to IA too aggressively | Min-duration billing + retrieval fees make it more expensive for frequent access | Use S3 Analytics or Intelligent-Tiering to identify true access patterns
Not enabling versioning on important buckets | One accidental delete or overwrite = permanent data loss | Enable versioning + lifecycle expire old versions
Storing credentials in S3 objects | Exposed if bucket is ever misconfigured | Use Secrets Manager or SSM Parameter Store
πŸ‘‰ Key Takeaway

S3's patterns all follow one principle: S3 is storage, not compute. Let CloudFront serve it, let Lambda process it, let Athena query it, let your API control access to it. S3 itself just stores β€” everything else is glue.

πŸ“‹ Chapter 7 β€” Summary
  • Static website: S3 + CloudFront + ACM + Route 53. Bucket stays private. OAC grants CloudFront access. Zero server cost.
  • User uploads: API generates presigned PUT URL β†’ client uploads directly to S3 β†’ Lambda processes on event. Your server handles zero bytes.
  • Data lake: Landing (raw) β†’ Processed (Parquet) β†’ Curated zones in S3. Glue Catalog autodiscovers schema. Athena queries in-place.
  • Backup & DR: CRR to secondary region. Versioning for point-in-time recovery. Object Lock for ransomware protection.
  • Event-driven pipeline: ObjectCreated β†’ Lambda β†’ processed S3 β†’ SQS β†’ downstream. No polling, scales to any volume.
  • Common mistakes: no lifecycle rules, ignoring incomplete multipart uploads, moving data to IA too aggressively, no versioning on critical buckets.