Amazon S3
Simple Storage Service
Unlimited object storage in the cloud. The backbone of data lakes, backups, static websites, and CDN origins: infinitely scalable, 11 nines durable.
S3 in 30 Seconds
- Object storage: store any file of any size, retrieved by a unique key (URL)
- Unlimited capacity: no pre-provisioning, no disk management
- 99.999999999% (11 nines) durability: data replicated across at least 3 AZs automatically
- Multiple storage classes: optimize cost from millisecond access to archival
- Integrated with almost every AWS service: the default data layer for AWS
What is S3
Amazon S3 (Simple Storage Service) is AWS's object storage service. It lets you store and retrieve any amount of data (files, images, videos, backups, logs, ML datasets) from anywhere on the internet. Unlike a hard drive with folders and files, S3 stores data as objects inside buckets, each identified by a unique key.
Think of S3 as: an infinite hard drive in the cloud. Pay only for what you store, access from anywhere.
S3 was one of the first AWS services, launched in 2006. Today it stores trillions of objects and handles millions of requests per second across AWS customers. It is the most-used AWS storage service and the foundation of most data architectures on AWS.
Traditional File Storage Problems
- Fixed disk capacity: buy hardware before you need it
- Disks fail: complex RAID and backup setups required
- Not globally accessible: VPN or network share required
- Scaling is slow: days/weeks to add capacity
- High upfront capital cost
S3 Solves
- Unlimited capacity: grows automatically with your data
- AWS manages replication and durability: 11 nines
- Accessible over HTTPS from anywhere, any device
- Add storage instantly: no provisioning required
- Pay per GB stored + requests: no upfront cost
Understanding the storage type is critical for choosing the right service:
| Type | How It Works | AWS Service | Best For |
|---|---|---|---|
| Object Storage | Flat namespace: key → object. No folders. Access via HTTP. | S3 | Files, images, backups, data lakes, logs |
| Block Storage | Raw disk blocks. OS mounts it like a hard drive. Low latency. | EBS | Databases, boot volumes, OS-level read/write |
| File Storage | Shared filesystem with directories. NFS protocol. | EFS | Shared access across multiple EC2 instances |
S3 is not a filesystem. You cannot "mount" S3 like a drive or run a database on it. It is optimized for storing and retrieving whole objects via HTTP, not for random read/write of small byte ranges.
S3 is referenced by almost every AWS service:
Data & Analytics
Data lakes (Athena, Glue, Redshift Spectrum). S3 is the raw storage layer: query data in-place without loading into a database.
Web & Applications
Static website hosting (HTML/CSS/JS), user uploads, media assets, and application configuration files stored in S3.
DevOps & Infrastructure
CloudFormation templates, Lambda deployment packages, CodePipeline artifacts, and EC2 AMI snapshots are all stored in S3.
Backup & Compliance
AWS Backup destinations, CloudTrail audit logs, VPC flow logs, config history, and compliance archives all land in S3.
Machine Learning
SageMaker training datasets, model artifacts, and inference results. S3 is the default ML data store on AWS.
CDN Origin
CloudFront uses S3 as an origin to cache and serve content globally with low latency: the standard pattern for static assets.
Think of S3 like a post office with infinite numbered mailboxes:
The Post Office = Bucket
- A named container for objects
- Name must be globally unique across all AWS accounts
- Lives in one AWS region: data does not leave unless you replicate
- You own and control the bucket policies and access
- Up to 100 buckets per account (soft limit, can be raised)
The Package = Object
- Any file: image, video, CSV, zip, binary, JSON
- Up to 5 TB per object (use Multipart Upload above 100 MB)
- Identified by a unique key (like a full file path)
- Includes metadata: content-type, custom tags, system attributes
- Immutable: to update, you replace the entire object
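A minimal boto3 sketch of this bucket/object model. Bucket name, key, and local file are placeholders; the point is that you PUT whole objects under a key, with a content type and optional user metadata, and GET them back the same way.

```python
import boto3

s3 = boto3.client("s3")

# Upload an object. The key "images/2026/logo.png" is just a string,
# not a real folder path; the slashes are cosmetic.
with open("logo.png", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="images/2026/logo.png",
        Body=f,
        ContentType="image/png",          # system metadata
        Metadata={"team": "frontend"},    # user-defined metadata
    )

# Read it back. Objects are immutable: re-uploading the same key
# replaces the whole object rather than patching bytes in place.
obj = s3.get_object(Bucket="my-example-bucket", Key="images/2026/logo.png")
print(obj["ContentType"], len(obj["Body"].read()))
```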
Two different guarantees, both important, often confused on the exam:
Durability: 99.999999999%
- Will your data survive? Yes, 11 nines
- AWS stores multiple copies across at least 3 AZs automatically
- Designed to tolerate concurrent loss of data in 2 facilities
- Losing stored data in S3 Standard is essentially impossible
- Same for all storage classes except S3 One Zone-IA (single AZ)
Availability: 99.99%
- Can you access it right now? 99.99% of the time
- ~52 minutes downtime per year on S3 Standard
- Varies by storage class: S3 Standard-IA = 99.9%, One Zone-IA = 99.5%
- Glacier classes are slower to reach: retrievals take minutes to hours (except Glacier Instant Retrieval)
| Use Case | How S3 Is Used | Why It Works |
|---|---|---|
| Static Website Hosting | Serve HTML/CSS/JS from a bucket with public access | No server needed; scales to any traffic automatically |
| Database Backups | Dump files pushed to S3 on a schedule | Cheap, durable, cross-region replication available |
| User Uploads | Presigned URLs let users upload directly to S3 | Bypass your app server for large files |
| Data Lake | Raw data (JSON, Parquet, CSV) stored in S3, queried with Athena | Decouple storage from compute; pay per query |
| Log Archive | CloudTrail, ALB access logs, VPC flow logs → S3 | Long-term storage, lifecycle to Glacier after 90 days |
| CDN Origin | CloudFront serves from S3 origin globally | Edge caching + S3 durability = best of both worlds |
Since December 2020, Amazon S3 provides strong read-after-write consistency for all operations, at no additional cost and with no performance impact. This was a major change from S3's original eventual consistency model.
Current Behavior (Strong Consistency)
- PUT a new object → immediately readable by all subsequent GETs
- Overwrite an existing object → next GET returns the new version
- DELETE an object → next GET returns 404
- LIST operations reflect the latest state
- Applies to all storage classes, all regions
Old Behavior (Pre-2020, No Longer Applies)
- New objects: read-after-write consistent (same as now)
- Overwrites and deletes: eventually consistent, so you could read stale data
- LIST after PUT: object might not appear immediately
- This is in many older study guides; it is outdated
Exam note: S3 is now strongly consistent for all operations. If a question references eventual consistency for S3, the correct answer is strong read-after-write consistency. Older materials mentioning eventual consistency for overwrites are outdated.
S3 is unlimited, durable object storage and the default data layer for AWS. If you need to store a file in AWS, S3 is the answer 90% of the time.
- Object storage: files stored as objects with a unique key, retrieved via HTTP. Not a filesystem or database.
- Unlimited capacity: no pre-provisioning. Pay per GB stored + per request made.
- 11 nines durability: data replicated across at least 3 AZs automatically. AWS manages it.
- Object vs Block vs File: S3 = objects (HTTP). EBS = block (disk). EFS = file (NFS mount).
- Strong consistency: all operations (PUT, DELETE, LIST) are strongly consistent since 2020. No eventual consistency.
- Used everywhere: backups, data lakes, static websites, ML datasets, CDN origin, DevOps artifacts.
- Durability ≠ Availability: 11 nines = data won't disappear. 99.99% = you can access it almost always.
Core Concepts & Storage Model
A bucket is the top-level container for objects in S3. Every object lives inside a bucket. Buckets are created in a specific AWS region and data does not leave that region unless you explicitly configure replication.
Globally Unique Name
Bucket names must be unique across all AWS accounts globally, not just your account. If my-company-data is taken by anyone in the world, you cannot use it.
Regional Resource
A bucket is created in one region (e.g., us-east-1). Choose the region closest to your users or compute workload to minimize latency and data transfer costs.
Naming Rules
- 3-63 characters long
- Lowercase letters, numbers, hyphens only
- Cannot start or end with a hyphen
- Cannot be formatted as an IP address
An object is the fundamental unit of data in S3. It consists of the data itself plus metadata. Every object is identified by a key: a string that uniquely identifies the object within its bucket.
| Component | What It Is | Example |
|---|---|---|
| Key | The full "path" of the object within the bucket | images/2026/logo.png |
| Value | The actual data β any bytes, any format | Binary PNG file data |
| Version ID | Unique ID per version (when versioning is enabled) | ab3c4de5fg6h |
| Metadata | Key-value pairs describing the object | Content-Type: image/png |
| Tags | User-defined labels for cost allocation or access control | env=prod, team=frontend |
| ETag | Hash of the object (an MD5 digest for single-part, non-KMS uploads) used to verify integrity | d41d8cd98f00b204e9800998ecf8427e |
S3 has no real folders: the key images/2026/logo.png is just a string. The AWS console displays the slash as a folder, but it is purely cosmetic. This matters for prefix-based performance optimization.
Single PUT Upload
Max 5 GB per PUT request. For anything larger, use Multipart Upload. AWS recommends Multipart for objects above 100 MB.
Multipart Upload
Split large files into parts (min 5 MB, max 10,000 parts). Upload parts in parallel. Combine on S3. Required for objects above 5 GB.
Maximum Object Size
A single object can be up to 5 TB. No limit on bucket total size: store petabytes in one bucket if needed.
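A short sketch of the upload limits in practice: boto3's high-level transfer manager switches to Multipart Upload automatically once a file crosses a configurable threshold. The bucket, key, and file names are placeholders.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart above 100 MB, upload 64 MB parts in parallel.
# (Minimum part size is 5 MB, except for the last part.)
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=8,
)

s3.upload_file("backup.tar.gz", "my-example-bucket",
               "backups/backup.tar.gz", Config=config)
```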
S3 offers multiple storage classes, each optimized for different access frequency and cost profiles. You pay less per GB for classes you access less frequently, but you pay a retrieval fee when you do access them.
| Storage Class | Access Pattern | Availability | Retrieval Fee | Best For |
|---|---|---|---|---|
| S3 Standard | Frequent access | 99.99% | None | Active data, websites, apps |
| S3 Intelligent-Tiering | Unknown / changing | 99.9% | None | Data with unpredictable patterns |
| S3 Standard-IA | Infrequent (monthly) | 99.9% | Per GB retrieved | Backups, disaster recovery |
| S3 One Zone-IA | Infrequent, single AZ | 99.5% | Per GB retrieved | Re-creatable data, secondary backups |
| S3 Glacier Instant | Rare (quarterly) | 99.9% | Per GB retrieved | Archive with instant access |
| S3 Glacier Flexible | Rare; minutes to hours | 99.99% | Per GB + request | Compliance archives, tape replacement |
| S3 Glacier Deep Archive | Very rare; 12h retrieval | 99.99% | Per GB + request | 7-10 year regulatory archives |
Versioning keeps multiple versions of an object in the same bucket. Every time you overwrite or delete an object, S3 creates a new version instead of destroying the old one.
Why Enable Versioning
- Recover from accidental overwrites and deletes
- Required prerequisite for S3 Replication
- Required for S3 Object Lock (compliance)
- Enables audit trail β who changed what, when
- Deletes create a "delete marker"; the data is still there
Versioning Trade-offs
- Storage cost grows: every version is billed separately
- Once enabled, cannot be fully disabled, only suspended
- Need lifecycle rules to expire old versions automatically
- Deleting a versioned object requires deleting ALL versions
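A minimal boto3 sketch of versioning in practice (bucket and prefix are placeholders): enable it once, then list versions to see how overwrites and deletes accumulate.

```python
import boto3

s3 = boto3.client("s3")

# Turn versioning on; afterwards it can only be suspended, never removed.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Every overwrite now creates a new version; deletes add a delete marker.
resp = s3.list_object_versions(Bucket="my-example-bucket", Prefix="reports/")
for v in resp.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])

# Recovering from an accidental delete means deleting the delete marker,
# e.g. s3.delete_object(..., VersionId="<delete-marker-id>")  # hypothetical ID
```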
System Metadata
- Set by AWS: Content-Type, Content-Length, Last-Modified
- Content-Type is critical: browsers use it to render objects correctly
- Set at upload time, cannot always be changed retroactively
User-Defined Tags
- Up to 10 key-value pairs per object
- Used for cost allocation reports (group by team, env, project)
- Used in lifecycle rules: apply rules to tagged objects only
- Used in IAM/bucket policies: grant access based on tags
Understanding request types matters for cost calculation, because you pay per request:
| Request Type | Operation | Relative Cost |
|---|---|---|
| PUT / COPY / POST / LIST | Write or list operations | Higher ($0.005 per 1,000) |
| GET / SELECT | Read object data | Lower ($0.0004 per 1,000) |
| DELETE | Delete object | Free |
| Lifecycle transitions | Move object between storage classes | Per-transition fee |
S3's storage model is simple: buckets hold objects, objects have keys and metadata. The storage class you choose determines cost and access speed, so match it to how frequently you access the data.
- Buckets: globally unique named containers, tied to one region. Up to 100 per account (soft limit).
- Objects: data + metadata + tags. Max 5 TB. Use Multipart Upload above 100 MB.
- Keys: the full "path" string identifying an object. No real folders; slashes are cosmetic.
- Storage classes: Standard (frequent) → IA (monthly) → Glacier (rare) → Deep Archive (years). Lower cost = retrieval fee.
- Versioning: keeps all versions on overwrite/delete. Enables recovery. Required for replication and Object Lock.
- Metadata & Tags: Content-Type is critical. Tags drive cost allocation, lifecycle rules, and access control.
Security & Access Control
By default, all S3 buckets and objects are private. Nothing is publicly accessible unless you explicitly allow it. Access to S3 is controlled through multiple overlapping layers, and understanding which layer applies when is the key to both security and the SAA-C03 exam.
IAM Policies
Attached to users, groups, or roles. Define what AWS identities can do to S3. Evaluated by IAM before the request even reaches S3.
Bucket Policies
Attached to the bucket itself. Resource-based policy in JSON. Can grant access to other AWS accounts, services, and the public. Most powerful S3 access tool.
ACLs (Legacy)
Object or bucket-level access control lists. Predates IAM. AWS recommends disabling ACLs and using bucket policies instead. Still appears on exams.
IAM policies grant S3 permissions to AWS identities. The identity must have permissions AND the bucket policy must allow (or at least not deny) the request.
| IAM Action | What It Allows |
|---|---|
| s3:GetObject | Download / read an object |
| s3:PutObject | Upload / write an object |
| s3:DeleteObject | Delete an object |
| s3:ListBucket | List objects in a bucket |
| s3:GetBucketPolicy | Read the bucket policy |
| s3:PutBucketPolicy | Write / replace the bucket policy |
| s3:* | Full access to all S3 actions (admin) |
Bucket policies are JSON documents attached directly to a bucket. They can grant or deny access to specific AWS accounts, IAM users/roles, services, or the public. They are the primary mechanism for cross-account access and public access.
Common Bucket Policy Use Cases
- Grant another AWS account read access to a bucket
- Force all uploads to use HTTPS (deny HTTP)
- Allow CloudFront OAC to read from a private bucket
- Restrict access to specific IP address ranges
- Require server-side encryption on all PUT requests
- Make a bucket publicly readable for static website hosting
Policy Structure
- Effect: Allow or Deny
- Principal: who (IAM user, account, * for public)
- Action: which S3 operations (s3:GetObject)
- Resource: which bucket/object (arn:aws:s3:::my-bucket/*)
- Condition: optional constraints (IP, MFA, HTTPS)
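A sketch of one of the listed use cases, applied with boto3: a bucket policy that denies any request not made over HTTPS. The bucket name is a placeholder; the policy structure (Effect/Principal/Action/Resource/Condition) follows the elements above.

```python
import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::my-example-bucket",
            "arn:aws:s3:::my-example-bucket/*",
        ],
        # Deny matches whenever the request was not sent over TLS.
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))
```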
Block Public Access is a safety switch that sits above bucket policies and ACLs. Even if your bucket policy grants public access, Block Public Access will override and deny it.
What It Does
- 4 independent settings that can be toggled on/off
- Can be set at account level (all buckets) or per bucket
- Account-level setting overrides bucket-level
- Enabled by default on all new buckets since 2023
- Protects against misconfigured bucket policies accidentally exposing data
When to Disable
- Static website hosting that needs to be publicly readable
- Public software distribution buckets
- Any intentional public access scenario
- Must be explicitly and deliberately turned off, never by accident
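The four settings map directly onto one API call. A hedged sketch of applying them at the bucket level (account-wide enforcement uses the s3control client with an account ID instead):

```python
import boto3

s3 = boto3.client("s3")

# All four Block Public Access toggles, enabled on one bucket.
s3.put_public_access_block(
    Bucket="my-example-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)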
S3 Access Points simplify managing access to shared datasets in S3. Instead of one complex bucket policy that handles every application, each application gets its own named endpoint with its own access policy β scoped to exactly what it needs.
How It Works
- Each access point has a unique DNS name (endpoint)
- Each has its own IAM-style policy for permissions
- Multiple access points on one bucket, one per app/team
- Access point ARN used in place of bucket ARN
VPC-Restricted Access Points
- Access point can be restricted to a specific VPC
- Requests from outside the VPC are automatically denied
- No need for complex bucket policy VPC conditions
- Combines with VPC Endpoints for fully private access
When to Use
- Data lake: different teams query different prefixes
- Multi-tenant: each tenant's app gets scoped access
- Compliance: audit access per application
- At scale: 10,000 access points per bucket supported
A VPC Gateway Endpoint allows EC2 instances and other resources in a private subnet to access S3 without going through the internet: no NAT Gateway, no Internet Gateway, no public IP required.
How It Works
- Create a Gateway Endpoint for S3 in your VPC
- Attach route table entries directing S3 traffic to the endpoint
- Traffic to S3 stays on the AWS private network and never touches the internet
- Free: no hourly charge, no data processing charge
- Works with bucket policies: add an aws:sourceVpce condition to restrict access to the endpoint only
Benefits
- Security: data never traverses the public internet
- Cost: no NAT Gateway data processing fees (saves $0.045/GB)
- Performance: lower latency, higher throughput within AWS
- Exam: "How to access S3 from a private subnet securely" → Gateway Endpoint
Exam tip: S3 and DynamoDB use Gateway Endpoints (free, route table-based). Most other AWS services use Interface Endpoints (ENI-based, hourly charge). "Private S3 access from a private subnet" → VPC Gateway Endpoint. This appears on nearly every AWS exam.
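A minimal sketch of creating the endpoint with boto3; the VPC, route table, and region are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Gateway endpoint for S3: traffic from subnets using this route table
# reaches S3 over the AWS network instead of the internet.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```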
A presigned URL grants temporary access to a private S3 object without making the bucket public. Any identity with the right IAM permissions can generate one.
How It Works
S3 embeds the credentials and expiry time into the URL itself. The URL is signed with the creator's AWS credentials. Anyone with the URL can access the object until it expires.
Use Cases
- User downloads a private file from your app
- User uploads directly to S3 without credentials
- Sharing a large file temporarily
- Email attachment links that expire
Expiry
- Default: 1 hour
- Max: 7 days when signed with long-term IAM credentials; URLs signed with temporary STS credentials stop working when those credentials expire
- URL becomes invalid after expiry; no revocation needed
- Revoke early by invalidating the signing credentials
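A short sketch of generating a download link with boto3 (bucket, key, and expiry are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Temporary download link for a private object; expires in 15 minutes.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "my-example-bucket", "Key": "reports/q1.pdf"},
    ExpiresIn=900,  # seconds
)
print(url)  # anyone holding this URL can GET the object until it expires
```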
S3 supports encryption at rest and in transit. Since January 2023, all new objects are encrypted by default with SSE-S3.
| Type | Key Management | Use Case | Exam Note |
|---|---|---|---|
| SSE-S3 | AWS manages keys completely | Default; zero management overhead | Header: x-amz-server-side-encryption: AES256 |
| SSE-KMS | AWS KMS; you control the key policy | Compliance, audit trail, cross-account control | CloudTrail logs every key usage. Adds KMS API call cost. |
| SSE-C | You provide the key on every request | You manage keys outside AWS completely | AWS never stores your key; it must be sent with every PUT/GET. |
| Client-side | You encrypt before upload | Zero trust; AWS never sees plaintext | Application owns the full encryption lifecycle. |
Encryption in Transit
- All S3 endpoints support HTTPS (TLS 1.2+)
- HTTP requests are also accepted by default unless you deny them
- Force HTTPS with a bucket policy condition: aws:SecureTransport = false → Deny
- HTTPS is always recommended and required for compliance workloads
SSE-KMS Considerations
- Every S3 GET/PUT = a KMS API call (GenerateDataKey / Decrypt)
- KMS has request rate limits; heavy S3 workloads can hit KMS throttling
- Use KMS key policies to restrict who can use the key
- Audit all data access via CloudTrail; every decrypt is logged
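A hedged sketch of setting SSE-KMS as the bucket default so every new object is encrypted with your key; the key ARN and bucket are placeholders. Enabling the S3 Bucket Key is the usual mitigation for the KMS throttling concern above.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="my-example-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
            },
            "BucketKeyEnabled": True,  # fewer KMS API calls per object
        }]
    },
)
```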
Access Control Lists (ACLs) are the original S3 access mechanism. AWS now recommends disabling ACLs and using bucket policies instead. However, ACLs still appear on certifications.
| ACL Permission | What It Allows |
|---|---|
| READ | List objects (bucket) or download object |
| WRITE | Upload/delete objects in bucket |
| READ_ACP | Read the ACL itself |
| WRITE_ACP | Modify the ACL |
| FULL_CONTROL | All of the above |
New accounts have ACLs disabled by default. Use bucket policies for access control; they are more expressive, easier to audit, and don't require understanding legacy ACL semantics.
Bucket Hardening
- Enable Block Public Access at account level
- Enable versioning: protects against ransomware and accidental deletes
- Enable Object Lock for compliance data (WORM)
- Require SSE-KMS for sensitive data via bucket policy
- Enable S3 access logging: record all requests to the bucket
IAM & Network
- Use IAM roles; never hardcode credentials in apps
- Apply least-privilege policies: grant only the actions needed
- Use VPC Endpoints (Gateway type) for private access from EC2
- Force HTTPS with an aws:SecureTransport deny condition
- Enable AWS Macie for sensitive data discovery (PII detection)
S3 security has three layers: IAM (who can act), Bucket Policy (what the bucket allows), and Block Public Access (safety override). All three must align for access to succeed. When in doubt, Block Public Access wins.
- Default private: all buckets and objects are private. Nothing is public unless you explicitly allow it.
- IAM Policies: attached to identities. Control what users/roles can do in S3.
- Bucket Policies: attached to the bucket. JSON resource policy. Best for cross-account and public access.
- Block Public Access: account or bucket-level override. Enabled by default. Always overrides bucket policy.
- Access Points: per-application named endpoints with individual policies. VPC-restricted for private access. Scale to 10,000 per bucket.
- VPC Gateway Endpoint: private S3 access from VPC without internet. Free. Route table-based. Common exam topic.
- Presigned URLs: temporary signed URLs for private object access. Max 7 days. No bucket policy change needed.
- Encryption: SSE-S3 (default, AWS manages), SSE-KMS (audit trail, your key policy), SSE-C (you manage key).
- Force HTTPS via bucket policy. Disable legacy ACLs. Use VPC endpoints for private access.
Data Management & Lifecycle
Lifecycle rules automate the movement and deletion of objects over time. They eliminate the need to manually manage aging data: define the rules once, and S3 handles the transitions and expirations automatically.
Transition Actions
- Move objects to a cheaper storage class after N days
- Example: Standard → Standard-IA after 30 days
- Example: Standard-IA → Glacier after 90 days
- Example: Glacier → Deep Archive after 365 days
- Can be scoped to a prefix or object tags
Expiration Actions
- Delete objects after N days for automatic cleanup
- Delete expired delete markers (versioned buckets)
- Delete non-current versions after N days
- Abort incomplete multipart uploads after N days
- Prevents unbounded storage cost growth
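A boto3 sketch combining both kinds of actions; bucket name, prefixes, and day counts are placeholders chosen to match the examples above.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {   # transition + expiration, scoped to a prefix
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            },
            {   # housekeeping for the whole bucket
                "ID": "housekeeping",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            },
        ]
    },
)
```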
S3 Replication automatically and asynchronously copies objects from one bucket to another. Versioning must be enabled on both source and destination buckets.
CRR: Cross-Region Replication
- Source and destination in different AWS regions
- Use case: disaster recovery across regions
- Use case: low-latency access from another geography
- Use case: compliance (data residency requirements)
- Incurs inter-region data transfer cost
SRR: Same-Region Replication
- Source and destination in the same AWS region
- Use case: copy data between accounts in the same region
- Use case: log aggregation from multiple source buckets
- Use case: test environment with live data copy
- No inter-region transfer cost
| Replication Behaviour | Detail |
|---|---|
| What replicates | New objects after replication is enabled. Existing objects need S3 Batch Replication. |
| Delete behaviour | Delete markers are NOT replicated by default (can be enabled). Permanent deletes never replicate. |
| Storage class | Destination uses same class by default. Can override to a cheaper class. |
| Ownership | Replicated objects are owned by source account by default. Use Object Ownership setting to change. |
| Chaining | Replication is not transitive: A→B→C does NOT automatically replicate A to C. |
| Encryption | SSE-S3 and SSE-KMS objects can be replicated. SSE-C objects cannot. |
Replication Time Control (RTC)
Standard replication is asynchronous with no SLA on timing: most objects replicate in seconds, but some may take hours. S3 Replication Time Control (RTC) guarantees that 99.99% of objects replicate within 15 minutes, with S3 metrics to track replication lag. Use RTC when you have compliance or disaster recovery requirements that demand a guaranteed replication SLA. RTC adds cost, so enable it only for buckets where the timing guarantee matters.
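A hedged boto3 sketch of a basic replication configuration (CRR in this case). Bucket names, the replication IAM role ARN, and the destination storage class are placeholders; versioning must already be enabled on both buckets.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        # Role S3 assumes to read from the source and write to the destination
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [{
            "ID": "dr-copy",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},                       # replicate everything
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": "arn:aws:s3:::dr-bucket-us-west-2",
                "StorageClass": "STANDARD_IA",              # cheaper class at destination
            },
        }],
    },
)
```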
Object Lock prevents objects from being deleted or overwritten for a defined period. It implements WORM (Write Once Read Many) storage β required for SEC 17a-4, HIPAA, and financial compliance workloads.
Retention Modes
- Compliance mode: nobody can delete or change the object, including the root user. Period cannot be shortened. Used for strict regulatory requirements.
- Governance mode: only users with the s3:BypassGovernanceRetention permission can override. Lighter enforcement for internal policies.
Legal Hold
- Prevents deletion independent of any retention period
- No expiry date β stays locked until explicitly removed
- Requires the s3:PutObjectLegalHold permission to apply or remove
- Used during litigation: preserve evidence without a known end date
Object Lock must be enabled when the bucket is created; it cannot be added to an existing bucket. Compliance mode retention periods cannot be shortened even by AWS Support.
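A sketch of applying retention and a legal hold to one object in an Object Lock-enabled bucket; bucket, key, mode, and date are placeholders.

```python
from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")

# Governance-mode retention until a fixed date.
s3.put_object_retention(
    Bucket="compliance-bucket",
    Key="records/trade-001.json",
    Retention={
        "Mode": "GOVERNANCE",
        "RetainUntilDate": datetime(2027, 1, 1, tzinfo=timezone.utc),
    },
)

# Legal hold: independent of retention, stays until explicitly removed.
s3.put_object_legal_hold(
    Bucket="compliance-bucket",
    Key="records/trade-001.json",
    LegalHold={"Status": "ON"},
)
```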
S3 can publish events when objects are created, deleted, restored, or replicated. This enables event-driven architectures where downstream systems react to data changes automatically.
SNS
Fan out notification to multiple subscribers. Email alerts, SMS, or trigger multiple SQS queues from one S3 event.
SQS
Decouple processing from uploads. Workers poll SQS and process each uploaded object independently. Handles volume spikes gracefully.
Lambda
Trigger serverless processing immediately on upload. Image resizing, virus scanning, data validation, format conversion, all without a server.
| Event Type | Triggered When | Common Use |
|---|---|---|
| s3:ObjectCreated:* | Any object is uploaded (PUT, POST, COPY, multipart) | Trigger processing pipeline on upload |
| s3:ObjectRemoved:* | Object is deleted | Audit deletion, update downstream index |
| s3:ObjectRestore:* | Glacier object restore initiated/completed | Notify when archive is available |
| s3:Replication:* | Replication failure or missed threshold | Alert on replication health issues |
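A sketch of wiring an ObjectCreated event to Lambda with boto3. The function ARN, bucket, and prefix are placeholders, and the Lambda function must already allow s3.amazonaws.com to invoke it (via lambda add-permission).

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-upload",
            "Events": ["s3:ObjectCreated:*"],
            # Only fire for objects under uploads/
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}},
        }]
    },
)
```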
S3 Batch Operations runs large-scale jobs across billions of objects with a single API call. Instead of writing scripts to iterate through objects, you describe the operation and S3 runs it at scale.
Supported Operations
- Copy objects between buckets
- Replace object tags or ACLs
- Restore objects from Glacier
- Invoke Lambda on every object
- Replicate existing objects (Batch Replication)
- Set Object Lock retention on existing objects
How It Works
- Provide an object manifest (S3 Inventory report or CSV)
- Define the operation and parameters
- S3 processes all listed objects and tracks progress and errors
- Generates a completion report to S3
- Full audit trail in CloudTrail
Lifecycle rules + Replication + Object Lock form your data governance foundation. Automate transitions to save cost, replicate for resilience, and lock for compliance. Event notifications turn S3 into a trigger for your entire data pipeline.
- Lifecycle rules: automate transitions (Standard → IA → Glacier) and expirations. Scope by prefix or tag.
- CRR: cross-region replication for DR, compliance, and latency. Adds inter-region transfer cost.
- SRR: same-region replication for cross-account copy, log aggregation. No transfer cost.
- Replication nuances: versioning required, new objects only, delete markers not replicated by default, not transitive.
- Object Lock: WORM storage. Compliance mode = nobody can delete. Governance mode = privileged users can override. Must enable at bucket creation.
- Event notifications: S3 → SNS / SQS / Lambda on create/delete/restore. Foundation of event-driven data pipelines.
- Batch Operations: run jobs on billions of objects. Copy, tag, restore, invoke Lambda at scale.
Performance & Scaling
S3 scales automatically: there are no capacity limits to configure, no partitions to manage, and no pre-warming required. AWS manages the infrastructure horizontally behind the scenes. However, understanding S3's performance characteristics helps you avoid hitting rate limits on high-throughput workloads.
PUT / COPY / DELETE
3,500 requests/sec per prefix. Writing 100K objects/sec requires ~29 prefixes with evenly distributed keys.
GET / HEAD
5,500 requests/sec per prefix. A single prefix can serve ~5,500 reads per second before S3 automatically scales further.
No Hard Limits
These are baseline per-prefix rates. S3 will scale beyond these automatically as traffic increases; no pre-warming needed.
A prefix is the part of an object key before the final filename, essentially the "path". S3 uses prefixes to distribute requests across its internal infrastructure. More distinct prefixes = more parallelism = higher throughput.
Bad Pattern: Single Prefix
- All objects under uploads/2026/
- All requests go to the same partition
- Hits 3,500 PUT/sec limit quickly
- No horizontal scaling benefit
Good Pattern: Multiple Prefixes
- Distribute across a/uploads/, b/uploads/, c/uploads/
- Or use hash prefixes: a3f/, 7b2/, 9d1/
- Each prefix gets its own 3,500/5,500 rate budget
- 10 prefixes = 35,000 PUT/sec, 55,000 GET/sec
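A tiny sketch of the hash-prefix idea: derive a short, evenly distributed prefix from the key so writes fan out across partitions. The helper name is hypothetical.

```python
import hashlib

def prefixed_key(original_key: str) -> str:
    """Prepend a short hash so requests spread across many S3 prefixes."""
    digest = hashlib.md5(original_key.encode()).hexdigest()[:3]
    return f"{digest}/{original_key}"

print(prefixed_key("uploads/2026/photo-123.jpg"))  # e.g. "a3f/uploads/2026/photo-123.jpg"
```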
Multipart Upload splits large objects into parts, uploads them in parallel, and reassembles them on S3. It is the correct mechanism for any object above 100 MB.
| Feature | Detail |
|---|---|
| Minimum part size | 5 MB (except the last part) |
| Maximum parts | 10,000 parts per object |
| Maximum object size | 5 TB (requires multipart) |
| Parallel uploads | Upload all parts simultaneously; dramatically faster on high-bandwidth connections |
| Resume on failure | Only the failed part needs to be retried, not the entire object |
| Incomplete uploads | Parts are billed even if never completed; use a lifecycle rule to abort after N days |
Transfer Acceleration routes uploads through AWS CloudFront edge locations instead of going directly to the S3 regional endpoint. Data enters the AWS backbone at the nearest edge location, then travels on AWS's private network to S3, which is faster and more reliable than routing over the public internet for long distances.
When Transfer Acceleration Helps
- Users uploading from distant geographies (EU → us-east-1)
- Large file uploads over high-latency internet connections
- Consistent performance from multiple global locations to one bucket
- Can provide 50-500% speed improvement over direct upload
When It Does NOT Help
- Uploads from within the same region as the bucket
- Small files: the overhead of edge routing is not worth it
- Adds per-GB transfer cost on top of standard S3 pricing
- Test with the S3 Transfer Acceleration Speed Comparison tool first
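A hedged sketch of turning acceleration on and using the accelerate endpoint from boto3; bucket and file names are placeholders.

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# One-time switch on the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Point the client at the accelerate endpoint for the actual transfers.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("video.mp4", "my-example-bucket", "media/video.mp4")
```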
S3 Select allows you to retrieve only the subset of data you need from an object using SQL expressions, without downloading the entire file. Instead of downloading a 5 GB CSV and filtering locally, S3 filters on the server and returns only matching rows.
How It Works
- Supported formats: CSV, JSON, Parquet
- Optional compression: GZIP, BZIP2
- Run SQL SELECT and WHERE against the object server-side
- S3 returns only matching rows, not the full file
- Reduces data transfer cost and client-side processing time
Why It Matters
- A 5 GB CSV with 10 matching rows → transfer 10 rows, not 5 GB
- Faster for Lambda functions operating on large S3 files
- Glacier Select brings the same capability to archived data
- Not a replacement for Athena: no joins, no aggregations
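A sketch of an S3 Select call with boto3; bucket, key, and the column names in the SQL expression are placeholders, and the object is assumed to be an uncompressed CSV with a header row.

```python
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-example-bucket",
    Key="data/orders.csv",
    ExpressionType="SQL",
    Expression="SELECT s.order_id, s.total FROM S3Object s WHERE s.country = 'DE'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; only matching rows come back.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")
```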
You can retrieve specific byte ranges of an object using the HTTP Range header. This enables parallel downloads and efficient partial reads without fetching the entire object.
Parallel Download
Split a 10 GB object into 10 × 1 GB ranges. Download all 10 in parallel. Combine client-side. Significantly faster than a single sequential download.
Read Header Only
Fetch just the first few KB of a file to read its header metadata (e.g., Parquet footer, image EXIF). Avoid downloading 500 MB to read 4 KB of metadata.
Resume Downloads
If a download fails mid-way, resume from the last successful byte. No need to restart from zero for large objects.
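A minimal byte-range sketch: fetch only the first 4 KB of a large object (bucket and key are placeholders).

```python
import boto3

s3 = boto3.client("s3")

# HTTP Range header: first 4096 bytes only, e.g. to read a file header.
head = s3.get_object(
    Bucket="my-example-bucket",
    Key="datasets/big-file.parquet",
    Range="bytes=0-4095",
)
print(len(head["Body"].read()))  # 4096 bytes transferred, not the whole object
```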
S3 scales to any throughput automatically, but you must spread load across prefixes to use it. Use Multipart Upload for anything above 100 MB. Use Transfer Acceleration for global users. Use S3 Select to minimize data transfer on large objects.
- Baseline rates: 3,500 PUT/sec and 5,500 GET/sec per prefix. Spread load across prefixes to scale linearly.
- Prefix partitioning: hash-based prefixes distribute requests across S3 partitions. 10 prefixes = 10× throughput.
- Multipart Upload: required above 5 GB, recommended above 100 MB. Parallel parts + resume on failure. Set lifecycle rule to abort incomplete uploads.
- Transfer Acceleration: edge location → AWS backbone → S3. 50-500% faster for distant geographies. Adds per-GB cost.
- S3 Select: server-side SQL filter on CSV/JSON/Parquet. Transfer only matching rows. Not a query engine; no joins.
- Byte-range fetches: parallel downloads, header-only reads, and resume support via HTTP Range header.
Cost Optimization
S3 has no up-front cost and no minimum fee. You pay only for what you use across four dimensions:
Storage Cost
- Per GB stored per month
- Varies by storage class: Standard is most expensive, Deep Archive cheapest
- Billed by actual bytes; fractional GBs charged proportionally
- Versioned objects: every version is billed separately
Request & Retrieval Cost
- PUT/COPY/POST/LIST: ~$0.005 per 1,000 requests
- GET/SELECT: ~$0.0004 per 1,000 requests
- Retrieval fee for IA, Glacier classes (per GB retrieved)
- Lifecycle transition requests: small per-object fee
Data Transfer Cost
- Inbound (upload to S3): free
- S3 → internet: ~$0.09/GB (first 10 TB/month)
- S3 → same-region EC2: free
- S3 → different region (CRR): ~$0.02/GB
- S3 → CloudFront: free (use CF to avoid egress)
Management & Features
- S3 Inventory reports: per million objects listed
- S3 Analytics (Storage Class Analysis): per million objects
- Replication: per-GB data transfer + request fees
- Transfer Acceleration: additional per-GB fee
These are approximate US East (N. Virginia) prices to illustrate relative costs. Always check the AWS pricing page for current rates in your region.
| Storage Class | Storage ($/GB/month) | Retrieval ($/GB) | Min Storage Duration | Min Object Size |
|---|---|---|---|---|
| S3 Standard | ~$0.023 | Free | None | None |
| S3 Intelligent-Tiering | ~$0.023 (frequent tier) | Free | None | 128 KB (smaller = Standard) |
| S3 Standard-IA | ~$0.0125 | ~$0.01/GB | 30 days | 128 KB billed minimum |
| S3 One Zone-IA | ~$0.01 | ~$0.01/GB | 30 days | 128 KB billed minimum |
| S3 Glacier Instant | ~$0.004 | ~$0.03/GB | 90 days | 128 KB billed minimum |
| S3 Glacier Flexible | ~$0.0036 | ~$0.01-0.03/GB | 90 days | 40 KB billed minimum |
| S3 Glacier Deep Archive | ~$0.00099 | ~$0.02/GB | 180 days | 40 KB billed minimum |
Minimum storage duration traps are real. If you store a 1 GB file in Standard-IA for only 10 days and delete it, you are still billed for 30 days. Do not use IA classes for short-lived or frequently changed objects.
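A quick worked comparison of that trap, using the approximate prices from the table above (illustrative only; real rates vary by region):

```python
# 1 GB kept for 10 days, then deleted.
standard_rate = 0.023   # $/GB-month (S3 Standard)
ia_rate = 0.0125        # $/GB-month (Standard-IA)
ia_retrieval = 0.01     # $/GB retrieved (Standard-IA)

gb, days = 1, 10
standard_cost = gb * standard_rate * (days / 30)        # billed only for 10 days
ia_cost = gb * ia_rate + gb * ia_retrieval              # billed for the full 30-day minimum + retrieval
print(f"Standard: ${standard_cost:.4f}  Standard-IA: ${ia_cost:.4f}")
# Standard ends up cheaper for short-lived objects despite the higher per-GB rate.
```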
Intelligent-Tiering (INT) automatically moves objects between access tiers based on actual usage β no retrieval fees, no lifecycle rules to manage. It is the right choice when access patterns are unknown or unpredictable.
How It Works
- Objects start in the Frequent Access tier (same cost as Standard)
- Move to Infrequent Access tier after 30 days of no access
- Move to Archive Instant tier after 90 days (optional)
- Move to Archive tier after 90-180 days (optional, configure)
- Object accessed → immediately moved back to Frequent Access tier
Cost Considerations
- Small monitoring fee per object per month (~$0.0025/1,000 objects)
- Objects smaller than 128 KB are billed as Standard, so INT is not worth it for them
- No retrieval fees within Frequent and Infrequent tiers
- Archive tiers have retrieval fees (like Glacier)
- No minimum storage duration, so no early deletion penalty
Reduce Storage Cost
- Set lifecycle rules to transition to cheaper classes automatically
- Enable S3 Analytics to identify infrequently accessed data
- Use Intelligent-Tiering for data with unknown access patterns
- Expire old object versions automatically with lifecycle rules
- Abort incomplete multipart uploads (lifecycle rule after 7 days)
- Compress files before upload (GZIP, Snappy, ZSTD)
Reduce Transfer Cost
- Serve S3 content via CloudFront: S3→CloudFront is free, CloudFront→internet is cheaper
- Keep compute (EC2/Lambda) in the same region as S3 for free transfer
- Use VPC Gateway Endpoints for free S3 access from within the VPC
- Use S3 Select to transfer only needed rows, not full objects
- Enable Requester Pays for public datasets so the consumer pays for retrieval
S3 Storage Lens provides org-wide visibility into S3 usage, activity trends, and cost optimization recommendations across all buckets and accounts in your AWS Organization.
Usage Metrics
Total storage bytes, object count, average object size, and incomplete multipart uploads, aggregated across your entire organization.
Activity Metrics
GET/PUT/DELETE request counts, bytes downloaded. Identify hot buckets and cold buckets that should be transitioned to cheaper storage classes.
Recommendations
S3 Storage Lens surfaces cost optimization tips: objects that qualify for lifecycle transitions, buckets with no lifecycle rules, and incomplete multipart upload accumulation.
| Scenario | Right Choice | Reason |
|---|---|---|
| Frequently accessed app data | S3 Standard | No retrieval fee, no min duration |
| Access pattern is unknown | S3 Intelligent-Tiering | Auto-optimizes without lifecycle rules |
| Backup accessed once/month | S3 Standard-IA | 50% cheaper storage, low retrieval frequency |
| Replicated data (can re-create) | S3 One Zone-IA | 20% cheaper than Standard-IA, acceptable single-AZ risk |
| Compliance archive, instant access | S3 Glacier Instant | ~83% cheaper than Standard, ms retrieval |
| 7+ year regulatory archive | S3 Glacier Deep Archive | ~96% cheaper than Standard, 12h retrieval acceptable |
| Short-lived temp files (<30 days) | S3 Standard | IA min-duration billing makes IA more expensive |
S3 cost optimization is primarily about storage class selection and lifecycle automation. Serve via CloudFront to eliminate egress. Set lifecycle rules on day one; retroactively optimizing storage is expensive and slow. Use Storage Lens to find what you missed.
- Four cost dimensions: storage ($/GB/month), requests (per 1,000), retrieval ($/GB for IA/Glacier), data transfer (free inbound, ~$0.09/GB egress).
- S3→CloudFront is free. Use CloudFront for public content to eliminate S3 egress cost entirely.
- Minimum duration traps: Standard-IA = 30 days, Glacier = 90 days, Deep Archive = 180 days. Don't use IA for short-lived objects.
- Intelligent-Tiering: auto-moves objects based on actual access. No retrieval fee. Best for unknown access patterns. Objects <128 KB billed as Standard.
- Lifecycle rules: set on day one. Expire old versions. Abort incomplete multipart uploads. Transition logs to Glacier after 90 days.
- VPC Gateway Endpoint: free S3 access from within a VPC. Eliminates NAT Gateway data processing costs for S3 traffic.
- Storage Lens: org-wide dashboard for usage, activity, and automatic cost optimization recommendations.
Architecture Patterns
S3 can serve HTML, CSS, JavaScript, and image files directly as a website: no web server, no EC2, no maintenance. For read-heavy static content, this is the simplest and cheapest architecture on AWS.
Architecture
- Enable static website hosting on the S3 bucket
- Set index document (index.html) and error document (404.html)
- Bucket policy grants s3:GetObject to * (public read)
- Disable Block Public Access to allow the public policy
- Use custom domain via Route 53 CNAME or alias
With CloudFront (Recommended)
- CloudFront distribution in front of S3 origin
- S3 bucket stays private; CloudFront uses OAC to access it
- HTTPS via ACM certificate on CloudFront (S3 website endpoint is HTTP only)
- Global edge caching serves from the PoP nearest to the user
- Eliminates S3 egress cost: S3→CloudFront transfer is free
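A hedged sketch of the bucket-side configuration for the simple (publicly readable) variant; the bucket name and document keys are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Website endpoint configuration: index and error documents.
s3.put_bucket_website(
    Bucket="my-site-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "404.html"},
    },
)
# For the CloudFront variant, skip public access entirely: keep Block Public
# Access on and attach a bucket policy that allows only the distribution's OAC.
```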
Users upload files directly to S3, bypassing your application server entirely. Your backend generates a short-lived presigned URL and returns it to the client. The client uploads directly to S3, and your server never touches the bytes.
Flow
- Client requests upload permission from your API
- Your API generates a presigned PUT URL (e.g., 15 minutes)
- API returns the presigned URL to the client
- Client uploads the file directly to S3 using the URL
- S3 sends an event notification to Lambda on completion
- Lambda processes the uploaded file (resize, scan, index)
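The backend half of this flow in boto3 (bucket, key, content type, and expiry are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Step 2: generate a short-lived PUT URL scoped to one key.
upload_url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={
        "Bucket": "user-uploads-bucket",
        "Key": "incoming/avatar-42.png",
        "ContentType": "image/png",
    },
    ExpiresIn=900,  # 15 minutes
)

# Step 4: the client uploads without AWS credentials, e.g.
#   requests.put(upload_url, data=file_bytes, headers={"Content-Type": "image/png"})
# The Content-Type must match what was signed into the URL.
```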
Benefits
- Your servers handle zero upload bandwidth
- Files go directly to S3, which is faster for users on high-bandwidth connections
- Bucket stays private; the presigned URL grants temporary access only
- Lambda event trigger enables automatic downstream processing
- Scales to thousands of concurrent uploads without bottleneck
S3 is the standard storage layer for data lakes on AWS. Raw data lands in S3, is catalogued with Glue, and queried in-place with Athena: no database to provision, no ETL until you need it.
Architecture Layers
- Landing zone: raw data as-is (JSON, CSV, logs, API dumps)
- Processed zone: cleaned, partitioned Parquet files (columnar format)
- Curated zone: aggregated, business-ready datasets
- Each zone is a separate S3 prefix or bucket
- AWS Glue Crawlers auto-discover schema and update the Glue Catalog
- Athena queries directly against Parquet files using SQL
Why This Pattern Works
- Storage is decoupled from compute; scale each independently
- Pay per query with Athena; no always-on database cluster
- Parquet columnar format reduces Athena scan cost by 10-100×
- Partition by date/region so Athena skips irrelevant partitions entirely
- Lake Formation adds fine-grained table/column access control
Database Backups
- RDS automated backups export to S3
- DynamoDB exports to S3 (point-in-time)
- EC2 snapshots stored via EBS then exported to S3
- Lifecycle: transition to Glacier after 30 days
Cross-Region DR
- Enable CRR to a secondary region bucket
- RPO: near-zero (async replication, seconds lag)
- RTO: immediate, since the data is already in the secondary region
- S3 Object Lock protects against ransomware
Versioning for Recovery
- Versioning = built-in point-in-time recovery
- Restore any object to any previous state
- Lifecycle rules expire old versions to control cost
- MFA Delete for extra protection on versioned buckets
S3 events drive serverless processing pipelines: no polling, no scheduler, no idle workers. Every object upload automatically triggers the next stage of processing, as in the handler sketch below.
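A minimal Lambda handler sketch for the consuming side of such a pipeline: it walks the S3 event records delivered on each invocation. The processing step is left as a comment.

```python
import urllib.parse

def handler(event, context):
    """React to each S3 ObjectCreated record in the event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        print(f"New object: s3://{bucket}/{key} ({size} bytes)")
        # ... resize / scan / index the object, write results downstream
    return {"processed": len(event["Records"])}
```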
| Mistake | Why It's Bad | Fix |
|---|---|---|
| Making bucket public for all content | Exposes all objects, including future uploads | Keep bucket private, use CloudFront OAC + presigned URLs |
| No lifecycle rules | Storage cost grows unbounded over months/years | Set lifecycle rules on day one for every bucket |
| Using S3 as a database | No query capability, no indexing; extremely slow lookups | Store metadata in DynamoDB/RDS, store files in S3 |
| Ignoring incomplete multipart uploads | Parts accumulate silently and are billed indefinitely | Lifecycle rule: abort incomplete multipart after 7 days |
| Moving to IA too aggressively | Min-duration billing + retrieval fees make it more expensive for frequent access | Use S3 Analytics or Intelligent-Tiering to identify true access patterns |
| Not enabling versioning on important buckets | One accidental delete or overwrite = permanent data loss | Enable versioning + lifecycle expire old versions |
| Storing credentials in S3 objects | Exposed if bucket is ever misconfigured | Use Secrets Manager or SSM Parameter Store |
S3's patterns all follow one principle: S3 is storage, not compute. Let CloudFront serve it, let Lambda process it, let Athena query it, let your API control access to it. S3 itself just stores; everything else is glue.
- Static website: S3 + CloudFront + ACM + Route 53. Bucket stays private. OAC grants CloudFront access. Zero server cost.
- User uploads: API generates presigned PUT URL → client uploads directly to S3 → Lambda processes on event. Your server handles zero bytes.
- Data lake: Landing (raw) → Processed (Parquet) → Curated zones in S3. Glue Catalog autodiscovers schema. Athena queries in-place.
- Backup & DR: CRR to secondary region. Versioning for point-in-time recovery. Object Lock for ransomware protection.
- Event-driven pipeline: ObjectCreated → Lambda → processed S3 → SQS → downstream. No polling, scales to any volume.
- Common mistakes: no lifecycle rules, ignoring incomplete multipart uploads, moving data to IA too aggressively, no versioning on critical buckets.