Amazon EFS: Elastic File System
Fully managed, elastic, shared NFS file system: mount it across hundreds of EC2 instances, Lambda functions, and containers simultaneously, spanning multiple Availability Zones.
EFS in 30 Seconds
- Shared file storage: multiple EC2 instances, Lambda functions, and containers mount the same filesystem simultaneously
- Elastic capacity: grows and shrinks automatically, no pre-provisioning required
- Multi-AZ by default: data replicated across ≥3 Availability Zones for high durability
- NFS v4.1 protocol: standard Linux filesystem interface, works like any mounted drive
- Pay per GB stored: no upfront capacity planning, no minimum fees
What is EFS
Amazon EFS (Elastic File System) is AWS's fully managed, elastic, shared file storage service for Linux workloads. It provides a standard NFS (Network File System) interface, so you mount it on EC2 instances exactly like a local directory: run `mount -t nfs4` and you're done. Unlike EBS (which is a single-instance block device), EFS can be mounted by thousands of compute instances simultaneously across multiple Availability Zones.
Think of EFS as: a shared network drive in the cloud. Mount it everywhere, pay only for what you store, and it grows automatically.
EFS was launched in 2016 to solve a fundamental gap in AWS storage: the need for a shared, POSIX-compliant filesystem that multiple EC2 instances could read and write concurrently. Before EFS, teams either jury-rigged NFS on EC2, used third-party solutions, or restructured applications to avoid shared storage entirely.
Without EFS: The Problems
- EBS volumes attach to one EC2 instance: no shared access
- Self-managed NFS servers: you handle HA, patching, scaling
- S3 is object storage: no POSIX filesystem, can't `cd` or `ls` natively
- Scaling disk means stopping, resizing, restarting: minutes of downtime
- Cross-AZ data sharing requires complex replication scripts
EFS Solves
- Shared mount point: thousands of instances mount the same filesystem
- Fully managed: AWS handles replication, patching, hardware
- POSIX-compliant: `ls`, `cat`, `chmod`, `mv` all work natively
- Elastic: grows and shrinks automatically with your data
- Multi-AZ by default: data available across the entire region
EFS fills the file storage slot in the AWS storage trio. Understanding the difference is critical for the exam and for architecture decisions:
| Type | How It Works | AWS Service | Best For |
|---|---|---|---|
| File Storage | Shared filesystem with directories and files. NFS protocol. Multiple instances mount it simultaneously. | EFS | Shared config, CMS uploads, ML training data, container storage |
| Block Storage | Raw disk blocks. OS mounts it like a hard drive. Single-instance attachment (mostly). | EBS | Databases, boot volumes, OS-level read/write |
| Object Storage | Flat namespace: key → object. No folders. Access via HTTP API. | S3 | Files, backups, data lakes, static assets, logs |
Exam rule of thumb: If the question says "shared filesystem" or "multiple EC2 instances need to access the same files", the answer is EFS. If it says "single instance, high IOPS", think EBS. If it says "unlimited, HTTP-accessed storage", think S3.
EFS integrates with core AWS compute and container services, acting as the shared data layer:
EC2 Instances
Mount EFS on multiple EC2 instances across AZs. Classic use case: web servers sharing uploaded content, config files, or CMS assets.
AWS Lambda
Lambda functions mount EFS to read and write files that do not fit in /tmp (512 MB by default, configurable up to 10,240 MB). Essential for ML inference and large file processing.
ECS & Fargate
Containers mount EFS as persistent storage. Solves the ephemeral container storage problem: data persists across task restarts.
EKS (Kubernetes)
EFS CSI driver provides ReadWriteMany PersistentVolumes. Multiple pods across nodes share the same filesystem, which suits shared workloads.
AWS DataSync
Migrate data from on-premises NFS/SMB to EFS at high speed. Automate ongoing replication for hybrid architectures.
AWS Backup
Automated EFS backups with retention policies, cross-region copy, and compliance controls, with no custom scripts needed.
Think of EFS like a shared office network drive, where everyone in the building plugs in and sees the same folders and files:
Network Drive = EFS File System
- A single shared filesystem, accessible by everyone
- Standard directories and files: `/data`, `/config`, `/uploads`
- Grows automatically: no one calls IT to "add more disk"
- Replicated across the building's floors (AZs) automatically
- You control who can access what with permissions and policies
Computers = EC2 / Lambda / Containers
- Each computer mounts the network drive to a local path
- All see the same data: changes by one are immediately visible to others
- Computers can be on different floors (AZs): access is the same
- New computers join instantly: no data copy needed
- If one computer shuts down, others are unaffected
This is the most common exam question around EFS. Here's the definitive comparison:
| Feature | EFS | EBS |
|---|---|---|
| Type | File storage (NFS) | Block storage (disk) |
| Access | Multiple instances simultaneously | Single instance (multi-attach only for io1/io2) |
| Protocol | NFS v4.1 | Block device (ext4, xfs) |
| Capacity | Elastic: grows/shrinks automatically | Fixed: you choose size upfront, can expand manually |
| AZ Scope | Multi-AZ (Regional) by default | Single AZ: locked to one AZ |
| Pricing | ~$0.30/GB/month (Standard), pay per GB used | ~$0.10/GB/month (gp3), pay per GB provisioned |
| Performance | Good throughput, higher latency than EBS | Low latency, high IOPS (up to 256K IOPS for io2) |
| OS Support | Linux only | Linux and Windows |
| Best For | Shared content, CMS, ML data, container volumes | Databases, boot volumes, single-instance workloads |
Key distinction: EFS is ~3× more expensive per GB than EBS, but you never pay for unused capacity. For a shared workload across multiple instances, EFS is cheaper than running multiple EBS volumes with data synchronization. EFS = shared. EBS = dedicated.
If the exam mentions Windows or SMB, the answer is never EFS. Know this comparison:
| Feature | EFS | FSx for Windows File Server |
|---|---|---|
| Protocol | NFS v4.1 | SMB 2.0 to 3.1.1 |
| OS Support | Linux only | Windows + Linux (via SMB) |
| Active Directory | No native AD integration | Native AD / self-managed AD |
| DFS Namespaces | No | Yes: Windows DFS support |
| Capacity | Elastic (pay per GB used) | Provisioned (choose size upfront) |
| Multi-AZ | Yes (default) | Optional (Single-AZ or Multi-AZ) |
| Best For | Linux apps, containers, Lambda | Windows file shares, .NET apps, SQL Server |
Exam rule: "Shared filesystem for Windows" → FSx for Windows. "Shared filesystem for Linux" → EFS. "High-performance Linux HPC" → FSx for Lustre. Never mix these up.
| Property | Value | Why It Matters |
|---|---|---|
| Protocol | NFS v4.0 / v4.1 | Standard Linux mount: no proprietary client needed |
| OS Support | Linux only | Windows → use FSx for Windows File Server instead |
| Capacity | Elastic (petabyte-scale) | No pre-provisioning. Grows and shrinks with data. |
| Durability | 99.999999999% (11 nines) | Data replicated across ≥3 AZs automatically |
| Availability | 99.99% (Standard) / 99.9% (One Zone) | Standard = multi-AZ. One Zone = single AZ, 47% cheaper |
| Max File Size | 52 TB per file | No multi-part uploads: just write the file normally |
| Concurrent Mounts | Thousands | All compute in your VPC can mount the same FS |
| Encryption | At rest (KMS) + in transit (TLS) | Enable at creation; cannot add at-rest encryption later |
| Use Case | How EFS Is Used | Why Not EBS/S3 |
|---|---|---|
| Web Server Farm | Multiple EC2 behind ALB mount EFS for shared WordPress uploads, themes, plugins | EBS = each instance gets its own copy. S3 = not a filesystem. |
| Container Persistent Storage | ECS/Fargate tasks mount EFS volumes to persist data across restarts | Container local storage is ephemeral: it dies with the task. |
| ML Training Data | Training data in EFS mounted by multiple SageMaker or EC2 training instances | All instances need concurrent read access to the same dataset. |
| Lambda Large Files | Lambda mounts EFS for models, libraries, or data too large for /tmp | Lambda /tmp defaults to 512 MB (configurable up to 10,240 MB). S3 requires download time. |
| CI/CD Shared Workspace | Build agents share compiled artifacts via mounted EFS | Faster than S3 for repeated small-file reads during builds. |
| Home Directories | User home dirs on EFS, accessible from any instance they log into | Like a traditional NAS: user's files follow them. |
EFS is a shared, elastic NFS filesystem: the answer whenever multiple compute resources need to read/write the same files simultaneously. It's Linux-only, multi-AZ by default, and you pay only for what you store.
- File storage: POSIX-compliant NFS v4.1 filesystem. Mount it like a local directory on Linux.
- Shared access: thousands of EC2, Lambda, ECS, EKS instances mount the same filesystem simultaneously.
- Elastic capacity: no pre-provisioning. Grows/shrinks automatically. Pay per GB stored.
- Multi-AZ by default: data replicated across ≥3 AZs. 11 nines durability. 99.99% availability.
- Linux only: for Windows shared storage, use FSx for Windows File Server.
- EFS vs EBS: EFS = shared, multi-AZ, elastic, NFS. EBS = dedicated, single-AZ, fixed-size, block.
- Integrates with: EC2, Lambda, ECS, Fargate, EKS, DataSync, AWS Backup.
- Key use cases: shared web content, container persistence, ML training data, Lambda large files.
Core Concepts
An EFS file system is the top-level resource: the "drive" you create once and mount everywhere. Every EFS file system gets a unique ID like fs-0a1b2c3d4e5f and a DNS name like fs-0a1b2c3d4e5f.efs.us-east-1.amazonaws.com. You never manage disks, partitions, or RAID arrays; AWS handles all of that behind the scenes.
File System Properties
- ID: `fs-xxxxxxxxx`, a unique identifier
- DNS name: region-specific, used in mount commands
- Creation token: idempotency key to prevent duplicates
- Lifecycle: exists until you explicitly delete it
- Tags: key-value pairs for billing, organization
Immutable At Creation
- Encryption at rest: must enable at creation, cannot be added later
- Performance mode: General Purpose or Max I/O, cannot change later
- Availability: Regional (multi-AZ) or One Zone, cannot change later
- Throughput mode, lifecycle policies, access points: can be changed anytime
Exam trap: "Can you enable encryption on an existing EFS?" → No. You must create a new encrypted file system and migrate data. This is a frequent exam question.
EFS offers multiple storage classes to optimize cost based on access frequency. They are similar in concept to S3's storage classes, but applied at the file level, not the object level:
| Storage Class | Description | Cost (us-east-1) | Best For |
|---|---|---|---|
| EFS Standard | Multi-AZ. Frequently accessed data. Lowest latency. | ~$0.30/GB/month | Active application data, config files, CMS uploads |
| EFS Standard-IA | Multi-AZ. Infrequently accessed. Lower storage cost, per-access fee. | ~$0.025/GB/month + $0.01/GB read | Audit logs, old reports, seasonal data |
| EFS One Zone | Single AZ. Frequently accessed. ~47% cheaper than Standard. | ~$0.16/GB/month | Dev/test, scratch data, easily reproducible files |
| EFS One Zone-IA | Single AZ. Infrequently accessed. Cheapest option. | ~$0.0133/GB/month + $0.01/GB read | Dev logs, temporary backups, non-critical archives |
| EFS Archive | Multi-AZ. Rarely accessed (few times/year). Lowest storage cost. | ~$0.008/GB/month + $0.03/GB read | Compliance archives, historical data accessed yearly |
Regional (Standard): Use When
- Production workloads requiring high availability
- Multi-AZ EC2 deployments behind a load balancer
- Data that cannot be recreated if an AZ fails
- Compliance requirements mandate multi-AZ storage
One Zone: Use When
- Development, testing, staging environments
- Data can be regenerated from source (e.g., build artifacts)
- Cost is the primary concern, not resilience
- All compute is in a single AZ anyway
One Zone durability: Data is still replicated across multiple devices within the single AZ (11 nines durability). The real risk is AZ failure: if the entire AZ goes down, your data is inaccessible until the AZ recovers. For irreplaceable data, always use Standard (multi-AZ).
EFS can automatically move files between storage classes based on access patterns. This is called EFS Intelligent-Tiering (lifecycle management). You set policies; EFS handles the rest:
Transition to IA / Archive
- Move files not accessed for N days (1, 7, 14, 30, 60, 90, 180, 270, 365)
- Applies to Standard → Standard-IA, or One Zone → One Zone-IA
- Archive tier: files not accessed for 90 to 365+ days
- Metadata stays in Standard: only file data moves
- File is transparently accessible (just higher latency on first read)
Transition Back to Standard
- EFS can automatically move files back to Standard on access
- Enable the "Transition into Standard" policy, which triggers on first access
- Hot files get promoted; rarely-accessed files stay in IA
- Combined with transition-to-IA, creates automatic tiering loop
Cost savings: Enabling lifecycle management can reduce EFS costs by up to 92% for workloads with a mix of hot and cold data. For most workloads, enable both transition-to-IA (30 days) and transition-back-on-access.
| Transition | Minimum Days | Notes |
|---|---|---|
| Standard → Standard-IA | 1 day (options: 1, 7, 14, 30, 60, 90, 180, 270, 365) | Use 1-day cautiously: files accessed daily will thrash between tiers |
| Standard → Archive | 90 days minimum | Compliance requirement: cannot archive sooner |
| Standard-IA → Archive | 45 days after transition to IA | Must be in IA for 45+ days before moving to Archive |
| IA/Archive → Standard | On first access (immediate) | Enable "Transition into Standard" policy: files auto-promote |
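The transition rules above map onto a short policy list. The following is a minimal Python sketch that only builds that list, assuming the shape accepted by the `LifecyclePolicies` parameter of boto3's `put_lifecycle_configuration`. The helper name `build_lifecycle_policies` is hypothetical, the "Archive later than IA" check is an assumption, and the enum strings should be confirmed against the current API reference before use.

```python
# Day counts the text lists as valid transition windows.
VALID_DAYS = (1, 7, 14, 30, 60, 90, 180, 270, 365)


def _after(days):
    """Format a day count as an enum-style string, e.g. AFTER_30_DAYS."""
    if days not in VALID_DAYS:
        raise ValueError(f"unsupported transition window: {days} days")
    return "AFTER_1_DAY" if days == 1 else f"AFTER_{days}_DAYS"


def build_lifecycle_policies(ia_days, archive_days=None, promote_on_access=True):
    """Build a LifecyclePolicies list (hypothetical helper; no AWS call is made)."""
    policies = [{"TransitionToIA": _after(ia_days)}]
    if archive_days is not None:
        # Assumption: the Archive window must be longer than the IA window.
        if archive_days <= ia_days:
            raise ValueError("archive_days must be greater than ia_days")
        policies.append({"TransitionToArchive": _after(archive_days)})
    if promote_on_access:
        # Move a file back to Standard the first time it is read again.
        policies.append({"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"})
    return policies


if __name__ == "__main__":
    print(build_lifecycle_policies(30, archive_days=90))
```

Keeping the validation in one place means an unsupported window such as 45 days fails locally instead of at the API boundary.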
EFS pricing is often confusing. Here's a concrete example showing how lifecycle management saves money:
Without Lifecycle (All Standard)
- 1,000 GB all in Standard
- Storage: 1,000 GB × $0.30 = $300/month
- No read fees, but paying full price for cold data
With Lifecycle (30-day IA transition)
- 200 GB active (Standard) + 800 GB cold (Standard-IA)
- Standard: 200 GB × $0.30 = $60/mo
- IA storage: 800 GB × $0.025 = $20/mo
- IA reads: ~10 GB/day × 30 × $0.01 = $3/mo
- Total: ~$83/month (72% savings)
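The arithmetic in this example can be checked with a few lines of Python. This is a sketch that uses only the approximate us-east-1 prices quoted in the text; the function names are illustrative, and real bills depend on current pricing and actual read volume.

```python
# Approximate prices from the example above (USD).
STANDARD_PER_GB = 0.30    # Standard storage, per GB-month
IA_PER_GB = 0.025         # Standard-IA storage, per GB-month
IA_READ_PER_GB = 0.01     # Standard-IA read fee, per GB read


def monthly_cost(standard_gb, ia_gb=0.0, ia_read_gb=0.0):
    """Monthly bill in USD for a Standard + Standard-IA split."""
    return (
        standard_gb * STANDARD_PER_GB
        + ia_gb * IA_PER_GB
        + ia_read_gb * IA_READ_PER_GB
    )


def savings_fraction(before, after):
    """Fraction of the original bill saved (0.25 means 25% cheaper)."""
    return (before - after) / before


if __name__ == "__main__":
    all_standard = monthly_cost(1000)
    # 200 GB hot, 800 GB cold, about 10 GB/day read from IA over 30 days.
    tiered = monthly_cost(200, ia_gb=800, ia_read_gb=10 * 30)
    print(f"all Standard:   ${all_standard:.2f}")
    print(f"with lifecycle: ${tiered:.2f}")
    print(f"savings:        {savings_fraction(all_standard, tiered):.0%}")
```

Changing `ia_read_gb` shows how quickly read fees erode the savings when "cold" data turns out to be read often.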
EFS Backup (AWS Backup) pricing: Backup storage costs ~$0.05/GB-month (incremental). This is separate from EFS storage billing. Cross-region backup copies add data transfer + destination storage charges. Always budget backups independently.
| Limit | Value | Notes |
|---|---|---|
| Max concurrent NFS connections per FS | 25,000 | Soft limit: can be raised via AWS Support |
| Lambda concurrent connections per FS | 25,000 | Same pool as EC2: shared limit across all clients |
| EC2 connection counting | 1 per instance | Each mount counts as one, regardless of processes reading/writing |
| One Zone connection limit | 25,000 | Same as Standard: one mount target, same limit |
| Access Points per FS | 1,000 | Soft limit: can request increase |
An EFS Access Point is an application-specific entry point into an EFS file system. Think of it as a "customized door" into the filesystem, where each access point can enforce a different root directory, user identity, and permissions:
Root Directory
Each access point can set a root path (e.g., /app1/data). The application only sees that subtree and cannot navigate above it. Acts as a chroot.
POSIX User Identity
Override the connecting user's UID/GID. Force all access through this point to use uid=1000, gid=1000 regardless of the client's identity.
Permissions
Set directory creation permissions (e.g., 755) and owner (UID/GID) when the root directory is auto-created on first mount.
Access points are especially powerful with Lambda and ECS: each function or container can get its own access point, isolating its view of the filesystem without complex IAM or NFS permissions.
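The three access point settings (root directory, POSIX identity, creation permissions) can be expressed as one request payload. The sketch below builds such a payload in Python, assuming the parameter names used by boto3's `create_access_point` (`FileSystemId`, `PosixUser`, `RootDirectory`, `CreationInfo`). The helper `access_point_params` and the example IDs are hypothetical, and nothing is sent to AWS.

```python
import json


def access_point_params(file_system_id, root_path, uid, gid, permissions="755"):
    """Return keyword arguments for an access point request (no AWS call is made)."""
    if not root_path.startswith("/"):
        raise ValueError("root_path must be absolute, e.g. /app1/data")
    return {
        "FileSystemId": file_system_id,
        # Every client that connects through this access point acts as this user.
        "PosixUser": {"Uid": uid, "Gid": gid},
        "RootDirectory": {
            # Clients see this path as "/" and cannot navigate above it.
            "Path": root_path,
            # Applied only if the directory is auto-created on first use.
            "CreationInfo": {
                "OwnerUid": uid,
                "OwnerGid": gid,
                "Permissions": permissions,
            },
        },
    }


if __name__ == "__main__":
    params = access_point_params("fs-0a1b2c3d4e5f", "/app1/data", 1000, 1000)
    print(json.dumps(params, indent=2))
```

Giving each application its own payload like this keeps the isolation rules next to the application's deployment code.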
A mount target is the network endpoint that EC2 instances use to connect to EFS. You create one mount target per Availability Zone in your VPC. Each mount target gets:
What a Mount Target Is
- An ENI (Elastic Network Interface) in your VPC subnet
- Gets a private IP address (e.g., `10.0.1.15`)
- Gets a DNS name that resolves to the IP in that AZ
- Has a security group to control NFS traffic (port 2049)
- One per AZ for Regional FS, one total for One Zone FS
How Mounting Works
- EC2 uses the EFS DNS name, which resolves to the mount target in its AZ
- `sudo mount -t efs fs-0a1b2c3d:/ /mnt/efs` (using amazon-efs-utils)
- Or standard NFS: `mount -t nfs4 -o nfsvers=4.1 fs-dns:/ /mnt/efs`
- Traffic stays within the AZ: no cross-AZ data transfer charges for NFS
- If an AZ's mount target is down, instances in that AZ lose access (others unaffected)
Best practice: Always create mount targets in every AZ where you have compute resources. Use the amazon-efs-utils package for easier mounting with TLS encryption and IAM authorization built-in.
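As an illustration of how the amazon-efs-utils mount options combine, here is a small Python sketch that assembles the mount command without running it. It assumes the `tls`, `iam`, and `accesspoint=` mount options of the EFS mount helper; the function name `efs_mount_command` and the example IDs are hypothetical.

```python
import shlex


def efs_mount_command(fs_id, mount_point="/mnt/efs", tls=True, iam=False, access_point=None):
    """Return the argv list for an amazon-efs-utils mount (nothing is executed)."""
    # Assumption: IAM authorization and access points are used together with TLS.
    if (iam or access_point) and not tls:
        raise ValueError("iam and accesspoint require tls")
    options = []
    if tls:
        options.append("tls")
    if iam:
        options.append("iam")
    if access_point:
        options.append(f"accesspoint={access_point}")
    argv = ["mount", "-t", "efs"]
    if options:
        argv += ["-o", ",".join(options)]
    argv += [f"{fs_id}:/", mount_point]
    return argv


if __name__ == "__main__":
    # Print the command for review; run it with sudo on a host that has efs-utils.
    print(shlex.join(efs_mount_command("fs-0a1b2c3d", iam=True)))
```

Building the argument list instead of a shell string avoids quoting mistakes when the mount point contains spaces.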
EFS Replication creates an automatic, continuous copy of your file system in another AWS region or another AZ configuration. Key facts:
How Replication Works
- Creates a read-only replica in the destination region/AZ
- Most changes replicated within 15 minutes (RPO ≤ 15 min)
- Uses AWS backbone network: no VPN or peering needed
- Same storage classes, encryption, and lifecycle policies apply
- One replication configuration per file system
Use Cases
- Disaster recovery: failover to replica in another region
- Data locality: replica close to users in another region
- Promote replica to read-write during failover
- Cross-region compliance requirements
- No additional cost for replication transfer: pay for destination storage
A file system policy is a JSON resource-based policy (like S3 bucket policies) that applies to every connection to the EFS file system. Common uses:
| Policy Action | What It Does | When to Use |
|---|---|---|
| Enforce in-transit encryption | Deny any NFS client that doesn't use TLS | Compliance: security standard requires encrypted transport |
| Enforce IAM authorization | Require IAM identity-based policies for all NFS access | Zero-trust: go beyond security groups + NFS perms |
| Prevent root access | Deny root user from mounting (UID 0 blocked) | Multi-tenant: prevent privileged container escape |
| Enforce read-only | Allow mounts but deny all write operations | Shared config/reference data that should never be modified |
| Restrict to specific VPCs | Only allow access from specific VPCs via conditions | Cross-account access with guardrails |
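Two of the rows above are sketched below as Python dictionaries serialized to JSON. The sketch assumes the `aws:SecureTransport` condition key and the `elasticfilesystem:ClientMount` action as they appear in EFS policy examples; the function names, the `Sid` values, and the role ARN are placeholders, so validate any policy before attaching it to a file system.

```python
import json


def enforce_tls_policy():
    """Deny every client connection that does not use TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedTransport",
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": "*",
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def read_only_policy(principal_arn):
    """Allow a principal to mount, but grant no write or root-access action."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadOnlyMount",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                # ClientWrite and ClientRootAccess are deliberately omitted.
                "Action": ["elasticfilesystem:ClientMount"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(enforce_tls_policy(), indent=2))
    # Placeholder ARN for illustration only.
    print(json.dumps(read_only_policy("arn:aws:iam::123456789012:role/example-reader"), indent=2))
```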
EFS core concepts: File System (the drive), Mount Targets (the network plugs, one per AZ), Storage Classes (Standard, IA, One Zone, Archive), Access Points (isolated app-specific entries), and Lifecycle Policies (auto-tier files to save up to 92% cost).
- File System: top-level resource with unique ID and DNS name. Encryption and performance mode are immutable after creation.
- Storage Classes: Standard, Standard-IA, One Zone, One Zone-IA, Archive, ranging from ~$0.30 down to ~$0.008/GB/month.
- Lifecycle Management: auto-move files to IA/Archive after N days of no access. Can also auto-promote back on access. Up to 92% savings.
- Access Points: application-specific entry points with enforced root directory, UID/GID override, and auto-created permissions.
- Mount Targets: ENI per AZ with private IP, DNS name, and security group. NFS port 2049. Use `amazon-efs-utils` for easy mounting.
- Replication: continuous cross-region replica with RPO ≤ 15 minutes. Read-only destination, promotable for DR.
- File System Policy: resource-based JSON policy to enforce encryption in transit, IAM auth, read-only, or block root access.
Performance Modes
When you create an EFS file system, you choose a performance mode. This setting is permanent: you cannot change it after creation. It controls how the file system handles I/O operations, specifically the trade-off between latency and total throughput capacity.
Exam rule: Performance mode is immutable. If you chose wrong, you must create a new file system and migrate data. Choose carefully at creation time.
General Purpose is the default and recommended mode for the vast majority of workloads. It provides the lowest latency per I/O operation and is suitable for latency-sensitive applications.
Characteristics
- Lowest per-operation latency: single-digit milliseconds
- Up to 35,000 read IOPS and 7,000 write IOPS
- Default mode: use unless you have a specific reason not to
- CloudWatch metric: `PercentIOLimit` shows how close you are to the IOPS ceiling
- Supports both Regional and One Zone availability
Best For
- Web serving: WordPress, Drupal, CMS platforms
- Content management: shared uploads, media files
- Home directories: user files across instances
- Development environments: code repos, build artifacts
- General application data: config, logs, session state
Monitoring tip: Watch the `PercentIOLimit` CloudWatch metric. If it consistently hits 100%, your workload may benefit from Max I/O. But try Elastic Throughput first, since it's usually sufficient.
Max I/O mode removes the IOPS ceiling, allowing virtually unlimited parallel I/O operations. The trade-off: slightly higher per-operation latency (tens of milliseconds instead of single-digit).
Characteristics
- No IOPS limit: scales to hundreds of thousands of operations
- Higher per-operation latency: tens of milliseconds (not single-digit)
- Designed for highly parallelized workloads with many concurrent clients
- No `PercentIOLimit` metric: there is no limit to hit
- Cannot be changed back to General Purpose after creation
Best For
- Big data analytics: hundreds of instances reading concurrently
- Media processing: video transcoding across many workers
- Machine learning: large training data read by many GPU instances
- Genomics workflows: massively parallel file reads
- Any workload where `PercentIOLimit` consistently hits 100%
| Feature | General Purpose (default) | Max I/O |
|---|---|---|
| Latency | Single-digit milliseconds (lowest) | Tens of milliseconds (slightly higher) |
| IOPS | Up to 35K read / 7K write | Effectively unlimited |
| Parallelism | Good for moderate concurrency | Optimized for massive parallelism (hundreds of clients) |
| PercentIOLimit | CloudWatch metric available: monitor it | Not applicable: no ceiling |
| Use Case | Web, CMS, containers, Lambda, home dirs | Big data, ML training, media processing, genomics |
| Changeable? | No: immutable after creation. Create new FS to switch. | No: immutable after creation. Create new FS to switch. |
AWS introduced Elastic Throughput as an enhancement for General Purpose mode that dynamically scales throughput based on workload demands, without requiring Max I/O. This has made Max I/O unnecessary for most workloads.
How Elastic Throughput Works
- Automatically scales read throughput up to 10 GiB/s
- Write throughput up to 3 GiB/s
- No capacity planning: spiky workloads handled automatically
- Pay only for throughput used beyond baseline
- Works with General Purpose mode only
Before vs After Elastic Throughput
- Before: if you hit IOPS limits in General Purpose → migrate to a Max I/O file system (accept higher latency)
- After: Elastic Throughput handles bursts in General Purpose → Max I/O rarely needed
- Most workloads that previously required Max I/O now work fine with General Purpose + Elastic Throughput
Current recommendation (2026): Start with General Purpose + Elastic Throughput. Only consider Max I/O if you have truly massive parallelism (500+ concurrent clients doing heavy I/O). Elastic Throughput handles most "burst" scenarios.
| Scenario | Answer | Why |
|---|---|---|
| "WordPress farm with 10 EC2 instances sharing uploads" | General Purpose | Low latency needed for web requests. 10 instances = far below IOPS ceiling. |
| "500 compute instances processing genomics data in parallel" | Max I/O | Massive parallelism. Latency tolerance is acceptable. Need unlimited IOPS. |
| "Lambda functions reading ML models from shared storage" | General Purpose | Lambda cold starts already add latency, so each request needs fast I/O. PercentIOLimit unlikely to be reached. |
| "EFS PercentIOLimit metric at 100% constantly" | Migrate to Max I/O (or enable Elastic Throughput) | Hitting the ceiling. Either migrate to a Max I/O file system or use Elastic Throughput to get past the limit. |
| "Video rendering farm reading large files, latency not critical" | Max I/O | High parallelism, large sequential reads, higher latency acceptable for batch processing. |
| "Can I switch from General Purpose to Max I/O?" | No | Performance mode is immutable. Must create a new file system and migrate. |
Start with General Purpose + Elastic Throughput, which handles 90%+ of workloads. Only choose Max I/O for truly massive parallelism (500+ clients, batch analytics, genomics). Performance mode is immutable: you cannot change it after creation.
- Two performance modes: General Purpose (default, low latency) and Max I/O (unlimited IOPS, higher latency).
- General Purpose: single-digit ms latency, up to 35K read / 7K write IOPS. Best for web, CMS, containers, Lambda.
- Max I/O: no IOPS ceiling, tens of ms latency. Best for big data, genomics, media processing with 500+ clients.
- Immutable: performance mode cannot be changed after creation. Must create new FS and migrate.
- Elastic Throughput: auto-scales to 10 GiB/s read / 3 GiB/s write in General Purpose mode. Made Max I/O rarely needed.
- Monitor: `PercentIOLimit` CloudWatch metric (General Purpose only). If at 100%, consider Elastic Throughput or Max I/O.
- Default choice: General Purpose + Elastic Throughput for 90%+ of workloads.
Throughput Modes
Students often confuse these two settings. They control different things:
Performance Mode (Ch. 3)
- Controls IOPS: how many I/O operations per second
- Controls latency: how fast each operation completes
- General Purpose vs Max I/O
- Immutable: set at creation, cannot change
Throughput Mode (This Chapter)
- Controls throughput: how many MB/s or GB/s of data transfer
- How fast you can read/write large amounts of data
- Bursting vs Provisioned vs Elastic
- Changeable: can switch modes anytime
Analogy: Performance mode = how many cars can enter the highway at once (IOPS). Throughput mode = the speed limit on the highway (MB/s). Both matter, but they're independent settings.
Bursting is the default throughput mode. Your throughput scales with how much data is stored in EFS: the more data you store, the higher your baseline and burst throughput. It works like a token bucket:
How Bursting Works
- Baseline throughput: 50 KiB/s per GB of data stored in Standard class
- Burst throughput: 100 MiB/s for file systems up to 1 TiB; larger file systems can burst at 100 MiB/s per TiB stored
- Minimum baseline: 1 MiB/s (even for tiny file systems)
- Burst credits accumulate when throughput is below baseline
- Credits consumed when bursting above baseline
Throughput by Storage Size
- 100 GB stored → baseline 5 MiB/s, burst to 100 MiB/s
- 1 TB stored → baseline 50 MiB/s, burst to 100 MiB/s
- 10 TB stored → baseline 500 MiB/s, burst to roughly 1,000 MiB/s (burst scales at 100 MiB/s per TiB above 1 TiB)
- Credit balance visible in CloudWatch: `BurstCreditBalance`
The problem with Bursting: If your file system is small (e.g., 50 GB of config files) but your workload is throughput-heavy (reads/writes many MB/s), you'll burn through burst credits and get throttled to a tiny baseline. This is the #1 EFS performance complaint.
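The baseline rule (50 KiB/s per GB stored, with a 1 MiB/s floor) is easy to turn into a quick calculator. The sketch below is illustrative: the function name is hypothetical, and it converts with binary units, so 100 GiB comes out near 4.9 MiB/s, which the text rounds to 5 MiB/s.

```python
BASELINE_KIB_PER_GIB = 50    # baseline rate per GiB of Standard storage (from the text)
MIN_BASELINE_MIB_S = 1.0     # floor that applies even to tiny file systems


def baseline_mib_per_s(stored_gib):
    """Bursting-mode baseline throughput in MiB/s for a given stored size."""
    scaled = stored_gib * BASELINE_KIB_PER_GIB / 1024  # KiB/s -> MiB/s
    return max(MIN_BASELINE_MIB_S, scaled)


if __name__ == "__main__":
    for size_gib in (1, 50, 100, 1024, 10 * 1024):
        print(f"{size_gib:>6} GiB stored -> {baseline_mib_per_s(size_gib):7.2f} MiB/s baseline")
```

Running it for a 50 GiB file system makes the throttling problem concrete: the baseline is under 2.5 MiB/s once credits are gone.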
| Burst Credit Parameter | Value | Example |
|---|---|---|
| Baseline throughput | 50 KiB/s per GB stored (Standard) | 100 GB FS → 5 MiB/s baseline |
| Minimum baseline | 1 MiB/s (even for tiny FS) | 1 GB FS still gets 1 MiB/s baseline |
| Maximum burst | 100 MiB/s (100 MiB/s per TiB above 1 TiB stored) | Cannot exceed the cap regardless of credits available |
| Credit accumulation | At baseline rate when idle | 5 MiB/s × 3600 s = 18,000 MiB (about 17.6 GiB) credited per hour |
| Credit consumption | Each MiB above baseline = 1 credit | Bursting at 100 MiB/s = 95 MiB/s credit burn (if 5 MiB/s baseline) |
| Throttle condition | Credits reach zero | Throughput drops to baseline (e.g., 5 MiB/s). Monitor BurstCreditBalance. |
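To see how quickly credits run out, the table's rules can be combined into one calculation. This is a simplified token-bucket sketch, assuming credits are counted in MiB (one credit per MiB transferred above baseline, as in the table) and that the burst rate is constant; the function name is hypothetical.

```python
import math


def seconds_until_throttled(credit_mib, baseline_mib_s, burst_mib_s):
    """Time a file system can sustain burst_mib_s before credits reach zero."""
    burn_rate = burst_mib_s - baseline_mib_s  # MiB of credit consumed per second
    if burn_rate <= 0:
        # At or below baseline no credits are consumed, so throttling never occurs.
        return math.inf
    return credit_mib / burn_rate


if __name__ == "__main__":
    # One idle hour at a 5 MiB/s baseline accrues 5 * 3600 = 18,000 MiB of credit.
    credits = 5 * 3600
    seconds = seconds_until_throttled(credits, baseline_mib_s=5, burst_mib_s=100)
    print(f"{credits} MiB of credit lasts {seconds / 60:.1f} minutes at 100 MiB/s")
```

An hour of idle time buys only a few minutes of full-rate bursting at this size, which is why small, busy file systems exhaust their credits.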
Provisioned Throughput lets you specify exactly how much throughput you need, independent of storage size. You pay for what you provision.
Characteristics
- Set throughput from 1 MiB/s to 3,125 MiB/s
- Decouples throughput from storage size
- Pay for provisioned throughput + storage separately
- Can change provisioned value (increase/decrease) anytime
- If demand exceeds the provisioned value, you are still billed at the provisioned rate and requests may be throttled
Use When
- Small file system with high throughput needs (e.g., 20 GB, need 200 MiB/s)
- Predictable, steady throughput requirements
- `BurstCreditBalance` keeps hitting zero
- You know your exact throughput requirements and want cost certainty
Elastic Throughput is the newest and recommended mode for most workloads. It automatically scales throughput up and down based on demand, with no planning, no burst credits, and no provisioning.
Characteristics
- Automatically scales to up to 10 GiB/s read, 3 GiB/s write
- No burst credits to manage: throughput instantly available
- Pay per GiB of data transferred (read: ~$0.03/GiB, write: ~$0.06/GiB)
- No baseline throughput limits based on storage size
- Works with General Purpose performance mode only
Use When
- Spiky, unpredictable workloads (e.g., CI/CD pipelines, batch processing)
- You don't want to manage burst credits or provision throughput
- Workloads that are idle most of the time but need high throughput in bursts
- Small file systems that need more throughput than Bursting allows
- Default recommendation for new file systems
| Feature | Bursting (default) | Provisioned | Elastic ⭐ |
|---|---|---|---|
| How It Works | Throughput scales with stored data. Burst credits when idle. | You specify exact MiB/s. Fixed cost. | Auto-scales on demand. Pay per GiB transferred. |
| Max Read | 100 MiB/s burst (100 MiB/s per TiB above 1 TiB), or the 50 KiB/s × GB baseline | Up to 3,125 MiB/s | Up to 10 GiB/s |
| Max Write | 100 MiB/s burst (100 MiB/s per TiB above 1 TiB) | Up to 3,125 MiB/s | Up to 3 GiB/s |
| Pricing | Included in storage cost | Storage + throughput provisioned | Storage + per-GiB data transfer |
| Best For | Large FS with moderate throughput | Small FS, predictable high throughput | Spiky workloads, unpredictable patterns |
| Risk | Burst credit exhaustion → throttled | Over-provisioning wastes money | Cost unpredictable if data transfer is very high |
| Changeable? | Yes: you can switch between modes anytime | Yes | Yes |
CloudWatch metrics to monitor EFS throughput performance:
| Metric | What It Measures | When to Act |
|---|---|---|
| `BurstCreditBalance` | Remaining burst credits (Bursting mode only) | Trending toward zero → switch to Elastic or Provisioned |
| `TotalIOBytes` | Total bytes read + written per period | Spikes indicate burst patterns → consider Elastic |
| `MeteredIOBytes` | Bytes billed for Elastic Throughput (data transfer) | Track cost: if consistent, Provisioned may be cheaper |
| `PermittedThroughput` | Max throughput allowed at this moment | If actual throughput equals permitted → being throttled |
| `PercentIOLimit` | How close to IOPS ceiling (General Purpose only) | Sustained 100% → consider Max I/O or Elastic Throughput |
| Scenario | Answer | Why |
|---|---|---|
| "Small EFS (50 GB) keeps getting throttled, BurstCreditBalance at zero" | Switch to Elastic or Provisioned | 50 GB = only 2.5 MiB/s baseline. Once burst credits gone, throughput drops to baseline. |
| "10 TB file system, steady 200 MiB/s throughput" | Bursting works fine | 10 TB = 500 MiB/s baseline. Well above the 200 MiB/s need. No credits consumed. |
| "CI/CD builds spike throughput for 5 minutes then idle for hours" | Elastic Throughput | Spiky, unpredictable. Elastic charges only during the burst. Provisioned wastes money during idle. |
| "Constant 500 MiB/s throughput 24/7 for video processing" | Provisioned | Predictable, constant. Provisioned at 500 MiB/s is cheaper than Elastic's per-GiB charges at this volume. |
| "Can I change throughput mode later?" | Yes | Unlike performance mode, throughput mode can be switched between Bursting, Provisioned, and Elastic anytime. |
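The baseline arithmetic in the first two scenarios can be reproduced with a short sketch. It assumes the 50 KiB/s per GiB rule quoted in this chapter and binary units throughout, so 50 GiB comes out at about 2.44 MiB/s (the table rounds this to 2.5). The function names are illustrative.

```python
KIB_PER_MIB = 1024
BASELINE_KIB_S_PER_GIB = 50  # Bursting mode baseline quoted in this chapter


def bursting_baseline_mib_s(stored_gib: float) -> float:
    """Baseline throughput in MiB/s for a Bursting file system of the given size."""
    return stored_gib * BASELINE_KIB_S_PER_GIB / KIB_PER_MIB


def sustains_without_credits(stored_gib: float, required_mib_s: float) -> bool:
    """True when the steady workload fits under the baseline, so no credits are spent."""
    return bursting_baseline_mib_s(stored_gib) >= required_mib_s
```

For a 10 TiB file system (10,240 GiB) the baseline is 500 MiB/s, so a steady 200 MiB/s workload never touches the credit balance, while a 50 GiB file system cannot sustain even 10 MiB/s once its credits run out.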
Throughput mode controls MB/s data transfer speed (not IOPS). Elastic is the default recommendation: it auto-scales, has no credits to manage, and bills per use. Choose Provisioned for constant high throughput, and Bursting only for large file systems with modest I/O. Unlike performance mode, throughput mode can be changed at any time.
- Performance mode ≠ Throughput mode: Performance = IOPS/latency (immutable). Throughput = MB/s data transfer (changeable).
- Bursting (default): throughput scales with data stored. 50 KiB/s per GB baseline. Burst to 100 MiB/s. Risk: credit exhaustion for small FS.
- Provisioned: fixed MiB/s you specify. Decoupled from storage size. Best for constant, predictable workloads.
- Elastic (recommended): auto-scales to 10 GiB/s read / 3 GiB/s write. Pay per GiB transferred. Best for spiky/unpredictable workloads. Recommended default.
- Changeable: you can switch between all three throughput modes at any time, with no migration needed.
- Monitor: BurstCreditBalance (Bursting), MeteredIOBytes (Elastic cost), PermittedThroughput (throttling detection).
Networking & Mount Targets
To connect to an EFS file system, your compute resources need a mount target: a network entry point inside your VPC. Without mount targets, EFS is unreachable. Every mount target is an Elastic Network Interface (ENI) with a private IP address in one of your VPC subnets.
Mount Target = ENI
- One mount target per AZ (for Regional file systems)
- One mount target total (for One Zone file systems, in the chosen AZ)
- Gets a private IP from your subnet's CIDR
- Gets a DNS name: az-name.fs-id.efs.region.amazonaws.com (the AZ name, such as us-east-1a)
- Appears in EC2 → Network Interfaces as a managed ENI
DNS Resolution
- EFS DNS name: fs-id.efs.region.amazonaws.com
- Resolves to the mount target IP in the caller's AZ
- EC2 in us-east-1a resolves to the mount target in us-east-1a
- Requires VPC DNS resolution and DNS hostnames enabled
- If a mount target is missing in an AZ, DNS resolution fails for instances in that AZ
Common mistake: Forgetting to create a mount target in an AZ where EC2 instances run. The instance can't resolve the EFS DNS name, so the mount fails with Connection timed out. Always create mount targets in every AZ with compute resources.
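The naming scheme and the "mount target in every AZ" rule can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the AZ lists are assumed to come from your own inventory rather than from any API call shown here.

```python
from typing import Iterable, List


def efs_dns_name(fs_id: str, region: str) -> str:
    """File system DNS name, which resolves to the mount target in the caller's AZ."""
    return f"{fs_id}.efs.{region}.amazonaws.com"


def mount_target_dns_name(az: str, fs_id: str, region: str) -> str:
    """DNS name of the mount target in one specific Availability Zone."""
    return f"{az}.{efs_dns_name(fs_id, region)}"


def azs_missing_mount_targets(
    compute_azs: Iterable[str], mount_target_azs: Iterable[str]
) -> List[str]:
    """AZs that run compute but have no mount target, so mounts there would fail."""
    return sorted(set(compute_azs) - set(mount_target_azs))
```

Running the last helper before a deployment is one way to catch the common mistake described above: any AZ it returns needs a mount target before instances there can mount the file system.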
Security groups are the most important networking control for EFS. You need two security groups configured correctly:
Mount Target Security Group
- Attached to each mount target ENI
- Inbound rule: TCP port 2049 (NFS) from client security groups
- Source: reference the client SG (not a CIDR; more secure and dynamic)
- Outbound: default (allow all) is fine
- One SG can be shared by all mount targets
Client (EC2/Lambda/ECS) Security Group
- Attached to your compute instances
- Outbound rule: TCP port 2049 to the mount target security group
- If you reference SG IDs (not CIDRs), scaling is automatic: new instances get access instantly
- No inbound rule needed for NFS (client initiates connection)
Best practice: Always reference security group IDs instead of IP ranges. This way, any new EC2 instance added to the client SG automatically gets EFS access, with no rule updates needed.
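The two rules can be written down as plain data. The sketch below builds dictionaries in the shape the EC2 security group API expects (IpPermissions entries with UserIdGroupPairs); the helper names and the security group IDs in the usage note are hypothetical, and nothing here calls AWS.

```python
NFS_PORT = 2049


def nfs_permission(peer_sg_id: str) -> dict:
    """One IpPermissions entry: TCP 2049, with another security group as the peer."""
    return {
        "IpProtocol": "tcp",
        "FromPort": NFS_PORT,
        "ToPort": NFS_PORT,
        "UserIdGroupPairs": [{"GroupId": peer_sg_id}],
    }


def efs_security_group_rules(mount_target_sg: str, client_sg: str) -> dict:
    """Inbound rule for the mount target SG and outbound rule for the client SG."""
    return {
        "mount_target_ingress": {
            "GroupId": mount_target_sg,
            "IpPermissions": [nfs_permission(client_sg)],
        },
        "client_egress": {
            "GroupId": client_sg,
            "IpPermissions": [nfs_permission(mount_target_sg)],
        },
    }
```

With boto3 installed, `efs_security_group_rules("sg-mounttarget", "sg-clients")["mount_target_ingress"]` could be unpacked into `ec2.authorize_security_group_ingress(**...)`, and the egress entry into `authorize_security_group_egress`. Because the peer is a group ID rather than a CIDR, instances added to the client group later need no rule changes.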
There are two ways to mount EFS on EC2. The amazon-efs-utils helper is strongly recommended:
Using amazon-efs-utils (Recommended)
- Install: sudo yum install -y amazon-efs-utils
- Mount: sudo mount -t efs fs-0a1b2c3d:/ /mnt/efs
- With TLS: sudo mount -t efs -o tls fs-0a1b2c3d:/ /mnt/efs
- With IAM: sudo mount -t efs -o tls,iam fs-0a1b2c3d:/ /mnt/efs
- With Access Point: sudo mount -t efs -o tls,accesspoint=fsap-xxxx fs-0a1b2c3d:/ /mnt/efs
- Auto-reconnect, watchdog, logging built-in
Using Standard NFS Client
- Install: sudo yum install -y nfs-utils
- Mount: sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-dns:/ /mnt/efs
- Works but: no TLS, no IAM, no auto-reconnect
- Must manage NFS options manually
- Use only if amazon-efs-utils is not available
For persistent mounts (survive reboot), add to /etc/fstab:
fs-0a1b2c3d:/ /mnt/efs efs _netdev,tls,iam 0 0
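The mount variants and the fstab entry follow one pattern, which the sketch below generates. It assumes the amazon-efs-utils mount helper and encodes its rule that the iam and accesspoint options both require tls; the function names are hypothetical and nothing is executed.

```python
from typing import List, Optional


def efs_mount_options(
    tls: bool = False, iam: bool = False, access_point_id: Optional[str] = None
) -> List[str]:
    """Build the -o option list for the efs mount helper."""
    if (iam or access_point_id) and not tls:
        raise ValueError("the iam and accesspoint options require tls")
    options = []
    if tls:
        options.append("tls")
    if iam:
        options.append("iam")
    if access_point_id:
        options.append(f"accesspoint={access_point_id}")
    return options


def mount_command(fs_id: str, mount_point: str, **opts) -> List[str]:
    """Argument list for a one-off mount (run it with sudo)."""
    options = efs_mount_options(**opts)
    cmd = ["mount", "-t", "efs"]
    if options:
        cmd += ["-o", ",".join(options)]
    return cmd + [f"{fs_id}:/", mount_point]


def fstab_line(fs_id: str, mount_point: str, **opts) -> str:
    """Matching /etc/fstab entry; _netdev delays the mount until networking is up."""
    options = ["_netdev"] + efs_mount_options(**opts)
    return f"{fs_id}:/ {mount_point} efs {','.join(options)} 0 0"
```

Calling `fstab_line("fs-0a1b2c3d", "/mnt/efs", tls=True, iam=True)` reproduces the entry shown above, and the same keyword arguments passed to `mount_command` give the equivalent interactive command.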
| Access Pattern | How to Configure | Cost Implications |
|---|---|---|
| Same VPC, Same AZ | DNS resolves to local mount target. No extra config. | No data transfer charges |
| Same VPC, Cross-AZ | With a mount target in every AZ, DNS resolves to the local one and traffic stays in-AZ. | No charge while traffic stays in-AZ; mounting a target in another AZ incurs standard inter-AZ data transfer charges |
| Cross-VPC (same region) | VPC Peering or Transit Gateway. Mount via mount target IP or DNS. | Standard VPC peering / TGW data transfer charges |
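| Cross-Account | Connect the VPCs (peering, Transit Gateway, or subnets shared through Resource Access Manager) and authorize the other account in the EFS file system policy. | Peering charges apply. EFS access is free beyond transfer. |
| Cross-Region | The EFS DNS name does not resolve across regions. Mount by mount target IP over inter-region VPC peering or Transit Gateway (expect higher latency), or use EFS Replication for a read-only replica in another region. | Inter-region data transfer charges, or replication storage cost. |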
| On-Premises | AWS Direct Connect or VPN + mount target IP. Or use DataSync for data transfer. | Direct Connect / VPN charges + data transfer out fees |
Lambda mounting EFS has specific networking requirements that differ from EC2:
Requirements
- Lambda function must be in a VPC (VPC-attached Lambda)
- Lambda's subnet must be in the same AZ as an EFS mount target
- Lambda's security group needs outbound TCP 2049
- Mount target SG needs inbound TCP 2049 from Lambda SG
- Lambda uses an Access Point (required; Lambda cannot mount the file system root directly)
Gotchas
- VPC Lambda has a cold start penalty: ENI creation adds 1-2 seconds
- Lambda needs NAT Gateway for internet access when in VPC
- EFS mount adds ~1 second to cold starts (connection setup)
- Max 25,000 concurrent connections per EFS file system for Lambda
- Lambda + EFS = VPC + Subnet + SG + Mount Target + Access Point: lots to configure
| Error | Cause | Fix |
|---|---|---|
| Connection timed out | Security group blocking port 2049, or no mount target in the instance's AZ | Check SG inbound rules. Verify mount target exists in the AZ. |
| mount.nfs4: No such device | nfs-utils or amazon-efs-utils not installed | sudo yum install -y amazon-efs-utils |
| Permission denied | IAM policy denying access, file system policy blocking, or wrong POSIX permissions | Check IAM role, EFS resource policy, and file/dir permissions (chmod) |
| Name resolution failed | VPC DNS resolution not enabled, or DNS hostnames disabled | Enable DNS resolution and DNS hostnames in VPC settings |
| nfs: server not responding | Mount target ENI deleted or AZ outage | Verify mount target status in console. Fail over to another AZ if needed. |
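Since most of these errors trace back to port 2049 being unreachable, a quick TCP probe from the client narrows things down before you touch IAM or POSIX settings. This is a minimal sketch using only the standard library; the function name and the default timeout are arbitrary choices.

```python
import socket


def can_reach_nfs(host: str, port: int = 2049, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run it against the mount target's private IP first and then against the file system DNS name: success on the IP but failure on the name points at VPC DNS settings, while failure on both usually means a security group rule or a missing mount target.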
EFS networking = mount targets (one per AZ) + security groups (port 2049) + DNS resolution. Use amazon-efs-utils for TLS and IAM. Reference SG IDs (not CIDRs) for dynamic scaling. Lambda requires VPC attachment + Access Point.
- Mount targets: ENI per AZ with private IP. One per AZ for Regional FS, one total for One Zone. NFS port 2049.
- DNS resolution: EFS DNS resolves to mount target IP in caller's AZ. Requires VPC DNS resolution + DNS hostnames enabled.
- Security groups: mount target SG (inbound 2049 from client SGs) + client SG (outbound 2049 to mount target SG). Reference SG IDs, not CIDRs.
- Mounting: use amazon-efs-utils for TLS, IAM, and auto-reconnect. Add to /etc/fstab for persistence.
- Cross-network: same VPC and same AZ = no transfer charge. Cross-VPC = peering/TGW. Cross-account = VPC connectivity + file system policy. Cross-region = replication, or mount by IP over inter-region peering.
- Lambda: requires VPC, same-AZ subnet, access point, SG on port 2049. Adds cold start latency (~1-2s).
- Troubleshooting: most failures = SG misconfiguration or missing mount target in the AZ.
Security & Access Control
EFS security is a layered defense: multiple independent controls that work together. Understand each layer and how they combine:
All layers must allow access. If any layer denies, the request fails. A request must pass: (1) security group allows port 2049, (2) IAM allows the EFS action, (3) file system policy allows the connection, (4) POSIX permissions allow the file operation.
Network is the first barrier. This was covered in Chapter 5; here's the security-specific summary:
| Control | What to Configure | Key Point |
|---|---|---|
| Security Groups | Mount target SG: inbound TCP 2049 from client SGs. Client SG: outbound TCP 2049 to mount target SG. | Reference SG IDs, not CIDRs. Most common misconfiguration. |
| Subnets | Place mount targets in private subnets only. | EFS should never be in a public subnet: no internet-facing NFS. |
| NACLs | Ensure NACLs allow TCP 2049 and ephemeral ports (1024-65535) between subnets. | NACLs are stateless: both request and response ports must be allowed. |
| VPC Endpoints | Not required: EFS uses mount targets (ENIs), not VPC endpoints. | Unlike S3 (gateway endpoint), EFS access is always through mount targets inside the VPC. |
IAM controls who can perform EFS API actions and who can mount the file system. EFS supports two types of IAM integration:
API-Level IAM (Management)
- Controls AWS API actions: elasticfilesystem:CreateFileSystem, :DeleteFileSystem, :DescribeFileSystems
- Attached to IAM users/roles that manage EFS via Console/CLI/SDK
- Standard IAM policy, same as any AWS service
- Does NOT control NFS data access (read/write files)
NFS-Level IAM (Data Access)
- Controls NFS mount and file operations: elasticfilesystem:ClientMount, :ClientWrite, :ClientRootAccess
- Requires mounting with the -o tls,iam option
- EC2 instance role / Lambda execution role must have these permissions
- Combined with file system policy for full zero-trust control
Key IAM actions for NFS data access:
| IAM Action | What It Controls | Notes |
|---|---|---|
| elasticfilesystem:ClientMount | Permission to mount the file system (read-only) | Required for any mount. Without ClientWrite, mount is read-only. |
| elasticfilesystem:ClientWrite | Permission to write data to the file system | Add this for read-write mounts. Omit for read-only access. |
| elasticfilesystem:ClientRootAccess | Permission to access as root user (UID 0) | Deny this to prevent containers/instances from acting as root on the filesystem. |
Exam note: IAM-based NFS access is optional. By default, any EC2 instance that can reach the mount target (network layer) can mount and read/write. IAM adds an additional authorization layer. Enable it by mounting with -o tls,iam and by setting a file system policy that enforces IAM.
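The three client actions compose into an identity policy for an instance role or Lambda execution role. The sketch below assembles one as a Python dictionary; the function name and its flags are hypothetical, and the ARN in the usage note is the example file system used elsewhere in this chapter.

```python
import json


def efs_client_policy(
    file_system_arn: str, allow_write: bool = False, allow_root: bool = False
) -> dict:
    """Identity policy granting NFS client access to a single file system."""
    actions = ["elasticfilesystem:ClientMount"]  # always needed to mount
    if allow_write:
        actions.append("elasticfilesystem:ClientWrite")
    if allow_root:
        actions.append("elasticfilesystem:ClientRootAccess")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": file_system_arn}
        ],
    }
```

`json.dumps(efs_client_policy("arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0a1b2c3d", allow_write=True), indent=2)` yields a read-write policy without root access, which is the combination most application roles need.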
A file system policy is a JSON resource-based policy attached directly to the EFS file system, similar to an S3 bucket policy. It applies to every NFS connection regardless of which client connects.
Common Policy Patterns
- Enforce encryption in transit: deny any connection without TLS
- Enforce IAM authorization: deny anonymous NFS clients
- Prevent root access: deny ClientRootAccess for all principals
- Read-only access: deny ClientWrite globally
- Restrict to specific accounts/roles: condition on aws:PrincipalArn
How to Apply
- Console: EFS → File System → Edit → File System Policy
- CLI: aws efs put-file-system-policy
- Preconfigured toggles in console for common patterns
- Can be set at any time; does NOT require recreating the FS
- Takes effect immediately for new connections
Example file system policy that enforces TLS and IAM for all connections:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnforceTLS",
      "Effect": "Deny",
      "Principal": { "AWS": "*" },
      "Action": "*",
      "Resource": "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0a1b2c3d",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    },
    {
      "Sid": "AllowAppRole",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/app-role" },
      "Action": [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite"
      ],
      "Resource": "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0a1b2c3d",
      "Condition": {
        "Bool": {
          "elasticfilesystem:AccessedViaMountTarget": "true"
        }
      }
    }
  ]
}
The first statement denies every connection that does not use TLS. The second grants mount and write access to one IAM role (app-role is a placeholder name). Because the policy contains no Allow for anonymous principals, clients must mount with -o tls,iam using an authorized IAM identity.
EFS is a POSIX-compliant filesystem: standard Linux file permissions (rwxr-xr-x) and ownership (uid:gid) apply to every file and directory. This is the final layer of access control.
How It Works
- Every file/directory has an owner UID, group GID, and permission bits
- chmod 755 /data sets owner rwx, group r-x, others r-x
- chown 1000:1000 /data sets the owner to UID 1000 and the group to GID 1000
- NFS client connects with a UID/GID (from the EC2 instance's user)
- EFS checks if that UID/GID has permission for the requested operation
Access Points Override
- Access Points can override the connecting user's UID/GID
- All connections through the AP use the specified UID/GID
- Combined with the root directory setting, the app sees only its subtree
- Simplifies permission management for Lambda and containers
- No need to manage Linux users across instances
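The permission check and the access point override can be modelled in a few lines. This is a simplified sketch, not how EFS is implemented: it ignores root (UID 0), supplementary groups, and ACLs, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

_BIT = {"r": 4, "w": 2, "x": 1}


@dataclass(frozen=True)
class FileMeta:
    uid: int   # owner UID
    gid: int   # owning group GID
    mode: int  # permission bits, e.g. 0o755


def effective_identity(
    client_uid: int, client_gid: int,
    access_point_user: Optional[Tuple[int, int]] = None,
) -> Tuple[int, int]:
    """An access point with a POSIX user replaces the client's own UID/GID."""
    if access_point_user is not None:
        return access_point_user
    return (client_uid, client_gid)


def can_access(meta: FileMeta, uid: int, gid: int, want: str) -> bool:
    """Check a request such as 'r', 'w' or 'rx' against owner/group/other bits."""
    if uid == meta.uid:
        bits = (meta.mode >> 6) & 0o7   # owner class
    elif gid == meta.gid:
        bits = (meta.mode >> 3) & 0o7   # group class
    else:
        bits = meta.mode & 0o7          # everyone else
    needed = sum(_BIT[c] for c in set(want))
    return (bits & needed) == needed
```

For a directory owned by 1000:1000 with mode 755, a client connecting as UID 2000 can read and traverse but not write. The same client going through an access point that enforces 1000:1000 is treated as the owner and can write.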
Encryption at Rest
- Uses AWS KMS (managed key or customer-managed CMK)
- Must enable at creation time; cannot add later
- Encrypts all data, metadata, and temporary files
- Transparent: no performance impact, no code changes
- Default KMS key: aws/elasticfilesystem
- Customer-managed CMK: you control rotation, access policy, deletion
Encryption in Transit (TLS)
- TLS 1.2 encryption for NFS traffic between client and mount target
- Enable by mounting with -o tls (requires amazon-efs-utils)
- Can be enabled/disabled at any time; it is a per-mount decision
- Enforced globally via file system policy (aws:SecureTransport)
- Slight CPU overhead on client side, negligible for most workloads
Exam question pattern: "How to encrypt an existing unencrypted EFS?" You cannot. Create a new encrypted file system, use DataSync or rsync to copy the data, then point clients at the new file system. This is a common trap.
| # | Practice | How to Implement |
|---|---|---|
| 1 | Always encrypt at rest | Enable at creation. Use customer-managed CMK for compliance. |
| 2 | Always encrypt in transit | Mount with -o tls. Set FS policy to deny unencrypted connections. |
| 3 | Enable IAM authorization | Mount with -o tls,iam. Set FS policy to enforce IAM auth. |
| 4 | Use Access Points | One AP per application. Enforce root dir + UID/GID per app. |
| 5 | Deny root access | FS policy: deny ClientRootAccess. Prevents containers from running as root on EFS. |
| 6 | Private subnets only | Never place mount targets in public subnets. No internet-facing NFS. |
| 7 | Reference SG IDs in rules | SG-to-SG references. Auto-scales with fleet. No IP management. |
| 8 | Enable AWS Backup | Automated backups with retention policies. Cross-region copy for DR. |
| 9 | Enable CloudTrail logging | Log all EFS API calls. Audit who created/deleted/modified file systems. |
| 10 | Restrict with resource policies | FS policy to allow only specific accounts, roles, or VPCs. |
| Scenario | Answer | Why |
|---|---|---|
| "Compliance requires all EFS data encrypted at rest and in transit" | Create FS with encryption enabled + FS policy enforcing aws:SecureTransport | At-rest = creation time. In-transit = FS policy denies non-TLS connections. |
| "Prevent containers from writing as root to EFS" | FS policy: deny ClientRootAccess | Blocks UID 0 operations. Use Access Points to assign non-root UID/GID. |
| "Multiple Lambda functions need isolated directories on same EFS" | Create one Access Point per Lambda with different root dirs and UID/GID | Each AP enforces isolation. Lambda sees only its subtree. |
| "EC2 can ping mount target IP but mount times out" | Security group missing inbound TCP 2049 rule | ICMP (ping) and TCP 2049 (NFS) are separate rules. SG must explicitly allow NFS. |
| "How to share EFS across two AWS accounts?" | VPC peering + FS policy allowing the other account's principal | Network connectivity (peering) + authorization (FS policy with cross-account principal). |
EFS security is four layers deep: Network (SG port 2049) → IAM (ClientMount/ClientWrite/ClientRootAccess) → File System Policy (enforce TLS, IAM, deny root) → POSIX Permissions (chmod/chown). All layers must allow access. Encryption at rest is immutable: enable it at creation.
- Four security layers: Network → IAM → File System Policy → POSIX permissions. All must allow access.
- Network: SG on port 2049. Mount targets in private subnets. Reference SG IDs, not CIDRs.
- IAM: ClientMount, ClientWrite, ClientRootAccess. Requires the -o tls,iam mount option.
- File System Policy: resource-based JSON policy. Enforce TLS, IAM auth, block root, read-only. Changeable anytime.
- POSIX: standard Linux rwx permissions and UID/GID ownership. Access Points override connecting user identity.
- Encryption at rest: KMS-based. Immutable: must be enabled at creation. Cannot add to existing FS.
- Encryption in transit: TLS 1.2 via -o tls. Enforceable via FS policy. Can be enabled/disabled per mount.
Architecture Patterns
The classic EFS use case: multiple EC2 web servers behind an Application Load Balancer, all sharing the same uploaded content, themes, and configuration files via EFS.
Why This Works
- All EC2 instances see identical /wp-content: uploads, themes, plugins shared
- Auto Scaling Group can add/remove instances; new ones mount EFS instantly
- Multi-AZ ALB + Multi-AZ EFS + Multi-AZ RDS = fully resilient
- No data sync scripts, no S3 plugins, no shared NFS servers to manage
Configuration
- Performance mode: General Purpose
- Throughput mode: Elastic (handles traffic spikes)
- Storage class: Standard + IA lifecycle (30-day transition)
- Mount in user data: mount -t efs -o tls fs-id:/ /var/www/wp-content
Lambda functions mount EFS to process files too large for the default 512 MB /tmp allocation (configurable up to 10 GB): ML model inference, PDF generation, video thumbnail extraction.
Benefits
- ML model (2 GB) shared across all Lambda invocations
- No S3 download on each invocation; EFS is already mounted
- Update the model on EFS and all Lambdas get the new version instantly
Config
- Lambda in VPC with EFS access point
- Elastic throughput for burst reads
- Access Point: /ml, UID 1000
Trade-offs
- VPC Lambda cold starts (~1-2s extra)
- 25K connection limit per EFS FS
- Needs NAT Gateway for internet
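A handler for this pattern might look like the sketch below. It assumes the access point is mounted at /mnt/ml and that the model is a single file; MODEL_PATH, the model_path event key, and the byte-string "model" are placeholders for whatever your framework loads. The cache is keyed on modification time so a warm execution environment picks up a replaced model file.

```python
import os
from typing import Dict, Tuple

# Hypothetical location inside the function's EFS mount.
MODEL_PATH = os.environ.get("MODEL_PATH", "/mnt/ml/model.bin")

# path -> (mtime_ns, contents); survives across invocations in a warm environment
_cache: Dict[str, Tuple[int, bytes]] = {}


def load_model(path: str = MODEL_PATH) -> bytes:
    """Read the model from EFS, re-reading only when the file has changed."""
    mtime = os.stat(path).st_mtime_ns
    cached = _cache.get(path)
    if cached is None or cached[0] != mtime:
        with open(path, "rb") as f:
            _cache[path] = (mtime, f.read())
    return _cache[path][1]


def handler(event, context):
    """Lambda entry point: load (or reuse) the model and report its size."""
    model = load_model(event.get("model_path", MODEL_PATH))
    return {"model_bytes": len(model)}
```

The stat call on each invocation is a small metadata read over NFS; dropping the mtime check removes it, at the price of warm environments serving the old model until they are recycled.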
Containers are ephemeral: when a task stops, its local storage is lost. EFS provides persistent, shared storage that survives container restarts and can be shared across tasks.
ECS Task Definition config: Set "volumes" → "efsVolumeConfiguration" with file system ID, access point ID, and "transitEncryption": "ENABLED". Mount into containers via "mountPoints".
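The relevant part of a task definition can be sketched as a Python dictionary that serializes to the JSON ECS expects. The field names follow the ECS task definition schema, where the access point ID sits under authorizationConfig; the helper names, volume name, IDs, image, and container path are placeholders.

```python
import json


def efs_volume(name: str, file_system_id: str, access_point_id: str) -> dict:
    """Task-level volume backed by EFS, encrypted in transit, via an access point."""
    return {
        "name": name,
        "efsVolumeConfiguration": {
            "fileSystemId": file_system_id,
            "transitEncryption": "ENABLED",
            "authorizationConfig": {"accessPointId": access_point_id},
        },
    }


def mount_point(volume_name: str, container_path: str, read_only: bool = False) -> dict:
    """Container-level mount that refers to the volume by name."""
    return {
        "sourceVolume": volume_name,
        "containerPath": container_path,
        "readOnly": read_only,
    }


def task_definition_fragment() -> dict:
    """Example wiring with placeholder IDs and image."""
    return {
        "family": "web",
        "volumes": [efs_volume("shared-data", "fs-0a1b2c3d", "fsap-xxxx")],
        "containerDefinitions": [
            {
                "name": "app",
                "image": "example/app:latest",
                "mountPoints": [mount_point("shared-data", "/data")],
            }
        ],
    }
```

`json.dumps(task_definition_fragment(), indent=2)` produces a fragment you could merge into a full task definition; the volume name is the only link between the two halves, so it must match exactly.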
Architecture
- Primary region: us-east-1, active EFS with read/write access
- DR region: us-west-2, read-only replica, RPO ≤ 15 min
- Continuous replication over AWS backbone (no VPN needed)
- On failure: promote DR replica to read/write
- Update DNS (Route 53) to point compute to DR region
Failover Steps
- 1. Detect primary region failure (CloudWatch alarm / manual)
- 2. Delete replication configuration on DR file system
- 3. DR file system becomes read/write
- 4. Update mount targets / DNS in DR region
- 5. Start DR compute resources (EC2, Lambda, ECS)
- 6. After recovery: set up replication in reverse direction
Architecture
- Training data stored in EFS (datasets, preprocessed features)
- Multiple GPU EC2 instances mount EFS concurrently
- Each instance reads different data shards from the same filesystem
- Model checkpoints written to shared EFS; any instance can resume
- Performance mode: General Purpose (Max I/O cannot be combined with Elastic throughput)
- Throughput mode: Elastic (burst during training, idle between runs)
Why EFS Over S3
- S3 requires downloading datasets to local disk, which delays startup
- EFS is already mounted, so training starts immediately
- Random read access to files (EFS) vs sequential download (S3)
- Checkpoint save: torch.save(model, "/mnt/efs/checkpoints/epoch_5.pt")
- Other instances immediately see the checkpoint, so failover is instant
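Because other instances may read a checkpoint while it is still being written, one common safeguard is to write to a temporary file in the same directory and rename it into place. The sketch below shows that pattern with pickle standing in for torch.save so it runs without PyTorch; it assumes rename within one directory is atomic on the mounted file system, and the function names and epoch_N.pt naming are illustrative.

```python
import os
import pickle
import tempfile
from typing import Any, Optional


def save_checkpoint_atomic(obj: Any, final_path: str) -> None:
    """Write to a temp file next to final_path, then rename it into place."""
    directory = os.path.dirname(final_path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-ckpt-")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)      # stand-in for torch.save(obj, f)
            f.flush()
            os.fsync(f.fileno())     # push data to the server before renaming
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def latest_checkpoint(directory: str) -> Optional[str]:
    """Path of the highest-numbered epoch_N.pt file, or None if there is none."""
    epochs = []
    for name in os.listdir(directory):
        if name.startswith("epoch_") and name.endswith(".pt"):
            number = name[len("epoch_"):-len(".pt")]
            if number.isdigit():
                epochs.append((int(number), name))
    if not epochs:
        return None
    return os.path.join(directory, max(epochs)[1])
```

A replacement instance can call `latest_checkpoint("/mnt/efs/checkpoints")` on startup and resume from whatever it returns, without risk of loading a half-written file.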
Architecture
- On-premises NFS → AWS DataSync → EFS
- Migrate existing file shares to cloud incrementally
- Direct Connect or VPN for on-premises NFS mount
- AWS-side processing (Lambda, EC2) accesses data via EFS
- Bi-directional sync for hybrid workflows
Migration Path
- Phase 1: DataSync copies data from on-prem NFS to EFS (initial sync)
- Phase 2: Incremental syncs (deltas only) on schedule
- Phase 3: Cut over. Point apps to EFS and decommission on-prem NFS
- DataSync handles permissions, timestamps, symlinks
- Transfer speeds up to 10 Gbps over Direct Connect
| Requirement | Best Service | Why |
|---|---|---|
| Shared Linux filesystem, multi-AZ | EFS | NFS v4.1, elastic, multi-AZ, managed |
| Shared Windows filesystem (SMB) | FSx for Windows | SMB protocol, Active Directory integration, Windows-native |
| High-performance Linux filesystem (HPC) | FSx for Lustre | Sub-millisecond latency, 100s of GB/s throughput, HPC/ML workloads |
| Single-instance boot volume, database disk | EBS | Block storage, low latency, high IOPS, single attachment |
| Unlimited object storage, data lake, backups | S3 | HTTP API, cheapest at scale, integrated with analytics (Athena, Glue) |
| NetApp-compatible enterprise NAS | FSx for NetApp ONTAP | Multi-protocol (NFS, SMB, iSCSI), data dedup, snapshots |
| Temporary high-speed scratch storage | Instance Store | Physically attached SSD, highest IOPS, ephemeral (lost on stop) |
Don't Use EFS For
- Databases: NFS latency too high. Use EBS (gp3/io2) or RDS.
- Windows workloads: EFS is Linux-only. Use FSx for Windows.
- Static website hosting: S3 + CloudFront is cheaper and faster.
- Single-instance high-IOPS: EBS io2 gives 256K IOPS vs EFS 35K.
- Large object storage (videos, archives): S3 is roughly 10× cheaper per GB.
- HPC scratch storage: FSx for Lustre gives significantly higher throughput.
EFS Sweet Spot
- Shared content across multiple Linux instances
- Container persistent volumes (ECS, EKS, Fargate)
- Lambda large file access (models, libraries, data)
- CMS platforms (WordPress, Drupal) behind load balancers
- CI/CD shared build artifacts
- User home directories accessible from any instance
EFS shines in three patterns: (1) shared web content behind ALBs, (2) Lambda/container persistent storage via access points, and (3) multi-instance ML training data. If the exam says "shared filesystem across multiple instances" → it's EFS. If it says "Windows" → FSx. If it says "high IOPS single instance" → EBS. If it says "unlimited cheap storage" → S3.
- Pattern 1 (Web Farm): ALB + EC2 Auto Scaling + EFS for shared WordPress/CMS content. Multi-AZ resilient.
- Pattern 2 (Serverless): Lambda mounts EFS via access points for ML models (>512 MB). Eliminates S3 download latency.
- Pattern 3 (Containers): ECS/Fargate tasks mount EFS for persistent, shared storage. Data survives task restarts.
- Pattern 4 (DR): EFS Replication creates cross-region read-only replica. RPO ≤ 15 min. Promote on failover.
- Pattern 5 (ML Training): Multiple GPU instances mount EFS for shared datasets and checkpoints. General Purpose + Elastic throughput.
- Pattern 6 (Hybrid): DataSync migrates on-prem NFS to EFS. Direct Connect for live mounts.
- Alternatives: FSx for Windows (SMB), FSx for Lustre (HPC), EBS (single-instance IOPS), S3 (cheap objects).
- Anti-patterns: databases, Windows, static sites, single-instance IOPS, large archives. Use other services for these.