
Amazon EFS: Elastic File System

Fully managed, elastic, shared NFS file system — mount it across hundreds of EC2 instances, Lambda functions, and containers simultaneously, spanning multiple Availability Zones.

⚡ EFS in 30 Seconds

Shared file storage — multiple EC2 instances, Lambda functions, and containers mount the same filesystem simultaneously
Elastic capacity — grows and shrinks automatically, no pre-provisioning required
Multi-AZ by default — data replicated across ≥3 Availability Zones for high durability
NFS v4.1 protocol — standard Linux filesystem interface, works like any mounted drive
Pay per GB stored — no upfront capacity planning, no minimum fees

Chapter One

What is EFS

Introduction Introductory

Amazon EFS (Elastic File System) is AWS's fully managed, elastic, shared file storage service for Linux workloads. It provides a standard NFS (Network File System) interface, so you mount it on EC2 instances exactly like a local directory — mount -t nfs4 and you're done. Unlike EBS (which is a single-instance block device), EFS can be mounted by thousands of compute instances simultaneously across multiple Availability Zones.

👉 Think of EFS as: A shared network drive in the cloud — mount it everywhere, pay only for what you store, and it grows automatically

EFS was launched in 2016 to solve a fundamental gap in AWS storage: the need for a shared, POSIX-compliant filesystem that multiple EC2 instances could read and write concurrently. Before EFS, teams either jury-rigged NFS on EC2, used third-party solutions, or restructured applications to avoid shared storage entirely.

Why EFS Exists Introductory

⚠️

Without EFS — The Problems

EBS volumes attach to one EC2 instance — no shared access
Self-managed NFS servers — you handle HA, patching, scaling
S3 is object storage — no POSIX filesystem, can't cd or ls natively
Growing an EBS volume is a manual step (modify the volume, then extend the filesystem), and volumes cannot be shrunk
Cross-AZ data sharing requires complex replication scripts

✅

EFS Solves

Shared mount point — thousands of instances mount the same filesystem
Fully managed — AWS handles replication, patching, hardware
POSIX-compliant — ls, cat, chmod, mv all work natively
Elastic — grows and shrinks automatically with your data
Multi-AZ by default — data available across the entire region

File Storage vs Block vs Object Core

EFS fills the file storage slot in the AWS storage trio. Understanding the difference is critical for the exam and for architecture decisions:

Type	How It Works	AWS Service	Best For
File Storage	Shared filesystem with directories and files. NFS protocol. Multiple instances mount it simultaneously.	EFS	Shared config, CMS uploads, ML training data, container storage
Block Storage	Raw disk blocks. OS mounts it like a hard drive. Single-instance attachment (mostly).	EBS	Databases, boot volumes, OS-level read/write
Object Storage	Flat namespace — key → object. No folders. Access via HTTP API.	S3	Files, backups, data lakes, static assets, logs

👉 Exam rule of thumb: If the question says "shared filesystem" or "multiple EC2 instances need to access the same files" — the answer is EFS. If it says "single instance, high IOPS" — think EBS. If it says "unlimited, HTTP-accessed storage" — think S3.

Where EFS Fits in AWS Introductory

EFS integrates with core AWS compute and container services, acting as the shared data layer:

EC2 Instances

Mount EFS on multiple EC2 instances across AZs. Classic use case: web servers sharing uploaded content, config files, or CMS assets.

AWS Lambda

Lambda functions mount EFS to read and write files that do not fit in /tmp (512 MB by default, configurable up to 10 GB) or that must persist across invocations. Essential for ML inference and large file processing.

ECS & Fargate

Containers mount EFS as persistent storage. Solves the ephemeral container storage problem — data persists across task restarts.

EKS (Kubernetes)

EFS CSI driver provides ReadWriteMany PersistentVolumes. Multiple pods across nodes share the same filesystem — perfect for shared workloads.

🔄

AWS DataSync

Migrate data from on-premises NFS/SMB to EFS at high speed. Automate ongoing replication for hybrid architectures.

💾

AWS Backup

Automated EFS backups with retention policies, cross-region copy, and compliance controls — no custom scripts needed.

Mental Model Core

Think of EFS like a shared office network drive — everyone in the building plugs in and sees the same folders and files:

🏢

Network Drive = EFS File System

A single shared filesystem, accessible by everyone
Standard directories and files — /data, /config, /uploads
Grows automatically — no one calls IT to "add more disk"
Replicated across the building's floors (AZs) automatically
You control who can access what with permissions and policies

🖥️

Computers = EC2 / Lambda / Containers

Each computer mounts the network drive to a local path
All see the same data — changes by one are immediately visible to others
Computers can be on different floors (AZs) — access is the same
New computers join instantly — no data copy needed
If one computer shuts down, others are unaffected

EFS vs EBS — When to Choose Which Core

This is the most common exam question around EFS. Here's the definitive comparison:

Feature	EFS	EBS
Type	File storage (NFS)	Block storage (disk)
Access	Multiple instances simultaneously	Single instance (multi-attach only for io1/io2)
Protocol	NFS v4.1	Block device (ext4, xfs)
Capacity	Elastic — grows/shrinks automatically	Fixed — you choose size upfront, can expand manually
AZ Scope	Multi-AZ (Regional) by default	Single AZ — locked to one AZ
Pricing	~$0.30/GB/month (Standard), pay per GB used	~$0.08/GB/month (gp3), pay per GB provisioned
Performance	Good throughput, higher latency than EBS	Low latency, high IOPS (up to 256K IOPS for io2)
OS Support	Linux only	Linux and Windows
Best For	Shared content, CMS, ML data, container volumes	Databases, boot volumes, single-instance workloads

👉 Key distinction: Per GB, EFS Standard costs roughly 3–4× what EBS gp3 does, but you never pay for unused capacity. For a workload shared across multiple instances, EFS is often cheaper than running multiple EBS volumes with data synchronization. EFS = shared. EBS = dedicated.
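To make the cost trade-off concrete, here is a rough sketch comparing one shared EFS file system with per-instance EBS volumes. The prices are approximate us-east-1 figures (the gp3 price is an assumption), the workload numbers are made up, and the function names are illustrative only.

```python
EFS_STANDARD_PER_GB = 0.30  # approximate $/GB-month for EFS Standard
EBS_GP3_PER_GB = 0.08       # assumed approximate $/GB-month for gp3


def efs_monthly_cost(used_gb: float) -> float:
    """EFS bills for data actually stored, once, however many clients mount it."""
    return used_gb * EFS_STANDARD_PER_GB


def ebs_monthly_cost(provisioned_gb: float, instances: int) -> float:
    """EBS bills for provisioned capacity, and each instance needs its own volume."""
    return provisioned_gb * EBS_GP3_PER_GB * instances


if __name__ == "__main__":
    # Hypothetical workload: 200 GB of shared data, 6 web servers,
    # each EBS volume provisioned at 500 GB for headroom.
    print(f"EFS: ${efs_monthly_cost(200):.2f}/month")
    print(f"EBS: ${ebs_monthly_cost(500, 6):.2f}/month")
```

The sketch ignores throughput charges, snapshots, and the cost of keeping the EBS copies in sync, so treat it as a first-order estimate.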

EFS vs FSx for Windows Core

If the exam mentions Windows or SMB — the answer is never EFS. Know this comparison:

Feature	EFS	FSx for Windows File Server
Protocol	NFS v4.1	SMB 2.0–3.1.1
OS Support	Linux only	Windows + Linux (via SMB)
Active Directory	No native AD integration	Native AD / self-managed AD
DFS Namespaces	No	Yes — Windows DFS support
Capacity	Elastic (pay per GB used)	Provisioned (choose size upfront)
Multi-AZ	Yes (default)	Optional (Single-AZ or Multi-AZ)
Best For	Linux apps, containers, Lambda	Windows file shares, .NET apps, SQL Server

👉 Exam rule: "Shared filesystem for Windows" → FSx for Windows. "Shared filesystem for Linux" → EFS. "High-performance Linux HPC" → FSx for Lustre. Never mix these up.

Concept Diagram Introductory

[Diagram: multiple compute targets mount the same shared EFS file system across AZs]

EFS Key Characteristics Core

Property	Value	Why It Matters
Protocol	NFS v4.0 / v4.1	Standard Linux mount — no proprietary client needed
OS Support	Linux only	Windows → use FSx for Windows File Server instead
Capacity	Elastic (petabyte-scale)	No pre-provisioning. Grows and shrinks with data.
Durability	99.999999999% (11 nines)	Data replicated across ≥3 AZs automatically
Availability	99.99% (Standard) / 99.9% (One Zone)	Standard = multi-AZ. One Zone = single AZ, 47% cheaper
Max File Size	47.9 TiB (about 52 TB) per file	No multi-part uploads needed; just write the file normally
Concurrent Mounts	Thousands	All compute in your VPC can mount the same FS
Encryption	At rest (KMS) + in transit (TLS)	Enable at creation; cannot add at-rest encryption later

Core Use Cases Introductory

Use Case	How EFS Is Used	Why Not EBS/S3
Web Server Farm	Multiple EC2 behind ALB mount EFS for shared WordPress uploads, themes, plugins	EBS = each instance gets its own copy. S3 = not a filesystem.
Container Persistent Storage	ECS/Fargate tasks mount EFS volumes to persist data across restarts	Container local storage is ephemeral — dies with the task.
ML Training Data	Training data in EFS mounted by multiple SageMaker or EC2 training instances	All instances need concurrent read access to the same dataset.
Lambda Large Files	Lambda mounts EFS for models, libraries, or data too large for /tmp	Lambda `/tmp` defaults to 512 MB (configurable up to 10 GB) and is not shared between functions. S3 requires download time.
CI/CD Shared Workspace	Build agents share compiled artifacts via mounted EFS	Faster than S3 for repeated small-file reads during builds.
Home Directories	User home dirs on EFS, accessible from any instance they log into	Like a traditional NAS — user's files follow them.

👉 Key Takeaway

EFS is a shared, elastic NFS filesystem — the answer whenever multiple compute resources need to read/write the same files simultaneously. It's Linux-only, multi-AZ by default, and you pay only for what you store.

📋 Chapter 1 — Summary

File storage — POSIX-compliant NFS v4.1 filesystem. Mount it like a local directory on Linux.
Shared access — thousands of EC2, Lambda, ECS, EKS instances mount the same filesystem simultaneously.
Elastic capacity — no pre-provisioning. Grows/shrinks automatically. Pay per GB stored.
Multi-AZ by default — data replicated across ≥3 AZs. 11 nines durability. 99.99% availability.
Linux only — for Windows shared storage, use FSx for Windows File Server.
EFS vs EBS: EFS = shared, multi-AZ, elastic, NFS. EBS = dedicated, single-AZ, fixed-size, block.
Integrates with: EC2, Lambda, ECS, Fargate, EKS, DataSync, AWS Backup.
Key use cases: shared web content, container persistence, ML training data, Lambda large files.

Chapter Two

Core Concepts

File System Introductory

An EFS file system is the top-level resource — the "drive" you create once and mount everywhere. Every EFS file system gets a unique ID like fs-0a1b2c3d4e5f and a DNS name like fs-0a1b2c3d4e5f.efs.us-east-1.amazonaws.com. You never manage disks, partitions, or RAID arrays — AWS handles all of that behind the scenes.
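As a small illustration of the naming scheme described above, the following sketch builds the DNS name from a file system ID and a Region. The helper name and the ID validation pattern are assumptions made for this example, not an official format check.

```python
import re

# Assumed ID shape for this sketch: "fs-" followed by lowercase hex characters.
_FS_ID = re.compile(r"^fs-[0-9a-f]+$")


def efs_dns_name(file_system_id: str, region: str) -> str:
    """Build the regional DNS name used in mount commands."""
    if not _FS_ID.match(file_system_id):
        raise ValueError(f"not an EFS file system ID: {file_system_id!r}")
    return f"{file_system_id}.efs.{region}.amazonaws.com"


if __name__ == "__main__":
    print(efs_dns_name("fs-0a1b2c3d4e5f", "us-east-1"))
```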

📁

File System Properties

ID: fs-xxxxxxxxx — unique identifier
DNS name: region-specific, used in mount commands
Creation token: idempotency key to prevent duplicates
Lifecycle: exists until you explicitly delete it
Tags: key-value pairs for billing, organization

⚙️

Immutable At Creation

Encryption at rest — must enable at creation, cannot be added later
Performance mode — General Purpose or Max I/O, cannot change later
Availability — Regional (multi-AZ) or One Zone, cannot change later
Throughput mode, lifecycle policies, access points — can be changed anytime

👉 Exam trap: "Can you enable encryption on an existing EFS?" — No. You must create a new encrypted file system and migrate data. This is a frequent exam question.

Storage Classes Core

EFS offers multiple storage classes to optimize cost based on access frequency — similar in concept to S3's storage classes, but applied at the file level, not the object level:

Storage Class	Description	Cost (us-east-1)	Best For
EFS Standard	Multi-AZ. Frequently accessed data. Lowest latency.	~$0.30/GB/month	Active application data, config files, CMS uploads
EFS Standard-IA	Multi-AZ. Infrequently accessed. Lower storage cost, per-access fee.	~$0.025/GB/month + $0.01/GB read	Audit logs, old reports, seasonal data
EFS One Zone	Single AZ. Frequently accessed. ~47% cheaper than Standard.	~$0.16/GB/month	Dev/test, scratch data, easily reproducible files
EFS One Zone-IA	Single AZ. Infrequently accessed. Cheapest option.	~$0.0133/GB/month + $0.01/GB read	Dev logs, temporary backups, non-critical archives
EFS Archive	Multi-AZ. Rarely accessed (few times/year). Lowest storage cost.	~$0.008/GB/month + $0.03/GB read	Compliance archives, historical data accessed yearly

✅

Regional (Standard) — Use When

Production workloads requiring high availability
Multi-AZ EC2 deployments behind a load balancer
Data that cannot be recreated if an AZ fails
Compliance requirements mandate multi-AZ storage

💰

One Zone — Use When

Development, testing, staging environments
Data can be regenerated from source (e.g., build artifacts)
Cost is the primary concern, not resilience
All compute is in a single AZ anyway

👉 One Zone durability: Data is still replicated across multiple devices within the single AZ (11 nines durability). The real risk is AZ failure: if the entire AZ goes down, your data is unavailable until the AZ recovers, and it can be lost if the AZ itself is damaged or destroyed. For irreplaceable data, always use Standard (multi-AZ).

Lifecycle Management Core

EFS can automatically move files between storage classes based on access patterns — this is called EFS Intelligent-Tiering (lifecycle management). You set policies; EFS handles the rest:

⬇️

Transition to IA / Archive

Move files not accessed for N days (1, 7, 14, 30, 60, 90, 180, 270, or 365)
Applies to Standard → Standard-IA, or One Zone → One Zone-IA
Archive tier: files not accessed for 90–365+ days
Metadata stays in Standard — only file data moves
File is transparently accessible (just higher latency on first read)

⬆️

Transition Back to Standard

EFS can automatically move files back to Standard on access
Enable "Transition into Standard" policy — on first access
Hot file gets promoted; rarely-accessed files stay in IA
Combined with transition-to-IA, creates automatic tiering loop

👉 Cost savings: Enabling lifecycle management can reduce EFS costs by up to 92% for workloads with a mix of hot and cold data. For most workloads, enable both transition-to-IA (30 days) and transition-back-on-access.

Transition	Minimum Days	Notes
Standard → Standard-IA	1 day (options: 1, 7, 14, 30, 60, 90, 180, 270, 365)	Use 1-day cautiously — files accessed daily will thrash between tiers
Standard → Archive	90 days minimum	Compliance requirement — cannot archive sooner
Standard-IA → Archive	45 days after transition to IA	Must be in IA for 45+ days before moving to Archive
IA/Archive → Standard	On first access (immediate)	Enable "Transition into Standard" policy — files auto-promote
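The transitions in the table map onto a list of lifecycle policies in the EFS API. The sketch below only builds that list in plain Python; the helper names are hypothetical, and the rule that the Archive period must be longer than the IA period is an assumption of this example.

```python
ALLOWED_DAYS = (1, 7, 14, 30, 60, 90, 180, 270, 365)


def _after(days: int) -> str:
    """Map a day count to the API's enum spelling, e.g. 30 -> AFTER_30_DAYS."""
    if days not in ALLOWED_DAYS:
        raise ValueError(f"unsupported transition period: {days} days")
    return "AFTER_1_DAY" if days == 1 else f"AFTER_{days}_DAYS"


def build_lifecycle_policies(ia_days=30, archive_days=None, promote_on_access=True):
    """Return a LifecyclePolicies list; each entry holds exactly one transition."""
    policies = [{"TransitionToIA": _after(ia_days)}]
    if archive_days is not None:
        # Assumption for this sketch: Archive must come later than IA.
        if archive_days <= ia_days:
            raise ValueError("archive_days must be greater than ia_days")
        policies.append({"TransitionToArchive": _after(archive_days)})
    if promote_on_access:
        policies.append({"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"})
    return policies


if __name__ == "__main__":
    print(build_lifecycle_policies(ia_days=30, archive_days=90))
```

With boto3 installed, the resulting list could be passed as `LifecyclePolicies` to the EFS client's `put_lifecycle_configuration` call together with a `FileSystemId`.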

Pricing Example Core

EFS pricing is often confusing — here's a concrete example showing how lifecycle management saves money:

💸

Without Lifecycle (All Standard)

1,000 GB all in Standard
Storage: 1,000 GB × $0.30 = $300/month
No read fees, but paying full price for cold data

✅

With Lifecycle (30-day IA transition)

200 GB active (Standard) + 800 GB cold (Standard-IA)
Standard: 200 GB × $0.30 = $60/mo
IA storage: 800 GB × $0.025 = $20/mo
IA reads: ~10 GB/day × 30 × $0.01 = $3/mo
Total: ~$83/month (72% savings)
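The arithmetic above can be reproduced with a few lines of Python, which also makes it easy to try other hot/cold splits. The prices are the approximate figures used in this section; the function name is illustrative.

```python
STANDARD_PER_GB = 0.30   # approximate $/GB-month, EFS Standard
IA_PER_GB = 0.025        # approximate $/GB-month, Standard-IA
IA_READ_PER_GB = 0.01    # approximate $/GB read from IA


def monthly_cost(standard_gb, ia_gb=0.0, ia_read_gb_per_day=0.0, days=30):
    """Storage cost for both tiers plus the per-GB read fee on IA data."""
    storage = standard_gb * STANDARD_PER_GB + ia_gb * IA_PER_GB
    reads = ia_read_gb_per_day * days * IA_READ_PER_GB
    return storage + reads


def savings_percent(before, after):
    """Relative saving of `after` compared with `before`, in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    all_standard = monthly_cost(1000)
    tiered = monthly_cost(200, ia_gb=800, ia_read_gb_per_day=10)
    print(f"All Standard: ${all_standard:.2f}")
    print(f"With lifecycle: ${tiered:.2f}")
    print(f"Savings: {savings_percent(all_standard, tiered):.0f}%")
```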

👉 EFS Backup (AWS Backup) pricing: Backup storage costs ~$0.05/GB-month (incremental). This is separate from EFS storage billing. Cross-region backup copies add data transfer + destination storage charges. Always budget backups independently.

Connection Limits Core

Limit	Value	Notes
Max concurrent NFS connections per FS	25,000	Soft limit — can be raised via AWS Support
Lambda concurrent connections per FS	25,000	Same pool as EC2 — shared limit across all clients
EC2 connection counting	1 per instance	Each mount counts as one, regardless of processes reading/writing
One Zone connection limit	25,000	Same as Standard — one mount target, same limit
Access Points per FS	1,000	Soft limit — can request increase

Access Points Core

An EFS Access Point is an application-specific entry point into an EFS file system. Think of it as a "customized door" into the filesystem — each access point can enforce a different root directory, user identity, and permissions:

🚪

Root Directory

Each access point can set a root path (e.g., /app1/data). The application only sees that subtree — cannot navigate above it. Acts as a chroot.

👤

POSIX User Identity

Override the connecting user's UID/GID. Force all access through this point to use uid=1000, gid=1000 regardless of the client's identity.

🔒

Permissions

Set directory creation permissions (e.g., 755) and owner (UID/GID) when the root directory is auto-created on first mount.

Access points are especially powerful with Lambda and ECS — each function or container can get its own access point, isolating its view of the filesystem without complex IAM or NFS permissions.
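A minimal sketch of how the three settings fit together: the function below assembles the request parameters for one access point as a plain dictionary. The function name, the example path, and the choice to reuse the same UID/GID for the directory owner are assumptions of this example.

```python
def access_point_params(file_system_id, path, uid, gid, permissions="755"):
    """Assemble parameters describing one access point.

    PosixUser is the identity enforced on every request through the access point;
    CreationInfo is applied only if the root directory has to be created.
    """
    if not path.startswith("/"):
        raise ValueError("root directory path must be absolute")
    return {
        "FileSystemId": file_system_id,
        "PosixUser": {"Uid": uid, "Gid": gid},
        "RootDirectory": {
            "Path": path,
            "CreationInfo": {
                "OwnerUid": uid,
                "OwnerGid": gid,
                "Permissions": permissions,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical application that should only ever see /app1/data.
    print(access_point_params("fs-0a1b2c3d", "/app1/data", uid=1000, gid=1000))
```

With boto3 installed, a dictionary like this could be unpacked into the EFS client's `create_access_point` call.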

[Diagram: EFS Access Points, where each application gets an isolated entry into the same file system]

Mount Targets Core

A mount target is the network endpoint that EC2 instances use to connect to EFS. You create one mount target per Availability Zone in your VPC. Each mount target gets:

🔌

What a Mount Target Is

An ENI (Elastic Network Interface) in your VPC subnet
Gets a private IP address (e.g., 10.0.1.15)
Gets a DNS name that resolves to the IP in that AZ
Has a security group to control NFS traffic (port 2049)
One per AZ for Regional FS, one total for One Zone FS

📡

How Mounting Works

EC2 uses the EFS DNS name → resolves to the mount target in its AZ
sudo mount -t efs fs-0a1b2c3d:/ /mnt/efs (using amazon-efs-utils)
Or standard NFS: mount -t nfs4 -o nfsvers=4.1 fs-dns:/ /mnt/efs
Traffic stays within the AZ — no cross-AZ data transfer charges for NFS
If an AZ's mount target is down, instances in that AZ lose access (others unaffected)

👉 Best practice: Always create mount targets in every AZ where you have compute resources. Use the amazon-efs-utils package for easier mounting with TLS encryption and IAM authorization built-in.
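The two mounting styles can be generated programmatically, for example in a bootstrap script. This sketch only builds the command strings. The helper names are hypothetical, the NFS option string is the commonly recommended set rather than anything required, and the sketch assumes that the `iam` and `accesspoint` options are always used together with `tls`.

```python
import shlex

# Commonly recommended NFS client options for EFS (an assumption of this sketch).
NFS_OPTIONS = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"


def efs_utils_mount_cmd(fs_id, mount_point, tls=True, iam=False, access_point=None):
    """Build a mount command for the amazon-efs-utils mount helper."""
    opts = []
    if tls or iam or access_point:
        opts.append("tls")  # this sketch always pairs iam/accesspoint with tls
    if iam:
        opts.append("iam")
    if access_point:
        opts.append(f"accesspoint={access_point}")
    cmd = ["sudo", "mount", "-t", "efs"]
    if opts:
        cmd += ["-o", ",".join(opts)]
    cmd += [f"{fs_id}:/", mount_point]
    return " ".join(shlex.quote(part) for part in cmd)


def nfs_mount_cmd(dns_name, mount_point):
    """Build a plain NFS v4.1 mount command (no TLS, no IAM)."""
    cmd = ["sudo", "mount", "-t", "nfs4", "-o", NFS_OPTIONS, f"{dns_name}:/", mount_point]
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    print(efs_utils_mount_cmd("fs-0a1b2c3d", "/mnt/efs"))
    print(nfs_mount_cmd("fs-0a1b2c3d.efs.us-east-1.amazonaws.com", "/mnt/efs"))
```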

EFS Replication In-Depth

EFS Replication creates an automatic, continuous copy of your file system in another AWS Region or within the same Region. Key facts:

🔄

How Replication Works

Creates a read-only replica in the destination region/AZ
Most changes replicated within 15 minutes (RPO ≤ 15 min)
Uses AWS backbone network — no VPN or peering needed
Same storage classes, encryption, and lifecycle policies apply
One replication configuration per file system

🎯

Use Cases

Disaster recovery: failover to replica in another region
Data locality: replica close to users in another region
Promote replica to read-write during failover
Cross-region compliance requirements
No separate replication fee: you pay for destination storage, plus inter-Region data transfer when the replica is in another Region

File System Policy In-Depth

A file system policy is a JSON resource-based policy (like S3 bucket policies) that applies to every connection to the EFS file system. Common uses:

Policy Action	What It Does	When to Use
Enforce in-transit encryption	Deny any NFS client that doesn't use TLS	Compliance — security standard requires encrypted transport
Enforce IAM authorization	Require IAM identity-based policies for all NFS access	Zero-trust — go beyond security groups + NFS perms
Prevent root access	Withhold the ClientRootAccess permission so root (UID 0) is squashed to an unprivileged user	Multi-tenant: limit the impact of a privileged container escape
Enforce read-only	Allow mounts but deny all write operations	Shared config/reference data that should never be modified
Restrict to specific VPCs	Only allow access from specific VPCs via conditions	Cross-account access with guardrails
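As an illustration of the first rows of the table, the sketch below assembles a file system policy as a Python dictionary and prints it as JSON. It is modelled on the kind of policy the console generates, but the function name, the `Sid` values, and the placeholder ARN are assumptions of this example, so check the result against your own requirements before using it.

```python
import json

CLIENT_MOUNT = "elasticfilesystem:ClientMount"
CLIENT_WRITE = "elasticfilesystem:ClientWrite"
CLIENT_ROOT = "elasticfilesystem:ClientRootAccess"


def file_system_policy(file_system_arn, allow_write=True, allow_root=True):
    """Allow NFS clients via mount targets, and deny any connection without TLS."""
    actions = [CLIENT_MOUNT]
    if allow_write:
        actions.append(CLIENT_WRITE)   # omit for a read-only file system
    if allow_root:
        actions.append(CLIENT_ROOT)    # omit to squash root access
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowClientsViaMountTarget",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": actions,
                "Resource": file_system_arn,
                "Condition": {"Bool": {"elasticfilesystem:AccessedViaMountTarget": "true"}},
            },
            {
                "Sid": "DenyUnencryptedTransport",
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": "*",
                "Resource": file_system_arn,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN with a made-up account ID.
    arn = "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0a1b2c3d"
    print(json.dumps(file_system_policy(arn, allow_write=False), indent=2))
```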

👉 Key Takeaway

EFS core concepts: File System (the drive), Mount Targets (the network plugs — one per AZ), Storage Classes (Standard, IA, One Zone, Archive), Access Points (isolated app-specific entries), and Lifecycle Policies (auto-tier files to save up to 92% cost).

📋 Chapter 2 — Summary

File System: top-level resource with unique ID and DNS name. Encryption and performance mode are immutable after creation.
Storage Classes: Standard, Standard-IA, One Zone, One Zone-IA, Archive — from ~$0.30 down to ~$0.008/GB/month.
Lifecycle Management: auto-move files to IA/Archive after N days of no access. Can also auto-promote back on access. Up to 92% savings.
Access Points: application-specific entry points with enforced root directory, UID/GID override, and auto-created permissions.
Mount Targets: ENI per AZ with private IP, DNS name, and security group. NFS port 2049. Use amazon-efs-utils for easy mounting.
Replication: continuous cross-region replica with RPO ≤ 15 minutes. Read-only destination, promotable for DR.
File System Policy: resource-based JSON policy to enforce encryption in transit, IAM auth, read-only, or block root access.

Chapter Three

Performance Modes

Overview Introductory

When you create an EFS file system, you choose a performance mode. This setting is permanent — you cannot change it after creation. It controls how the file system handles I/O operations, specifically the trade-off between latency and total throughput capacity.

👉 Exam rule: Performance mode is immutable. If you chose wrong, you must create a new file system and migrate data. Choose carefully at creation time.

General Purpose Mode Core

General Purpose is the default and recommended mode for the vast majority of workloads. It provides the lowest latency per I/O operation and is suitable for latency-sensitive applications.

✅

Characteristics

Lowest per-operation latency — single-digit milliseconds
Up to 35,000 read IOPS and 7,000 write IOPS
Default mode — use unless you have a specific reason not to
CloudWatch metric: PercentIOLimit shows how close you are to the IOPS ceiling
Supports both Regional and One Zone availability

🎯

Best For

Web serving — WordPress, Drupal, CMS platforms
Content management — shared uploads, media files
Home directories — user files across instances
Development environments — code repos, build artifacts
General application data — config, logs, session state

👉 Monitoring tip: Watch the PercentIOLimit CloudWatch metric. If it consistently hits 100%, your workload may benefit from Max I/O. But try Elastic Throughput first — it's usually sufficient.

Max I/O Mode Core

Max I/O mode removes the IOPS ceiling, allowing virtually unlimited parallel I/O operations. The trade-off: slightly higher per-operation latency (tens of milliseconds instead of single-digit).

⚡

Characteristics

No IOPS limit — scales to hundreds of thousands of operations
Higher per-operation latency — tens of milliseconds (not single-digit)
Designed for highly parallelized workloads with many concurrent clients
No PercentIOLimit metric — there is no limit to hit
Cannot be changed back to General Purpose after creation

🎯

Best For

Big data analytics — hundreds of instances reading concurrently
Media processing — video transcoding across many workers
Machine learning — large training data read by many GPU instances
Genomics workflows — massively parallel file reads
Any workload where PercentIOLimit consistently hits 100%

Comparison Table Core

Feature	General Purpose (default)	Max I/O
Latency	Single-digit milliseconds (lowest)	Tens of milliseconds (slightly higher)
IOPS	Up to 35K read / 7K write	Effectively unlimited
Parallelism	Good for moderate concurrency	Optimized for massive parallelism (hundreds of clients)
PercentIOLimit	CloudWatch metric available — monitor it	Not applicable — no ceiling
Use Case	Web, CMS, containers, Lambda, home dirs	Big data, ML training, media processing, genomics
Changeable?	No (immutable after creation)	No (immutable after creation). Create a new FS to switch.

Elastic Throughput In-Depth

Elastic Throughput is a throughput mode (see Chapter Four) that works with General Purpose performance mode and scales throughput dynamically with workload demand, without requiring Max I/O. It has made Max I/O unnecessary for most workloads.

📈

How Elastic Throughput Works

Automatically scales read throughput up to 10 GiB/s
Write throughput up to 3 GiB/s
No capacity planning — spiky workloads handled automatically
Pay per GiB of data read or written; there is no baseline to exceed
Works with General Purpose mode only

🔄

Before vs After Elastic Throughput

Before: if you hit IOPS limits in General Purpose → migrate to a new Max I/O file system (and accept higher latency)
After: Elastic Throughput handles bursts in General Purpose — Max I/O rarely needed
Most workloads that previously required Max I/O now work fine with General Purpose + Elastic Throughput

👉 Current recommendation (2026): Start with General Purpose + Elastic Throughput. Only consider Max I/O if you have truly massive parallelism (500+ concurrent clients doing heavy I/O). Elastic Throughput handles most "burst" scenarios.
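The recommendation can be written down as a tiny decision helper. This is only a sketch of the rule of thumb in this chapter: the 500-client threshold comes from the text above, and the function name is hypothetical. The return values follow the spelling used by the EFS API's `PerformanceMode` setting.

```python
MASSIVE_PARALLELISM_CLIENTS = 500  # rule-of-thumb threshold from this chapter


def choose_performance_mode(concurrent_clients: int, latency_sensitive: bool) -> str:
    """Return 'generalPurpose' unless the workload is massively parallel and latency-tolerant."""
    if latency_sensitive:
        return "generalPurpose"
    if concurrent_clients >= MASSIVE_PARALLELISM_CLIENTS:
        return "maxIO"
    return "generalPurpose"


if __name__ == "__main__":
    print(choose_performance_mode(10, latency_sensitive=True))     # WordPress farm
    print(choose_performance_mode(500, latency_sensitive=False))   # genomics batch
```

Because the mode cannot be changed later, a helper like this is only useful at creation time.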

Decision Flowchart Core

[Flowchart: choosing a performance mode]

Exam Scenarios In-Depth

Scenario	Answer	Why
"WordPress farm with 10 EC2 instances sharing uploads"	General Purpose	Low latency needed for web requests. 10 instances = far below IOPS ceiling.
"500 compute instances processing genomics data in parallel"	Max I/O	Massive parallelism. Latency tolerance is acceptable. Need unlimited IOPS.
"Lambda functions reading ML models from shared storage"	General Purpose	Lambda cold starts already add latency — need fast I/O per request. `PercentIOLimit` unlikely to be reached.
"EFS PercentIOLimit metric at 100% constantly"	Migrate to Max I/O (or enable Elastic Throughput)	Hitting the ceiling. Either switch mode or use Elastic Throughput to burst past the limit.
"Video rendering farm reading large files, latency not critical"	Max I/O	High parallelism, large sequential reads, higher latency acceptable for batch processing.
"Can I switch from General Purpose to Max I/O?"	No	Performance mode is immutable. Must create a new file system and migrate.

👉 Key Takeaway

Start with General Purpose + Elastic Throughput — it handles 90%+ of workloads. Only choose Max I/O for truly massive parallelism (500+ clients, batch analytics, genomics). Performance mode is immutable — you cannot change it after creation.

📋 Chapter 3 — Summary

Two performance modes: General Purpose (default, low latency) and Max I/O (unlimited IOPS, higher latency).
General Purpose: single-digit ms latency, up to 35K read / 7K write IOPS. Best for web, CMS, containers, Lambda.
Max I/O: no IOPS ceiling, tens of ms latency. Best for big data, genomics, media processing with 500+ clients.
Immutable: performance mode cannot be changed after creation. Must create new FS and migrate.
Elastic Throughput: auto-scales to 10 GiB/s read / 3 GiB/s write in General Purpose mode. Made Max I/O rarely needed.
Monitor: PercentIOLimit CloudWatch metric (General Purpose only). If at 100%, consider Elastic Throughput or Max I/O.
Default choice: General Purpose + Elastic Throughput for 90%+ of workloads.

Chapter Four

Throughput Modes

Performance Mode vs Throughput Mode Introductory

Students often confuse these two settings. They control different things:

⚙️

Performance Mode (Ch. 3)

Controls IOPS — how many I/O operations per second
Controls latency — how fast each operation completes
General Purpose vs Max I/O
Immutable — set at creation, cannot change

📊

Throughput Mode (This Chapter)

Controls throughput — how many MB/s or GB/s of data transfer
How fast you can read/write large amounts of data
Bursting vs Provisioned vs Elastic
Changeable — can switch modes anytime

👉 Analogy: Performance mode = how many cars can enter the highway at once (IOPS). Throughput mode = the speed limit on the highway (MB/s). Both matter, but they're independent settings.

Bursting Throughput Core

Bursting is the default throughput mode. Your throughput scales with how much data is stored in EFS — the more data you store, the higher your baseline and burst throughput. It works like a token bucket:

📈

How Bursting Works

Baseline throughput: 50 KiB/s per GB of data stored in Standard class
Burst throughput: up to 100 MiB/s for file systems up to 1 TiB; larger file systems can burst to 100 MiB/s per TiB stored
Minimum baseline: 1 MiB/s (even for tiny file systems)
Burst credits accumulate when throughput is below baseline
Credits consumed when bursting above baseline

🔢

Throughput by Storage Size

100 GB stored → baseline 5 MiB/s, burst to 100 MiB/s
1 TB stored → baseline 50 MiB/s, burst to 100 MiB/s
10 TB stored → baseline 500 MiB/s, burst to about 1,000 MiB/s (100 MiB/s per TiB)
Credit balance visible in CloudWatch: BurstCreditBalance

👉 The problem with Bursting: If your file system is small (e.g., 50 GB of config files) but your workload is throughput-heavy (reads/writes many MB/s), you'll burn through burst credits and get throttled to a tiny baseline. This is the #1 EFS performance complaint.

Burst Credit Parameter	Value	Example
Baseline throughput	50 KiB/s per GB stored (Standard)	100 GB FS → 5 MiB/s baseline
Minimum baseline	1 MiB/s (even for tiny FS)	1 GB FS still gets 1 MiB/s baseline
Maximum burst	100 MiB/s up to 1 TiB stored; 100 MiB/s per TiB above that	Cannot be exceeded regardless of credits available
Credit accumulation	At baseline rate when idle	5 MiB/s × 3,600 s = 18,000 MiB (≈17.6 GiB) credited per hour
Credit consumption	Each MiB above baseline = 1 credit	Bursting at 100 MiB/s = 95 MiB/s credit burn (if 5 MiB/s baseline)
Throttle condition	Credits reach zero	Throughput drops to baseline (e.g., 5 MiB/s). Monitor `BurstCreditBalance`.
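The credit mechanics in the table can be sketched as a simple token-bucket calculation. This is a simplified model, not the service's exact accounting: it treats GB as GiB the way the table does, takes the burst cap as a parameter, and assumes demand is constant. All names are illustrative.

```python
KIB_PER_MIB = 1024


def baseline_mib_s(stored_gb: float) -> float:
    """50 KiB/s per GB stored, with a floor of 1 MiB/s."""
    return max(1.0, stored_gb * 50 / KIB_PER_MIB)


def seconds_until_throttled(credit_mib, stored_gb, demand_mib_s, burst_cap_mib_s=100.0):
    """How long a constant demand can run before the credit balance reaches zero."""
    rate = min(demand_mib_s, burst_cap_mib_s)   # cannot exceed the burst cap
    burn = rate - baseline_mib_s(stored_gb)     # credits spent per second
    if burn <= 0:
        return float("inf")                     # at or below baseline: never throttled
    return credit_mib / burn


if __name__ == "__main__":
    print(f"100 GB baseline: {baseline_mib_s(100):.2f} MiB/s")
    print(f"Seconds of bursting: {seconds_until_throttled(3000, 1024, 100):.0f}")
```

The exact baseline for 100 GB comes out slightly under the rounded 5 MiB/s quoted in the table, because 50 KiB/s × 100 is 5,000 KiB/s rather than 5,120.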

Provisioned Throughput Core

Provisioned Throughput lets you specify exactly how much throughput you need, independent of storage size. You pay for what you provision.

🎛️

Characteristics

Set throughput from 1 MiB/s to 3,125 MiB/s
Decouples throughput from storage size
Pay for provisioned throughput + storage separately
Can change provisioned value (increase/decrease) anytime
Demand above the provisioned value may be throttled; you are billed at the provisioned rate whether or not you use it

🎯

Use When

Small file system with high throughput needs (e.g., 20 GB, need 200 MiB/s)
Predictable, steady throughput requirements
BurstCreditBalance keeps hitting zero
You know your exact throughput requirements and want cost certainty

Elastic Throughput Core

Elastic Throughput is the newest and recommended mode for most workloads. It automatically scales throughput up and down based on demand — no planning, no burst credits, no provisioning.

⚡

Characteristics

Automatically scales to up to 10 GiB/s read, 3 GiB/s write
No burst credits to manage — throughput instantly available
Pay per GiB of data transferred (read: ~$0.03/GiB, write: ~$0.06/GiB)
No baseline throughput limits based on storage size
Works with General Purpose performance mode only

🎯

Use When

Spiky, unpredictable workloads (e.g., CI/CD pipelines, batch processing)
You don't want to manage burst credits or provision throughput
Workloads that are idle most of the time but need high throughput in bursts
Small file systems that need more throughput than Bursting allows
Default recommendation for new file systems

Throughput Modes — Full Comparison Core

Feature	Bursting (default)	Provisioned	Elastic ⭐
How It Works	Throughput scales with stored data. Burst credits when idle.	You specify exact MiB/s. Fixed cost.	Auto-scales on demand. Pay per GiB transferred.
Max Read	Burst to 100 MiB/s (100 MiB/s per TiB above 1 TiB); baseline 50 KiB/s per GB stored	Up to 3,125 MiB/s	Up to 10 GiB/s
Max Write	Burst to 100 MiB/s (100 MiB/s per TiB above 1 TiB)	Up to 3,125 MiB/s	Up to 3 GiB/s
Pricing	Included in storage cost	Storage + throughput provisioned	Storage + per-GiB data transfer
Best For	Large FS with moderate throughput	Small FS, predictable high throughput	Spiky workloads, unpredictable patterns
Risk	Burst credit exhaustion → throttled	Over-provisioning wastes money	Cost unpredictable if data transfer is very high
Changeable?	Yes	Yes	Yes (you can switch between modes anytime)
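To see where the pricing models in the table cross over, here is a rough cost sketch for Elastic versus Provisioned. The Elastic per-GiB prices are the approximate figures from this chapter; the Provisioned price of $6 per MiB/s-month is an assumption of this example, and the model ignores the throughput that Provisioned mode includes with storage.

```python
ELASTIC_READ_PER_GIB = 0.03          # approximate, from this chapter
ELASTIC_WRITE_PER_GIB = 0.06         # approximate, from this chapter
PROVISIONED_PER_MIB_S_MONTH = 6.00   # assumed price, check current pricing

SECONDS_PER_30_DAYS = 30 * 24 * 3600


def elastic_monthly_cost(read_mib_s, write_mib_s, busy_seconds):
    """Elastic bills per GiB moved, so cost depends on how long the load runs."""
    read_gib = read_mib_s * busy_seconds / 1024
    write_gib = write_mib_s * busy_seconds / 1024
    return read_gib * ELASTIC_READ_PER_GIB + write_gib * ELASTIC_WRITE_PER_GIB


def provisioned_monthly_cost(provisioned_mib_s):
    """Provisioned bills a flat monthly rate for the reserved throughput."""
    return provisioned_mib_s * PROVISIONED_PER_MIB_S_MONTH


if __name__ == "__main__":
    constant = elastic_monthly_cost(500, 0, SECONDS_PER_30_DAYS)
    spiky = elastic_monthly_cost(500, 0, 5 * 60 * 30)  # 5 minutes a day for 30 days
    print(f"Elastic, constant 500 MiB/s reads: ${constant:,.2f}")
    print(f"Elastic, 5 min/day at 500 MiB/s:   ${spiky:,.2f}")
    print(f"Provisioned at 500 MiB/s:          ${provisioned_monthly_cost(500):,.2f}")
```

Under these assumptions the spiky workload is far cheaper on Elastic, while the constant workload is far cheaper on Provisioned.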

Throughput Monitoring In-Depth

CloudWatch metrics to monitor EFS throughput performance:

Metric	What It Measures	When to Act
`BurstCreditBalance`	Remaining burst credits (Bursting mode only)	Trending toward zero → switch to Elastic or Provisioned
`TotalIOBytes`	Total bytes read + written per period	Spikes indicate burst patterns — consider Elastic
`MeteredIOBytes`	Bytes billed for Elastic Throughput (data transfer)	Track cost — if consistent, Provisioned may be cheaper
`PermittedThroughput`	Max throughput allowed at this moment	If actual throughput equals permitted → being throttled
`PercentIOLimit`	How close to IOPS ceiling (General Purpose only)	Sustained 100% → consider Max I/O or Elastic Throughput
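A minimal sketch of the throttling check described for `PermittedThroughput`: convert a summed byte counter for one period into bytes per second and compare it with the permitted rate. The function names and the 90% threshold are assumptions of this example, and fetching the metrics from CloudWatch is left out.

```python
MIB = 1024 * 1024


def average_throughput_bytes_s(io_bytes_sum: float, period_seconds: int) -> float:
    """Average throughput over one metric period, from the period's byte sum."""
    return io_bytes_sum / period_seconds


def is_near_limit(io_bytes_sum, period_seconds, permitted_bytes_s, threshold=0.9):
    """True when average throughput reaches `threshold` of the permitted rate."""
    used = average_throughput_bytes_s(io_bytes_sum, period_seconds)
    return used >= threshold * permitted_bytes_s


if __name__ == "__main__":
    # Hypothetical datapoint: 6,000 MiB moved in 60 s against a 100 MiB/s limit.
    print(is_near_limit(6000 * MIB, 60, 100 * MIB))
```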

Cost Optimization Decision In-Depth

[Flowchart: choosing a throughput mode, cost vs flexibility]

Exam Scenarios In-Depth

Scenario	Answer	Why
"Small EFS (50 GB) keeps getting throttled, BurstCreditBalance at zero"	Switch to Elastic or Provisioned	50 GB = only 2.5 MiB/s baseline. Once burst credits gone, throughput drops to baseline.
"10 TB file system, steady 200 MiB/s throughput"	Bursting works fine	10 TB = 500 MiB/s baseline. Well above the 200 MiB/s need. No credits consumed.
"CI/CD builds spike throughput for 5 minutes then idle for hours"	Elastic Throughput	Spiky, unpredictable. Elastic charges only during the burst. Provisioned wastes money during idle.
"Constant 500 MiB/s throughput 24/7 for video processing"	Provisioned	Predictable, constant. Provisioned at 500 MiB/s is cheaper than Elastic's per-GiB charges at this volume.
"Can I change throughput mode later?"	Yes	Unlike performance mode, throughput mode can be switched between Bursting, Provisioned, and Elastic anytime.

👉 Key Takeaway

Throughput mode controls MB/s data transfer speed (not IOPS). Elastic is the default recommendation — auto-scales, no credits to manage, pay per use. Choose Provisioned for constant high throughput, Bursting only for large file systems with modest I/O. Unlike performance mode, throughput mode can be changed anytime.

📋 Chapter 4 — Summary

Performance mode ≠ Throughput mode: Performance = IOPS/latency (immutable). Throughput = MB/s data transfer (changeable).
Bursting (default): throughput scales with data stored. 50 KiB/s per GB baseline. Burst to 100 MiB/s. Risk: credit exhaustion for small FS.
Provisioned: fixed MiB/s you specify. Decoupled from storage size. Best for constant, predictable workloads.
Elastic ⭐: auto-scales to 10 GiB/s read / 3 GiB/s write. Pay per GiB transferred. Best for spiky/unpredictable workloads. Recommended default.
Changeable: you can switch between all three throughput modes anytime — no migration needed.
Monitor: BurstCreditBalance (Bursting), MeteredIOBytes (Elastic cost), PermittedThroughput (throttling detection).

Chapter Five

Networking & Mount Targets

Mount Target Fundamentals Introductory

To connect to an EFS file system, your compute resources need a mount target — a network entry point inside your VPC. Without mount targets, EFS is unreachable. Every mount target is an Elastic Network Interface (ENI) with a private IP address in one of your VPC subnets.

🔌

Mount Target = ENI

One mount target per AZ (for Regional file systems)
One mount target total (for One Zone file systems — in the chosen AZ)
Gets a private IP from your subnet's CIDR
Gets a DNS name: az-name.fs-id.efs.region.amazonaws.com (for example, us-east-1a.fs-0a1b2c3d.efs.us-east-1.amazonaws.com)
Appears in EC2 → Network Interfaces as a managed ENI

📡

DNS Resolution

EFS DNS name: fs-id.efs.region.amazonaws.com
Resolves to the mount target IP in the caller's AZ
EC2 in us-east-1a → resolves to mount target in us-east-1a
Requires VPC DNS resolution and DNS hostnames enabled
If mount target missing in an AZ → DNS resolution fails for instances in that AZ
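The two DNS name formats described above can be assembled from the file system ID, Region, and Availability Zone. The sketch below assumes the per-AZ prefix is the AZ name (such as us-east-1a); the helper names are hypothetical, and the resolution helper only reports whether a lookup succeeds from the machine it runs on.

```python
import socket
from typing import Optional


def efs_dns_name(fs_id: str, region: str) -> str:
    """File-system-level name; resolves to the mount target in the caller's AZ."""
    return f"{fs_id}.efs.{region}.amazonaws.com"


def mount_target_dns_name(az_name: str, fs_id: str, region: str) -> str:
    """AZ-specific name for one mount target."""
    return f"{az_name}.{efs_dns_name(fs_id, region)}"


def resolve_or_none(hostname: str) -> Optional[str]:
    """Return the resolved IPv4 address, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


if __name__ == "__main__":
    name = efs_dns_name("fs-0a1b2c3d", "us-east-1")
    print(name, "->", resolve_or_none(name))
```

Running `resolve_or_none` from an instance is a quick way to tell a DNS problem (None) apart from a security group problem (an IP comes back but the mount still hangs).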

👉 Common mistake: Forgetting to create a mount target in an AZ where EC2 instances run. Instances in that AZ cannot resolve the EFS DNS name, so the mount fails. Always create mount targets in every AZ with compute resources.

Network Architecture Core

EFS Networking — Mount targets, subnets, security groups, and NFS port 2049

Security Group Configuration Core

Security groups are the most important networking control for EFS. You need two security groups configured correctly:

🛡️

Mount Target Security Group

Attached to each mount target ENI
Inbound rule: TCP port 2049 (NFS) from client security groups
Source: reference the client SG (not CIDR — more secure and dynamic)
Outbound: default (allow all) is fine
One SG can be shared by all mount targets

🖥️

Client (EC2/Lambda/ECS) Security Group

Attached to your compute instances
Outbound rule: TCP port 2049 to the mount target security group
If you reference SG IDs (not CIDRs), scaling is automatic — new instances get access instantly
No inbound rule needed for NFS (client initiates connection)

👉 Best practice: Always reference security group IDs instead of IP ranges. This way, any new EC2 instance added to the client SG automatically gets EFS access — no rule updates needed.
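As an illustration of the SG-to-SG reference, the inbound rule on the mount target security group can be written as a permission structure in the shape the EC2 `AuthorizeSecurityGroupIngress` API expects (the `IpPermissions` list used by boto3). The security group ID below is a made-up placeholder, and the helper name is hypothetical.

```python
NFS_PORT = 2049


def nfs_ingress_permission(client_sg_id: str) -> dict:
    """Build one inbound rule: TCP 2049 from members of the client security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": NFS_PORT,
        "ToPort": NFS_PORT,
        # Referencing a group ID instead of a CIDR: any instance that joins
        # the client group is covered without editing this rule.
        "UserIdGroupPairs": [{"GroupId": client_sg_id}],
    }


if __name__ == "__main__":
    import json

    # "sg-0client0000000000" is a placeholder ID, not a real group.
    print(json.dumps([nfs_ingress_permission("sg-0client0000000000")], indent=2))
```

With boto3, a list of such dictionaries would be passed as `IpPermissions` to `authorize_security_group_ingress` together with the mount target group's `GroupId`.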

Mounting EFS Core

There are two ways to mount EFS on EC2. The amazon-efs-utils helper is strongly recommended:

✅

Using amazon-efs-utils (Recommended)

Install: sudo yum install -y amazon-efs-utils
Mount: sudo mount -t efs fs-0a1b2c3d:/ /mnt/efs
With TLS: sudo mount -t efs -o tls fs-0a1b2c3d:/ /mnt/efs
With IAM: sudo mount -t efs -o tls,iam fs-0a1b2c3d:/ /mnt/efs
With Access Point: sudo mount -t efs -o tls,accesspoint=fsap-xxxx fs-0a1b2c3d:/ /mnt/efs
Auto-reconnect, watchdog, logging built-in

⚙️

Using Standard NFS Client

Install: sudo yum install -y nfs-utils
Mount: sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-dns:/ /mnt/efs
Works but: no TLS, no IAM, no auto-reconnect
Must manage NFS options manually
Use only if amazon-efs-utils is not available

For persistent mounts (survive reboot), add to /etc/fstab:

fs-0a1b2c3d:/ /mnt/efs efs _netdev,tls,iam 0 0
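When fstab entries are generated by provisioning scripts, a small helper keeps the option list consistent with the mount examples above. This is a minimal sketch: the function name is hypothetical, and it assumes that the `iam` and `accesspoint` options always imply `tls`, as in the mount commands shown earlier.

```python
from typing import Optional


def fstab_entry(
    fs_id: str,
    mount_point: str,
    tls: bool = True,
    iam: bool = False,
    access_point: Optional[str] = None,
) -> str:
    """Return one /etc/fstab line for an EFS mount handled by amazon-efs-utils."""
    options = ["_netdev"]  # wait for the network before mounting at boot
    if tls or iam or access_point:
        options.append("tls")  # iam and accesspoint are only used together with tls
    if iam:
        options.append("iam")
    if access_point:
        options.append(f"accesspoint={access_point}")
    return f"{fs_id}:/ {mount_point} efs {','.join(options)} 0 0"


if __name__ == "__main__":
    print(fstab_entry("fs-0a1b2c3d", "/mnt/efs", iam=True))
```

Called with `iam=True`, the helper reproduces the entry shown above.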

Cross-AZ, Cross-VPC, and Cross-Account Access In-Depth

Access Pattern	How to Configure	Cost Implications
Same VPC, Same AZ	DNS resolves to local mount target. No extra config.	No data transfer charges
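Same VPC, Cross-AZ	Client mounts a mount target in a different AZ (One Zone file system, or no local mount target), using the AZ-specific DNS name or the mount target IP.	Standard inter-AZ data transfer charges apply, plus added latency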
Cross-VPC (same region)	VPC Peering or Transit Gateway. Mount via mount target IP or DNS.	Standard VPC peering / TGW data transfer charges
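Cross-Account	Connect the VPCs (VPC peering, Transit Gateway, or a shared VPC whose subnets are shared through Resource Access Manager) and allow the other account's principals in the EFS file system policy.	Peering / TGW data transfer charges apply. EFS itself adds no cross-account fee.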
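Cross-Region	Direct mounts work only by mount target IP over inter-Region VPC peering or Transit Gateway, with high latency. For DR, use EFS Replication for a read-only replica in another Region.	Inter-Region data transfer charges; replication adds storage cost in the destination Region.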
On-Premises	AWS Direct Connect or VPN + mount target IP. Or use DataSync for data transfer.	Direct Connect / VPN charges + data transfer out fees

Lambda + EFS Networking In-Depth

Lambda mounting EFS has specific networking requirements that differ from EC2:

Requirements

Lambda function must be in a VPC (VPC-attached Lambda)
Lambda's subnet must be in the same AZ as an EFS mount target
Lambda's security group needs outbound TCP 2049
Mount target SG needs inbound TCP 2049 from Lambda SG
Lambda uses an Access Point (required — cannot mount root directly)

⚠️

Gotchas

VPC Lambda has cold start penalty — ENI creation adds 1-2 seconds
Lambda needs NAT Gateway for internet access when in VPC
EFS mount adds ~1 second to cold starts (connection setup)
Max 25,000 concurrent connections per EFS file system for Lambda
Lambda + EFS = VPC + Subnet + SG + Mount Target + Access Point — lots to configure

Troubleshooting Mount Failures In-Depth

Error	Cause	Fix
`Connection timed out`	Security group blocking port 2049; or no mount target in the instance's AZ	Check SG inbound rules. Verify mount target exists in the AZ.
`mount.nfs4: No such device`	`nfs-utils` or `amazon-efs-utils` not installed	`sudo yum install -y amazon-efs-utils`
`Permission denied`	IAM policy denying access; or file system policy blocking; or POSIX permissions wrong	Check IAM role, EFS resource policy, and file/dir permissions (chmod)
`Name resolution failed`	VPC DNS resolution not enabled; or DNS hostnames disabled	Enable DNS resolution and DNS hostnames in VPC settings
`nfs: server not responding`	Mount target ENI deleted or AZ outage	Verify mount target status in console. Failover to another AZ if needed.
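Several rows in the table come down to whether TCP port 2049 is reachable from the client. The probe below is a rough diagnostic sketch using only the standard library; the function name and the returned labels are illustrative, and the result says nothing about IAM or POSIX permissions.

```python
import socket


def probe_nfs(host: str, port: int = 2049, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to a mount target.

    Returns "open", "dns-failure", "timeout", "refused", or "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        return "dns-failure"  # name did not resolve: check VPC DNS settings
    except socket.timeout:
        return "timeout"  # packets dropped: typical of a security group or NACL block
    except ConnectionRefusedError:
        return "refused"  # host reachable, nothing listening on the port
    except OSError:
        return "error"


if __name__ == "__main__":
    # Hostname follows the file system DNS format used earlier in this chapter.
    print(probe_nfs("fs-0a1b2c3d.efs.us-east-1.amazonaws.com"))
```

A "timeout" result with a correct IP usually points at security groups, which matches the first row of the table.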

👉 Key Takeaway

EFS networking = mount targets (one per AZ) + security groups (port 2049) + DNS resolution. Use amazon-efs-utils for TLS and IAM. Reference SG IDs (not CIDRs) for dynamic scaling. Lambda requires VPC attachment + Access Point.

📋 Chapter 5 — Summary

Mount targets: ENI per AZ with private IP. One per AZ for Regional FS, one total for One Zone. NFS port 2049.
DNS resolution: EFS DNS resolves to mount target IP in caller's AZ. Requires VPC DNS resolution + DNS hostnames enabled.
Security groups: mount target SG (inbound 2049 from client SGs) + client SG (outbound 2049 to mount target SG). Reference SG IDs, not CIDRs.
Mounting: use amazon-efs-utils for TLS, IAM, and auto-reconnect. Add to /etc/fstab for persistence.
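Cross-network: same AZ = no transfer charge. Cross-VPC = peering/TGW. Cross-account = peering (or shared VPC) + file system policy. Cross-region = EFS Replication for DR; direct mounts only by IP over inter-Region peering.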
Lambda: requires VPC, same-AZ subnet, access point, SG on port 2049. Adds cold start latency (~1-2s).
Troubleshooting: most failures = SG misconfiguration or missing mount target in the AZ.

Chapter Six

Security & Access Control

Security Layers Overview Introductory

EFS security is a layered defense — multiple independent controls that work together. Understand each layer and how they combine:

EFS Security — Four Independent Layers

👉 All layers must allow access — if any layer denies, the request fails. A request must pass: (1) security group allows port 2049, (2) IAM allows the EFS action, (3) file system policy allows the connection, (4) POSIX permissions allow the file operation.

Layer 1: Network Security Core

Network is the first barrier. This was covered in Chapter 5 — here's the security-specific summary:

Control	What to Configure	Key Point
Security Groups	Mount target SG: inbound TCP 2049 from client SGs. Client SG: outbound TCP 2049 to mount target SG.	Reference SG IDs, not CIDRs. Most common misconfiguration.
Subnets	Place mount targets in private subnets only.	EFS should never be in a public subnet — no internet-facing NFS.
NACLs	Ensure NACLs allow TCP 2049 and ephemeral ports (1024-65535) between subnets.	NACLs are stateless — must allow both request and response ports.
VPC Endpoints	Not required — EFS uses mount targets (ENIs), not VPC endpoints.	Unlike S3 (gateway endpoint), EFS access is always through mount targets inside the VPC.

Layer 2: IAM Policies Core

IAM controls who can perform EFS API actions and who can mount the file system. EFS supports two types of IAM integration:

📋

API-Level IAM (Management)

Controls AWS API actions: elasticfilesystem:CreateFileSystem, :DeleteFileSystem, :DescribeFileSystems
Attached to IAM users/roles that manage EFS via Console/CLI/SDK
Standard IAM policy — same as any AWS service
Does NOT control NFS data access (read/write files)

🔑

NFS-Level IAM (Data Access)

Controls NFS mount and file operations: elasticfilesystem:ClientMount, :ClientWrite, :ClientRootAccess
Requires mounting with -o tls,iam flag
EC2 instance role / Lambda execution role must have these permissions
Combined with file system policy for full zero-trust control

Key IAM actions for NFS data access:

IAM Action	What It Controls	Notes
`elasticfilesystem:ClientMount`	Permission to mount the file system (read-only)	Required for any mount. Without `ClientWrite`, mount is read-only.
`elasticfilesystem:ClientWrite`	Permission to write data to the file system	Add this for read-write mounts. Omit for read-only access.
`elasticfilesystem:ClientRootAccess`	Permission to access as root user (UID 0)	Deny this to prevent containers/instances from acting as root on the filesystem.

👉 Exam note: IAM-based NFS access is optional — by default, any EC2 instance that can reach the mount target (network layer) can mount and read/write. IAM adds an additional authorization layer. Enable it by mounting with -o tls,iam and by setting a file system policy that enforces IAM.
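The three client actions combine into an identity policy for the instance role or Lambda execution role. The sketch below builds such a policy document as a plain dictionary; the helper name is hypothetical, and the ARN reuses the placeholder account and file system ID from this chapter's examples.

```python
import json


def efs_client_policy(file_system_arn: str, read_write: bool = True, allow_root: bool = False) -> dict:
    """Build an IAM identity policy granting NFS client access to one file system."""
    actions = ["elasticfilesystem:ClientMount"]  # required for any mount
    if read_write:
        actions.append("elasticfilesystem:ClientWrite")  # omit for read-only access
    if allow_root:
        actions.append("elasticfilesystem:ClientRootAccess")  # UID 0 access, off by default here
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": file_system_arn}
        ],
    }


if __name__ == "__main__":
    arn = "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0a1b2c3d"
    print(json.dumps(efs_client_policy(arn, read_write=False), indent=2))
```

These permissions only take effect for clients that mount with `-o tls,iam`.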

Layer 3: File System Policy Core

A file system policy is a JSON resource-based policy attached directly to the EFS file system — similar to an S3 bucket policy. It applies to every NFS connection regardless of which client connects.

📜

Common Policy Patterns

Enforce encryption in transit: deny any connection without TLS
Enforce IAM authorization: deny anonymous NFS clients
Prevent root access: deny ClientRootAccess for all principals
Read-only access: deny ClientWrite globally
Restrict to specific accounts/roles: condition on aws:PrincipalArn

⚙️

How to Apply

Console: EFS → File System → Edit → File System Policy
CLI: aws efs put-file-system-policy
Preconfigured toggles in console for common patterns
Can be set at any time — does NOT require recreating the FS
Takes effect immediately for new connections

Example file system policy that enforces TLS and IAM for all connections:

File System Policy — Enforce TLS + IAM

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedTransport",
      "Effect": "Deny",
      "Principal": { "AWS": "*" },
      "Action": "*",
      "Resource": "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0a1b2c3d",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    },
    {
      "Sid": "AllowExampleRoleMountAndWrite",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/example-app-role" },
      "Action": [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite"
      ],
      "Resource": "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-0a1b2c3d",
      "Condition": {
        "Bool": {
          "elasticfilesystem:AccessedViaMountTarget": "true"
        }
      }
    }
  ]
}
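Before applying a policy like the one above, it can be useful to check mechanically that it really contains a TLS requirement. The following is a heuristic sketch, not a policy evaluator: it only looks for a Deny statement conditioned on `aws:SecureTransport` being false, and the function name is hypothetical.

```python
def enforces_tls(policy: dict) -> bool:
    """True if some Deny statement is conditioned on aws:SecureTransport = false."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Deny":
            continue
        bool_conditions = statement.get("Condition", {}).get("Bool", {})
        value = bool_conditions.get("aws:SecureTransport")
        if value is not None and str(value).lower() == "false":
            return True
    return False


if __name__ == "__main__":
    sample = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": "*",
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    print(enforces_tls(sample))
```

The check ignores which actions and principals the Deny covers, so a passing result should still be confirmed by attempting a mount without `-o tls`.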

Layer 4: POSIX Permissions Core

EFS is a POSIX-compliant filesystem — standard Linux file permissions (rwxr-xr-x) and ownership (uid:gid) apply to every file and directory. This is the final layer of access control.

📂

How It Works

Every file/directory has an owner UID, group GID, and permission bits
chmod 755 /data — owner rwx, group r-x, others r-x
chown 1000:1000 /data — set owner to UID 1000
NFS client connects with a UID/GID (from the EC2 instance's user)
EFS checks if that UID/GID has permission for the requested operation
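The owner / group / other check described above can be sketched in a few lines. This is a simplified model for illustration only: it ignores ACLs and directory traversal, treats UID 0 as always allowed, and the function name is hypothetical.

```python
import stat


def posix_allows(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set, want: str) -> bool:
    """Check one permission ('r', 'w', or 'x') against classic POSIX mode bits."""
    if uid == 0:
        return True  # simplification: root bypasses the permission bits
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if uid == owner_uid:
        shift = 6  # owner triplet
    elif owner_gid in gids:
        shift = 3  # group triplet
    else:
        shift = 0  # others triplet
    return bool((mode >> shift) & bit)


if __name__ == "__main__":
    mode = 0o755  # as set by: chmod 755 /data
    print(stat.filemode(stat.S_IFDIR | mode))
    print(posix_allows(mode, 1000, 1000, uid=1000, gids={1000}, want="w"))
    print(posix_allows(mode, 1000, 1000, uid=2000, gids={2000}, want="w"))
```

With mode 755 and owner 1000:1000, UID 1000 may write while an unrelated UID 2000 may only read and traverse.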

🚪

Access Points Override

Access Points can override the connecting user's UID/GID
All connections through the AP use the specified UID/GID
Combined with root directory setting → app sees only its subtree
Simplifies permission management for Lambda and containers
No need to manage Linux users across instances

Encryption Core

🔐

Encryption at Rest

Uses AWS KMS (managed key or customer-managed CMK)
Must enable at creation time — cannot add later
Encrypts all data, metadata, and temporary files
Transparent — no performance impact, no code changes
Default KMS key: aws/elasticfilesystem
Customer-managed CMK: you control rotation, access policy, deletion

🔒

Encryption in Transit (TLS)

TLS 1.2 encryption for NFS traffic between client and mount target
Enable by mounting with -o tls (requires amazon-efs-utils)
Can be enabled/disabled anytime — per-mount decision
Enforced globally via file system policy (aws:SecureTransport)
Slight CPU overhead on client side — negligible for most workloads

👉 Exam question pattern: "How to encrypt an existing unencrypted EFS?" → You cannot encrypt it in place. Create a new encrypted file system, copy the data with DataSync or rsync, then remount clients on the new file system. This is a common trap.

Security Best Practices In-Depth

#	Practice	How to Implement
1	Always encrypt at rest	Enable at creation. Use customer-managed CMK for compliance.
2	Always encrypt in transit	Mount with `-o tls`. Set FS policy to deny unencrypted connections.
3	Enable IAM authorization	Mount with `-o tls,iam`. Set FS policy to enforce IAM auth.
4	Use Access Points	One AP per application. Enforce root dir + UID/GID per app.
5	Deny root access	FS policy: deny `ClientRootAccess`. Prevents containers from running as root on EFS.
6	Private subnets only	Never place mount targets in public subnets. No internet-facing NFS.
7	Reference SG IDs in rules	SG-to-SG references. Auto-scales with fleet. No IP management.
8	Enable AWS Backup	Automated backups with retention policies. Cross-region copy for DR.
9	Enable CloudTrail logging	Log all EFS API calls. Audit who created/deleted/modified file systems.
10	Restrict with resource policies	FS policy to allow only specific accounts, roles, or VPCs.

Security — Exam Scenarios In-Depth

Scenario	Answer	Why
"Compliance requires all EFS data encrypted at rest and in transit"	Create FS with encryption enabled + FS policy enforcing `aws:SecureTransport`	At-rest = creation time. In-transit = FS policy denies non-TLS connections.
"Prevent containers from writing as root to EFS"	FS policy: deny `ClientRootAccess`	Blocks UID 0 operations. Use Access Points to assign non-root UID/GID.
"Multiple Lambda functions need isolated directories on same EFS"	Create one Access Point per Lambda with different root dirs and UID/GID	Each AP enforces isolation. Lambda sees only its subtree.
"EC2 can ping mount target IP but mount times out"	Security group missing inbound TCP 2049 rule	ICMP (ping) and TCP 2049 (NFS) are separate rules. SG must explicitly allow NFS.
"How to share EFS across two AWS accounts?"	VPC peering + FS policy allowing the other account's principal	Network connectivity (peering) + authorization (FS policy with cross-account principal).

👉 Key Takeaway

EFS security is four layers deep: Network (SG port 2049) → IAM (ClientMount/ClientWrite/ClientRootAccess) → File System Policy (enforce TLS, IAM, deny root) → POSIX Permissions (chmod/chown). All layers must allow access. Encryption at rest is immutable — enable it at creation.

📋 Chapter 6 — Summary

Four security layers: Network → IAM → File System Policy → POSIX permissions. All must allow access.
Network: SG on port 2049. Mount targets in private subnets. Reference SG IDs, not CIDRs.
IAM: ClientMount, ClientWrite, ClientRootAccess. Requires -o tls,iam mount option.
File System Policy: resource-based JSON policy. Enforce TLS, IAM auth, block root, read-only. Changeable anytime.
POSIX: standard Linux rwx permissions and UID/GID ownership. Access Points override connecting user identity.
Encryption at rest: KMS-based. Immutable — must enable at creation. Cannot add to existing FS.
Encryption in transit: TLS 1.2 via -o tls. Enforceable via FS policy. Can be enabled/disabled per mount.

Chapter Seven

Architecture Patterns

Pattern 1 — Shared Web Content (ALB + EC2 + EFS) Core

The classic EFS use case: multiple EC2 web servers behind an Application Load Balancer, all sharing the same uploaded content, themes, and configuration files via EFS.

Pattern 1 — WordPress / CMS Farm with Shared EFS Storage

✅

Why This Works

All EC2 instances see identical /wp-content — uploads, themes, plugins shared
Auto Scaling Group can add/remove instances — new ones mount EFS instantly
Multi-AZ ALB + Multi-AZ EFS + Multi-AZ RDS = fully resilient
No data sync scripts, no S3 plugins, no shared NFS servers to manage

⚙️

Configuration

Performance mode: General Purpose
Throughput mode: Elastic (handles traffic spikes)
Storage class: Standard + IA lifecycle (30-day transition)
Mount in user data: mount -t efs -o tls fs-id:/ /var/www/wp-content

Pattern 2 — Serverless File Processing (Lambda + EFS) Core

Lambda functions mount EFS to work with files that do not fit in /tmp (512 MB by default, configurable up to 10 GB) or that must be shared across invocations: ML model inference, PDF generation, video thumbnail extraction.

Pattern 2 — Lambda reads ML models from EFS for inference

✅

Benefits

ML model (2 GB) shared across all Lambda invocations
No S3 download on each invocation — EFS is already mounted
Update model on EFS → all Lambdas get new version instantly

⚙️

Config

Lambda in VPC with EFS access point
Elastic throughput for burst reads
Access Point: /ml, UID 1000

⚠️

Trade-offs

VPC Lambda cold starts (~1-2s extra)
25K connection limit per EFS FS
Needs NAT Gateway for internet
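A handler for this pattern typically reads the model from the EFS mount path and keeps it in memory across warm invocations. The sketch below is one possible shape, with assumptions stated plainly: the mount path `/mnt/ml/model.bin` and the `MODEL_PATH` variable are hypothetical, the "model" is just raw bytes, and a modification-time check stands in for picking up updated models.

```python
import os

# Hypothetical location: Lambda mounts EFS under a local path that starts with /mnt/.
MODEL_PATH = os.environ.get("MODEL_PATH", "/mnt/ml/model.bin")

# Module-level cache survives between invocations in a warm execution environment.
_cache = {}


def load_model(path: str = MODEL_PATH) -> bytes:
    """Return the model bytes, re-reading from EFS only when the file has changed."""
    mtime = os.path.getmtime(path)
    cached = _cache.get(path)
    if cached is None or cached[0] != mtime:
        with open(path, "rb") as f:
            _cache[path] = (mtime, f.read())
    return _cache[path][1]


def handler(event, context):
    """Lambda entry point: load (or reuse) the model and report its size."""
    model = load_model()
    return {"model_bytes": len(model)}
```

Caching at module level means only cold starts pay for the full read from EFS; warm invocations pay for a single metadata lookup.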

Pattern 3 — Container Persistent Storage (ECS/Fargate + EFS) Core

Containers are ephemeral — when a task stops, its local storage is lost. EFS provides persistent, shared storage that survives container restarts and can be shared across tasks.

Pattern 3 — ECS Fargate tasks with persistent EFS volumes

👉 ECS Task Definition config: Set "volumes" → "efsVolumeConfiguration" with file system ID, access point ID, and "transitEncryption": "ENABLED". Mount into containers via "mountPoints".
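The task definition fields named above fit together as shown in this sketch, which builds the `volumes` and `mountPoints` fragments as dictionaries. The IDs, volume name, and container path are placeholders, and the helper names are hypothetical; in the ECS task definition schema the access point ID sits under `authorizationConfig`.

```python
import json


def efs_volume(name: str, file_system_id: str, access_point_id: str) -> dict:
    """One entry for the task definition's "volumes" list."""
    return {
        "name": name,
        "efsVolumeConfiguration": {
            "fileSystemId": file_system_id,
            "transitEncryption": "ENABLED",  # needed when an access point is used
            "authorizationConfig": {"accessPointId": access_point_id},
        },
    }


def mount_point(volume_name: str, container_path: str, read_only: bool = False) -> dict:
    """One entry for a container definition's "mountPoints" list."""
    return {"sourceVolume": volume_name, "containerPath": container_path, "readOnly": read_only}


if __name__ == "__main__":
    fragment = {
        "volumes": [efs_volume("shared-data", "fs-0a1b2c3d", "fsap-xxxx")],
        "mountPoints": [mount_point("shared-data", "/data")],
    }
    print(json.dumps(fragment, indent=2))
```

The `sourceVolume` in each mount point must match the `name` of a volume, which is the link that most often gets mistyped.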

Pattern 4 — Disaster Recovery with EFS Replication In-Depth

🌍

Architecture

Primary region: us-east-1 — active EFS with read/write access
DR region: us-west-2 — read-only replica, RPO ≤ 15 min
Continuous replication over AWS backbone (no VPN needed)
On failure: promote DR replica to read/write
Update DNS (Route 53) to point compute to DR region

⚡

Failover Steps

1. Detect primary region failure (CloudWatch alarm / manual)
2. Delete replication configuration on DR file system
3. DR file system becomes read/write
4. Update mount targets / DNS in DR region
5. Start DR compute resources (EC2, Lambda, ECS)
6. After recovery: set up replication in reverse direction

Pattern 5 — ML Training Pipeline In-Depth

🤖

Architecture

Training data stored in EFS (datasets, preprocessed features)
Multiple GPU EC2 instances mount EFS concurrently
Each instance reads different data shards from the same filesystem
Model checkpoints written to shared EFS → any instance can resume
Performance mode: General Purpose (Max I/O cannot be combined with Elastic throughput)
Throughput mode: Elastic (burst during training, idle between runs)

📊

Why EFS Over S3

S3 requires downloading datasets to local disk → startup delay
EFS is already mounted — training starts immediately
Random read access to files (EFS) vs sequential download (S3)
Checkpoint save: torch.save(model, "/mnt/efs/checkpoints/epoch_5.pt")
Other instances immediately see the checkpoint — failover is instant
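When several instances read checkpoints from the same directory, it helps if a reader never sees a half-written file. One common approach, sketched below, is to write to a temporary file in the same directory and rename it into place. Assumptions: `pickle` stands in for `torch.save` so the example has no dependencies, the `epoch_N.pt` naming follows the line above, and the rename is assumed to be atomic on the shared file system.

```python
import os
import pickle
import tempfile
from typing import Optional


def save_checkpoint_atomic(obj, path: str) -> None:
    """Write obj to a temp file next to path, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # push data to storage before the rename
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def latest_checkpoint(directory: str) -> Optional[str]:
    """Return the path of the highest-numbered epoch_N.pt file, or None."""
    epochs = []
    for name in os.listdir(directory):
        if name.startswith("epoch_") and name.endswith(".pt"):
            number = name[len("epoch_"):-len(".pt")]
            if number.isdigit():
                epochs.append((int(number), name))
    if not epochs:
        return None
    return os.path.join(directory, max(epochs)[1])
```

A replacement instance can call `latest_checkpoint("/mnt/efs/checkpoints")` on startup and resume from whatever it returns.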

Pattern 6 — Hybrid Cloud (On-Premises + AWS) In-Depth

🏢

Architecture

On-premises NFS → AWS DataSync → EFS
Migrate existing file shares to cloud incrementally
Direct Connect or VPN for on-premises NFS mount
AWS-side processing (Lambda, EC2) accesses data via EFS
Bi-directional sync for hybrid workflows

🔄

Migration Path

Phase 1: DataSync copies data from on-prem NFS to EFS (initial sync)
Phase 2: Incremental syncs (deltas only) on schedule
Phase 3: Cut over — point apps to EFS, decommission on-prem NFS
DataSync handles permissions, timestamps, symlinks
Transfer speeds up to 10 Gbps over Direct Connect

When to Use EFS vs Alternatives Core

Requirement	Best Service	Why
Shared Linux filesystem, multi-AZ	EFS	NFS v4.1, elastic, multi-AZ, managed
Shared Windows filesystem (SMB)	FSx for Windows	SMB protocol, Active Directory integration, Windows-native
High-performance Linux filesystem (HPC)	FSx for Lustre	Sub-millisecond latency, 100s of GB/s throughput, HPC/ML workloads
Single-instance boot volume, database disk	EBS	Block storage, low latency, high IOPS, single attachment
Unlimited object storage, data lake, backups	S3	HTTP API, cheapest at scale, integrated with analytics (Athena, Glue)
NetApp-compatible enterprise NAS	FSx for NetApp ONTAP	Multi-protocol (NFS, SMB, iSCSI), data dedup, snapshots
Temporary high-speed scratch storage	Instance Store	Physically attached SSD, highest IOPS, ephemeral (lost on stop)

EFS Anti-Patterns In-Depth

🚫

Don't Use EFS For

Databases — NFS latency too high. Use EBS (gp3/io2) or RDS.
Windows workloads — EFS is Linux-only. Use FSx for Windows.
Static website hosting — S3 + CloudFront is cheaper and faster.
Single-instance high-IOPS — EBS io2 gives 256K IOPS vs EFS 35K.
Large object storage (videos, archives) — S3 is 10× cheaper per GB.
HPC scratch storage — FSx for Lustre gives significantly higher throughput.

✅

EFS Sweet Spot

Shared content across multiple Linux instances
Container persistent volumes (ECS, EKS, Fargate)
Lambda large file access (models, libraries, data)
CMS platforms (WordPress, Drupal) behind load balancers
CI/CD shared build artifacts
User home directories accessible from any instance

👉 Key Takeaway

EFS shines in three patterns: (1) shared web content behind ALBs, (2) Lambda/container persistent storage via access points, and (3) multi-instance ML training data. If the exam says "shared filesystem across multiple instances" — it's EFS. If it says "Windows" → FSx. If it says "high IOPS single instance" → EBS. If it says "unlimited cheap storage" → S3.

📋 Chapter 7 — Summary

Pattern 1 — Web Farm: ALB + EC2 Auto Scaling + EFS for shared WordPress/CMS content. Multi-AZ resilient.
Pattern 2 — Serverless: Lambda mounts EFS via access points for ML models (>512 MB). Eliminates S3 download latency.
Pattern 3 — Containers: ECS/Fargate tasks mount EFS for persistent, shared storage. Data survives task restarts.
Pattern 4 — DR: EFS Replication creates cross-region read-only replica. RPO ≤ 15 min. Promote on failover.
Pattern 5 — ML Training: Multiple GPU instances mount EFS for shared datasets and checkpoints. General Purpose + Elastic throughput.
Pattern 6 — Hybrid: DataSync migrates on-prem NFS to EFS. Direct Connect for live mounts.
Alternatives: FSx for Windows (SMB), FSx for Lustre (HPC), EBS (single-instance IOPS), S3 (cheap objects).
Anti-patterns: databases, Windows, static sites, single-instance IOPS, large archives → use other services.