Amazon SageMaker
Fully Managed ML Platform
Build, train, and deploy machine learning models at scale. SageMaker removes the heavy lifting from every step of the ML lifecycle, from data labeling to production inference.
SageMaker in 30 Seconds
- Fully managed ML platform: no infrastructure to manage for training or inference
- Integrated Jupyter notebooks for exploration and feature engineering
- Built-in algorithms (XGBoost, Linear Learner, etc.) or bring your own container
- One-click model deployment with auto-scaling endpoints
- MLOps built-in: pipelines, model registry, experiment tracking, and monitoring
What is SageMaker
Amazon SageMaker is a fully managed service that enables developers and data scientists to build, train, and deploy machine learning models quickly and at scale.
Think of SageMaker as: A complete ML factory, from raw data to production predictions
SageMaker eliminates the undifferentiated heavy lifting of ML infrastructure. Instead of manually provisioning GPU clusters, configuring training environments, and building deployment pipelines, SageMaker provides managed components for every stage.
Before SageMaker
- Manual GPU cluster management
- Weeks to set up training infrastructure
- Custom deployment pipelines for every model
- No standard experiment tracking
- Model monitoring was an afterthought
SageMaker Solves
- Managed compute that scales to thousands of GPUs
- Training starts in minutes, not weeks
- One-click deployment with auto-scaling
- Built-in experiment tracking and model registry
- Automated model monitoring and drift detection
SageMaker sits in the AI/ML layer of AWS:
- Cloud → AI & ML → Machine Learning Platform
It is used for:
Custom ML Models
Train and deploy custom models for fraud detection, recommendations, forecasting, and NLP.
MLOps at Scale
Automated pipelines, model versioning, A/B testing, and continuous training for production ML systems.
Experimentation
Jupyter notebooks, data wrangling, feature engineering, and rapid prototyping with managed compute.
Think of SageMaker like a factory assembly line for ML:
You Manage
- Define the ML problem
- Prepare and label data
- Choose/write algorithms
- Evaluate model quality
- Define business logic
AWS Manages
- GPU/CPU compute clusters
- Distributed training infrastructure
- Model hosting and auto-scaling
- Container orchestration
- Network, storage, and security
SageMaker is a complete ML platform that handles infrastructure so you focus on data and algorithms
ML Lifecycle
Machine learning is not just training a model; it is a full lifecycle. SageMaker provides managed tools for every stage.
ML is a lifecycle, not a single step: SageMaker covers data prep to production monitoring
Core Components
SageMaker notebook instances are managed Jupyter environments for data exploration and model development:
- Pre-configured with ML frameworks (TensorFlow, PyTorch, MXNet, scikit-learn)
- Scales from small CPU instances to large GPU instances
- Integrated with S3, IAM, and VPC
- Lifecycle configurations for automated setup
SageMaker provides 17+ built-in algorithms optimized for scale and performance on AWS infrastructure:
| Algorithm | Category | Use Case |
|---|---|---|
| XGBoost | Classification/Regression | Tabular data prediction, fraud detection |
| Linear Learner | Classification/Regression | Simple predictions at scale |
| BlazingText | NLP | Text classification, Word2Vec embeddings |
| Image Classification | Computer Vision | Classify images into categories |
| Object Detection | Computer Vision | Detect objects in images (bounding boxes) |
| DeepAR | Time Series | Forecasting (demand, revenue, capacity) |
| K-Means | Unsupervised | Clustering, customer segmentation |
| Random Cut Forest | Anomaly Detection | Detect outliers in streaming data |
| Factorization Machines | Recommendation | Click prediction, recommendations |
When to use built-in algorithms: when your data fits standard problem types (tabular, text, image). They are optimized for distributed training on AWS, which usually makes them faster and cheaper than custom code for common problems.
SageMaker supports three levels of customization:
| Approach | Effort | When to Use |
|---|---|---|
| Built-in Algorithms | Lowest: just provide data | Standard ML problem types (classification, regression, NLP) |
| Script Mode | Medium: write a training script | Custom logic with popular frameworks (PyTorch, TensorFlow) |
| Bring Your Own Container | Full control: build a Docker image | Custom frameworks, proprietary libraries, complex dependencies |
SageMaker Pipelines is a native CI/CD system for ML. It defines end-to-end ML workflows as code, making them reproducible, auditable, and automated.
Pipeline Steps
- Processing (data transformation)
- Training (model training)
- Tuning (hyperparameter optimization)
- Model evaluation (quality gates)
- Register model (model registry)
- Deploy (create endpoint)
Benefits
- Version-controlled ML workflows
- Automated retraining on schedule or trigger
- Quality gates: only deploy if metrics pass
- Full lineage tracking
- Integrates with EventBridge for event-driven ML
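The quality-gate idea in the list above can be sketched in plain Python. This is not the SageMaker Pipelines SDK: `run_pipeline`, `fake_train`, the accuracy values, and the 0.90 threshold are all hypothetical stand-ins, used only to show a workflow that registers a model when the evaluation metric passes and skips registration otherwise.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineRun:
    """Records which steps ran, so a run can be audited afterwards."""
    executed: List[str] = field(default_factory=list)
    registered: bool = False


def run_pipeline(
    train: Callable[[], Dict[str, float]],
    evaluate: Callable[[Dict[str, float]], float],
    min_accuracy: float,
) -> PipelineRun:
    """Run train -> evaluate -> (conditionally) register."""
    run = PipelineRun()

    model = train()                      # training step
    run.executed.append("train")

    accuracy = evaluate(model)           # model evaluation step
    run.executed.append("evaluate")

    # Quality gate: the register step only runs when the metric passes.
    if accuracy >= min_accuracy:
        run.executed.append("register")
        run.registered = True
    return run


def fake_train() -> Dict[str, float]:
    """Hypothetical stand-in for a real training step."""
    return {"weight": 0.5}


# Hypothetical evaluation results: one run passes the gate, one fails it.
good = run_pipeline(fake_train, lambda model: 0.93, min_accuracy=0.90)
bad = run_pipeline(fake_train, lambda model: 0.81, min_accuracy=0.90)
print(good.executed, good.registered)
print(bad.executed, bad.registered)
```

Keeping the gate inside the workflow definition, rather than in a manual review step, is what makes the "only deploy if metrics pass" rule repeatable across retraining runs.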
Ground Truth
- Managed data labeling service
- Human labelers + ML-assisted labeling
- Image, text, video, 3D point cloud
- Active learning reduces labeling cost by up to 70%
Data Wrangler
- Visual data preparation (no code)
- 300+ built-in transformations
- Connect to S3, Redshift, Athena, Lake Formation
- Export to SageMaker Pipelines
Feature Store
- Centralized feature repository
- Online store (low-latency inference)
- Offline store (batch training)
- Feature versioning and sharing across teams
SageMaker Studio
- Web-based IDE for ML
- Integrated Jupyter notebooks
- Visual experiment tracking
- Access to all SageMaker tools from one interface
- Collaborative: share notebooks and results
Experiments
- Track every training run automatically
- Compare metrics: accuracy, loss, F1
- Reproduce results with full lineage
- Organize into trials and trial components
- Integrates with model registry
The SageMaker Model Registry is a centralized catalog for trained models:
- Version models with metadata (metrics, lineage, approval status)
- Approval workflows: models must be approved before deployment
- Deploy any registered version to any endpoint
- Track which model version is serving production traffic
SageMaker's components work together as a pipeline, from notebooks to production endpoints
Training Deep Dive
SageMaker training is fundamentally different from running training on your own EC2 instances:
DIY Training (EC2)
- Provision GPU instances manually
- Install drivers, CUDA, frameworks
- Pay for idle time between experiments
- Manage distributed training yourself
- No automatic experiment tracking
SageMaker Training
- Specify instance type and count; infrastructure is provisioned automatically
- Pre-built containers with all dependencies
- Pay only for training duration (seconds)
- Built-in distributed training (data/model parallel)
- Automatic metric logging and experiment tracking
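To make "specify instance type and count" concrete, here is a minimal sketch that assembles the keyword arguments for the CreateTrainingJob API as exposed by boto3. The helper name `build_training_job_request`, the job name, image URI, role ARN, bucket paths, the 50 GB volume size, and the doubled wait time for Spot are all hypothetical choices for illustration. The block only builds a dictionary; submitting it requires boto3 and AWS credentials.

```python
from typing import Any, Dict, Optional


def build_training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
    instance_type: str = "ml.m5.xlarge",
    instance_count: int = 1,
    max_runtime_seconds: int = 3600,
    spot_checkpoint_s3_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the CreateTrainingJob API."""
    request: Dict[str, Any] = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,  # assumed size for this sketch
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }
    if spot_checkpoint_s3_uri is not None:
        # Managed Spot Training: the wait time must cover the runtime.
        request["EnableManagedSpotTraining"] = True
        request["StoppingCondition"]["MaxWaitTimeInSeconds"] = max_runtime_seconds * 2
        request["CheckpointConfig"] = {
            "S3Uri": spot_checkpoint_s3_uri,
            "LocalPath": "/opt/ml/checkpoints",
        }
    return request


# All identifiers below are placeholders, not real resources.
request = build_training_job_request(
    job_name="demo-xgboost-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-image:latest",
    role_arn="arn:aws:iam::123456789012:role/DemoSageMakerRole",
    train_s3_uri="s3://demo-bucket/train/",
    output_s3_uri="s3://demo-bucket/output/",
    spot_checkpoint_s3_uri="s3://demo-bucket/checkpoints/",
)

# To submit (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_training_job(**request)
```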
| Instance | GPU | Best For |
|---|---|---|
| ml.m5.xlarge | None (CPU) | Simple algorithms (XGBoost, Linear Learner, sklearn) |
| ml.p3.2xlarge | 1× V100 (16 GB) | Single-GPU deep learning (text, images) |
| ml.p3.8xlarge | 4× V100 (64 GB) | Multi-GPU training, large models |
| ml.p3.16xlarge | 8× V100 (128 GB) | Distributed training, computer vision |
| ml.p4d.24xlarge | 8× A100 (320 GB) | Large language models, foundation model fine-tuning |
| ml.trn1.32xlarge | 16× Trainium chips | Cost-optimized deep learning on AWS custom silicon |
SageMaker supports two strategies for training that won't fit on a single GPU:
Data Parallelism
- Split training data across multiple GPUs
- Each GPU has full model copy
- Gradients synchronized after each step
- Use when: model fits in one GPU, data is large
- Near-linear scaling up to 256 GPUs
Model Parallelism
- Split model layers across multiple GPUs
- Each GPU holds part of the model
- Pipeline parallel execution
- Use when: model too large for one GPU (LLMs)
- Supports 100B+ parameter models
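The data-parallel recipe (full model copy per GPU, gradients synchronized after each step) can be illustrated without any GPU at all. The sketch below is a toy, not SageMaker's distributed training library: it uses a one-parameter model `y = w * x`, hypothetical sample data, and Python lists standing in for GPUs. It shows that averaging per-shard gradients reproduces the full-batch gradient when the shards are equal in size.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def gradient(w: float, batch: Sequence[Sample]) -> float:
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(w: float, batch: Sequence[Sample], workers: int) -> float:
    """Split the batch evenly across workers, then average their gradients."""
    if len(batch) % workers != 0:
        raise ValueError("batch size must be divisible by the worker count")
    shard_size = len(batch) // workers
    shards: List[Sequence[Sample]] = [
        batch[i * shard_size:(i + 1) * shard_size] for i in range(workers)
    ]
    # Each "GPU" holds the same weight w and sees only its own shard.
    local_grads = [gradient(w, shard) for shard in shards]
    # Synchronization step: average the local gradients (an all-reduce).
    return sum(local_grads) / workers


# Hypothetical data, roughly following y = 2x.
batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5
single = gradient(w, batch)
parallel = data_parallel_gradient(w, batch, workers=2)
print(single, parallel)
```

Model parallelism is different in kind: there is no gradient average over replicas, because each device holds only a slice of the parameters.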
SageMaker Automatic Model Tuning runs multiple training jobs with different hyperparameters and finds the best combination:
- Bayesian optimization: intelligent search (not random)
- Parallel jobs: run up to 10 training jobs simultaneously
- Early stopping: terminate poor-performing jobs early to save cost
- Warm start: reuse prior tuning results to converge faster
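Early stopping can be sketched with a median rule: stop a job when its latest objective value is worse than the median of the running averages that earlier jobs had reached by the same step. The code below is a simplified illustration under that assumption, not SageMaker's exact internal criteria; the function names and loss histories are hypothetical.

```python
from statistics import median
from typing import Sequence


def running_average(values: Sequence[float], step: int) -> float:
    """Mean of the metric over steps 0..step inclusive."""
    window = values[: step + 1]
    return sum(window) / len(window)


def should_stop_early(
    current: Sequence[float],
    completed: Sequence[Sequence[float]],
    minimize: bool = True,
) -> bool:
    """Return True when the current job looks worse than the typical earlier job."""
    step = len(current) - 1
    if step < 0:
        return False  # no metric reported yet
    # Only compare against earlier jobs that reached this step.
    comparable = [history for history in completed if len(history) > step]
    if not comparable:
        return False
    threshold = median(running_average(history, step) for history in comparable)
    latest = current[step]
    return latest > threshold if minimize else latest < threshold


# Hypothetical validation-loss histories from three finished jobs.
finished = [
    [1.0, 0.8, 0.6],
    [1.0, 0.7, 0.5],
    [1.2, 1.0, 0.9],
]
print(should_stop_early([1.1, 1.0], finished))   # loss is lagging behind
print(should_stop_early([1.0, 0.75], finished))  # loss is competitive
```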
Managed Spot Training runs training jobs on EC2 Spot instances, saving up to 90% compared to On-Demand. SageMaker restarts interrupted jobs automatically and syncs checkpoint files to S3, but the training script itself must save checkpoints and resume from them.
SageMaker training is ephemeral: infrastructure spins up, trains, saves the model to S3, and terminates
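A resumable training loop for Spot might look like the following sketch. It assumes the script is responsible for writing and reading its own checkpoint file; in a real job the directory would typically be the local checkpoint path (`/opt/ml/checkpoints` by default), while here a temporary directory stands in for it. The file name `state.json`, the `interrupt_after` switch, and the fake "training" update are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def load_checkpoint(checkpoint_dir: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    path = os.path.join(checkpoint_dir, "state.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "weight": 0.0}


def save_checkpoint(checkpoint_dir: str, state: dict) -> None:
    """Write the state to a temp file, then swap it in atomically."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    tmp_path = os.path.join(checkpoint_dir, "state.json.tmp")
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, os.path.join(checkpoint_dir, "state.json"))


def train(
    checkpoint_dir: str,
    total_epochs: int,
    interrupt_after: Optional[int] = None,
) -> Tuple[dict, int]:
    """Train until total_epochs; return (state, epochs run in this call)."""
    state = load_checkpoint(checkpoint_dir)  # resume if a checkpoint exists
    epochs_this_run = 0
    while state["epoch"] < total_epochs:
        if interrupt_after is not None and epochs_this_run == interrupt_after:
            return state, epochs_this_run    # simulate a Spot interruption
        state["weight"] += 0.1               # stand-in for one real epoch
        state["epoch"] += 1
        save_checkpoint(checkpoint_dir, state)
        epochs_this_run += 1
    return state, epochs_this_run


with tempfile.TemporaryDirectory() as ckpt_dir:
    first_state, first_epochs = train(ckpt_dir, total_epochs=5, interrupt_after=2)
    resumed_state, resumed_epochs = train(ckpt_dir, total_epochs=5)

print(first_state["epoch"], first_epochs)
print(resumed_state["epoch"], resumed_epochs)
```

The second call only runs the epochs that were still missing, which is the property that keeps an interruption from wasting the work already paid for.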
Deployment & Inference
SageMaker offers multiple ways to serve predictions depending on your latency, throughput, and cost requirements:
Real-Time Endpoints
- Always-on inference endpoints
- Millisecond latency
- Auto-scaling based on traffic
- Best for: APIs, user-facing predictions
Batch Transform
- Process large datasets offline
- No persistent endpoint needed
- Input/output from S3
- Best for: nightly scoring, bulk predictions
Serverless Inference
- Scale to zero when idle
- Cold start (seconds)
- Pay per invocation
- Best for: intermittent traffic, dev/test
| Feature | Real-Time | Batch Transform | Serverless | Async |
|---|---|---|---|---|
| Latency | Milliseconds | Minutes to hours | Seconds (cold start) | Seconds to minutes |
| Cost model | Per hour (always on) | Per second (job duration) | Per invocation | Per second |
| Scale to zero | No (min 1 instance) | Yes (job-based) | Yes | Yes |
| Max payload | 6 MB | Unlimited (S3) | 4 MB | 1 GB |
| Best for | Production APIs | Bulk scoring | Dev, low traffic | Large payloads (video, docs) |
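The comparison table can be turned into a small decision helper. The payload limits (6 MB, 4 MB, 1 GB) come from the table; the order of the checks and the function and parameter names are assumptions made for this sketch, and real choices also depend on cost and throughput.

```python
def choose_inference_option(
    payload_mb: float,
    needs_low_latency: bool,
    steady_traffic: bool,
    offline: bool = False,
) -> str:
    """Pick an inference option from payload size, latency, and traffic shape."""
    if offline:
        # Bulk scoring reads from and writes to S3; no endpoint is needed.
        return "batch-transform"
    if payload_mb > 1024:
        raise ValueError("payload exceeds the 1 GB async limit; consider batch transform")
    if payload_mb > 6:
        # Too large for real-time (6 MB) and serverless (4 MB).
        return "async"
    if not needs_low_latency and not steady_traffic and payload_mb <= 4:
        # Intermittent traffic that tolerates cold starts can scale to zero.
        return "serverless"
    return "real-time"


print(choose_inference_option(0.5, needs_low_latency=True, steady_traffic=True))
print(choose_inference_option(2, needs_low_latency=False, steady_traffic=False))
print(choose_inference_option(200, needs_low_latency=False, steady_traffic=False))
print(choose_inference_option(50, False, False, offline=True))
```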
Host thousands of models on a single endpoint to reduce cost:
Multi-Model Endpoint (MME)
- Thousands of models on one endpoint
- Models loaded/unloaded dynamically from S3
- Shared infrastructure: massive cost savings
- Best for: per-customer models, A/B testing at scale
Multi-Container Endpoint
- Up to 15 containers on one endpoint
- Serial (pipeline) or direct invocation
- Different frameworks in each container
- Best for: pre/post-processing pipelines
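Dynamic loading and unloading on a multi-model endpoint behaves much like a bounded cache. The sketch below assumes a least-recently-used eviction policy purely for illustration; the actual unloading behaviour of SageMaker is not specified in this article. `ModelCache`, the loader function, and the model names are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List


class ModelCache:
    """Keeps at most `capacity` models in memory, evicting the least recently used."""

    def __init__(self, capacity: int, loader: Callable[[str], object]) -> None:
        self.capacity = capacity
        self.loader = loader                      # e.g. a download from S3
        self._models: "OrderedDict[str, object]" = OrderedDict()
        self.loads: List[str] = []                # audit trail of cold loads

    def get(self, name: str) -> object:
        if name in self._models:
            self._models.move_to_end(name)        # mark as recently used
            return self._models[name]
        model = self.loader(name)                 # cold load on first request
        self.loads.append(name)
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)      # evict the oldest entry
        return model

    def loaded(self) -> List[str]:
        return list(self._models)


# Hypothetical per-customer models; the loader just returns a label.
cache = ModelCache(capacity=2, loader=lambda name: f"model:{name}")
for customer in ["a", "b", "a", "c"]:
    cache.get(customer)
print(cache.loaded())  # models currently in memory
print(cache.loads)     # models that had to be loaded from storage
```

The trade-off is visible in `loads`: rarely used models pay a cold-load penalty on their next request, which is why this pattern suits many small models with uneven traffic.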
SageMaker Model Monitor continuously monitors deployed models for quality degradation:
| Monitor Type | What It Detects | How It Works |
|---|---|---|
| Data Quality | Input data drift | Compares live data distribution against training baseline |
| Model Quality | Accuracy degradation | Compares predictions to ground truth labels |
| Bias Drift | Fairness changes | Detects emerging bias in predictions over time |
| Feature Attribution | Explainability changes | Monitors SHAP values for feature importance drift |
When model performance degrades: SageMaker Model Monitor emits CloudWatch metrics; an alarm on those metrics can trigger a retraining pipeline, which then deploys an updated model. This is the automated ML feedback loop.
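The "compare live data against a training baseline" idea behind data-quality monitoring can be sketched with the Population Stability Index (PSI). PSI is swapped in here as a simple, well-known drift score; it is not claimed to be what Model Monitor computes. The bin count, the 0.2 alert threshold (a common rule of thumb), and the sample data are assumptions.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Fraction of values per equal-width bin over [lo, hi]; outliers are clamped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    # A small floor avoids log(0) for empty bins.
    return [max(c / len(values), 1e-6) for c in counts]


def psi(baseline: Sequence[float], live: Sequence[float], bins: int = 5) -> float:
    """Population Stability Index of `live` relative to `baseline`."""
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline must contain more than one distinct value")
    expected = bin_fractions(baseline, lo, hi, bins)
    actual = bin_fractions(live, lo, hi, bins)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


DRIFT_THRESHOLD = 0.2  # rule-of-thumb alert level, an assumption

# Hypothetical feature values: training baseline vs. shifted live traffic.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

print(psi(baseline, baseline) > DRIFT_THRESHOLD)
print(psi(baseline, shifted) > DRIFT_THRESHOLD)
```

In a feedback loop, a score above the threshold would be the signal that raises an alarm and starts retraining.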
SageMaker endpoints are managed, auto-scaling, and support A/B testing and model monitoring out of the box
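Endpoint auto-scaling is configured through Application Auto Scaling. As a minimal sketch, the helper below builds the two request payloads involved: one to register the endpoint variant as a scalable target and one for a target-tracking policy on invocations per instance. The helper name, endpoint name, variant name, capacity bounds, target value, and cooldowns are hypothetical; the block only builds dictionaries and makes no AWS calls.

```python
from typing import Any, Dict, Tuple


def build_autoscaling_requests(
    endpoint_name: str,
    variant_name: str,
    min_instances: int,
    max_instances: int,
    invocations_per_instance: float,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (register_scalable_target kwargs, put_scaling_policy kwargs)."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    register = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Add instances when average invocations per instance exceed this.
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds, assumed values
            "ScaleOutCooldown": 60,
        },
    }
    return register, policy


register, policy = build_autoscaling_requests(
    endpoint_name="demo-endpoint",
    variant_name="AllTraffic",
    min_instances=1,
    max_instances=4,
    invocations_per_instance=100.0,
)

# To apply (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**register)
#   client.put_scaling_policy(**policy)
```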
Cost & Optimization
SageMaker pricing is based on what you use, and each component is priced independently:
| Component | Pricing | Optimization |
|---|---|---|
| Notebooks | Per hour (instance running) | Stop when not in use, use lifecycle configs |
| Training | Per second (training duration) | Use Spot Training (up to 90% off), right-size instances |
| Endpoints | Per hour (instance running) | Auto-scaling, serverless for low traffic, multi-model endpoints |
| Batch Transform | Per second (job duration) | Right-size instances, use for non-real-time |
| Storage | S3 standard pricing | Lifecycle policies for old model artifacts |
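The difference between per-hour and per-second billing is easiest to see with arithmetic. The sketch below uses a made-up rate of $1.00 per instance-hour and an assumed 70% Spot discount; neither is a real AWS price. It compares an always-on endpoint with a short training job.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def endpoint_monthly_cost(hourly_rate: float, instances: int, hours: float = HOURS_PER_MONTH) -> float:
    """Always-on endpoints bill for every hour the instances are running."""
    return hourly_rate * instances * hours


def training_job_cost(
    hourly_rate: float,
    instances: int,
    seconds: float,
    spot_discount: float = 0.0,
) -> float:
    """Training bills per second of job duration; Spot applies a discount."""
    return hourly_rate * instances * (seconds / 3600) * (1 - spot_discount)


RATE = 1.00  # hypothetical dollars per instance-hour

endpoint = endpoint_monthly_cost(RATE, instances=2)
on_demand = training_job_cost(RATE, instances=4, seconds=2 * 3600)
spot = training_job_cost(RATE, instances=4, seconds=2 * 3600, spot_discount=0.70)

print(f"endpoint per month:   ${endpoint:,.2f}")
print(f"training (on-demand): ${on_demand:,.2f}")
print(f"training (spot):      ${spot:,.2f}")
```

With these assumed numbers, two always-on endpoint instances cost far more per month than an occasional multi-instance training job, which is why idle endpoints are usually the first place to look for savings.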
Training
- Managed Spot: up to 90% savings
- Right-size GPU instances (don't over-provision)
- Use early stopping in HPO
- Use SageMaker Debugger to detect issues early
- Pipe mode for large datasets (stream from S3)
Inference
- Multi-model endpoints: share infrastructure across models
- Serverless: scale to zero for dev/test
- Auto-scaling: match capacity to demand
- Use Inference Recommender to find the optimal instance
- Model compilation (Neo) for up to 2× throughput
Operations
- Stop notebooks when not in use (auto-shutdown)
- Delete unused endpoints
- Archive old model artifacts in S3 Glacier
- Use SageMaker Savings Plans
- Tag resources for cost allocation
SageMaker Neo compiles trained models to run up to 2× faster with no loss in accuracy:
- Optimizes models for specific hardware (CPU, GPU, edge devices)
- Reduces model size and latency
- Supports TensorFlow, PyTorch, MXNet, ONNX, XGBoost
- Deploys to cloud instances or edge devices (IoT Greengrass)
Inference endpoints dominate SageMaker cost; auto-scaling and Spot training are the biggest levers
Architecture Patterns
Architecture
- SageMaker real-time endpoint
- API Gateway + Lambda → invoke endpoint
- Single model, auto-scaling
When to Use
- Single ML model in production
- Low-latency API required
- Simple request/response
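The Lambda piece of this pattern can be sketched as a handler that forwards the request body to the endpoint and returns the prediction. To keep the example runnable without AWS access, the runtime client is injected and a `FakeRuntimeClient` stands in for it; the fake client, the endpoint name, the `features` payload shape, and the `score` field are hypothetical. In production the client would come from `boto3.client("sagemaker-runtime")`.

```python
import io
import json
from typing import Any, Callable, Dict


def make_handler(runtime_client: Any, endpoint_name: str) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Create a Lambda handler bound to one SageMaker endpoint."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # API Gateway proxy integration passes the request body as a string.
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=event["body"],
        )
        # The response body is a stream; read and decode it.
        prediction = response["Body"].read().decode("utf-8")
        return {"statusCode": 200, "body": prediction}

    return handler


class FakeRuntimeClient:
    """Local stand-in for the SageMaker runtime client (illustration only)."""

    def invoke_endpoint(self, EndpointName: str, ContentType: str, Body: str) -> Dict[str, Any]:
        features = json.loads(Body)["features"]
        payload = json.dumps({"score": sum(features)}).encode("utf-8")
        return {"Body": io.BytesIO(payload)}


# In AWS Lambda, replace FakeRuntimeClient() with:
#   import boto3
#   boto3.client("sagemaker-runtime")
handler = make_handler(FakeRuntimeClient(), endpoint_name="demo-endpoint")
result = handler({"body": json.dumps({"features": [1, 2, 3]})}, None)
print(result)
```

Injecting the client also makes the handler straightforward to unit test, since no network call is needed to exercise the request and response handling.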
For real-time predictions that need up-to-date features:
- Feature Store (online): low-latency feature retrieval at inference time
- Kinesis + Lambda: stream events into Feature Store in real time
- SageMaker Endpoint: fetches features from the online store, then makes the prediction
- Example: fraud detection at checkout, which needs the latest transaction history at prediction time
| Use Case | Best Service | Why |
|---|---|---|
| Custom ML model (train + deploy) | SageMaker | Full control over algorithm, data, and infrastructure |
| Pre-trained AI (no custom training) | Rekognition, Comprehend, Textract | No ML expertise needed, just an API call |
| Foundation models / generative AI | Amazon Bedrock | Access to Claude, Titan, Llama without managing infra |
| AutoML (no code) | SageMaker Autopilot | Automatic model selection and tuning |
| Simple tabular predictions | SageMaker Canvas | No-code ML for business analysts |
Network & Data
- Run training and endpoints in VPC (private subnets)
- Enable encryption at rest (KMS) for all data
- Enable encryption in transit (TLS) for endpoints
- Use S3 bucket policies to restrict data access
- Enable VPC endpoints for SageMaker API (no internet)
Identity & Governance
- Use IAM roles (not access keys) for notebooks and training
- Apply least-privilege policies per team/project
- Enable CloudTrail for API audit logging
- Use SageMaker Projects for team-based access control
- Enable model lineage for compliance and reproducibility
| Mistake | Why It's Bad | Fix |
|---|---|---|
| Using SageMaker for pre-trained AI tasks | Overkill: higher cost and effort | Use Rekognition, Comprehend, Bedrock |
| Leaving endpoints running with no traffic | Endpoints are the #1 cost; idle = waste | Use serverless or delete unused endpoints |
| Not using Spot for training | Paying up to 10× more than necessary | Enable Managed Spot Training with checkpointing |
| No model monitoring | Models degrade silently, leading to bad predictions | Enable Model Monitor + automated retraining |
| Training on notebook instances | Expensive, no scaling, blocks notebook | Use SageMaker Training Jobs (separate compute) |
SageMaker shines for custom ML at scale: use managed AI services for standard tasks, SageMaker for everything custom