Amazon Bedrock โ
Foundation Models as a Service
Access leading foundation models through a single API. Build generative AI applications โ chatbots, content generation, summarization, code assistants โ without managing any ML infrastructure.
โก Bedrock in 30 Seconds
- Serverless access to foundation models โ Claude, Titan, Llama, Mistral, Stable Diffusion
- No infrastructure to manage โ pay per token (input/output)
- Built-in RAG with Knowledge Bases (connect your documents, get grounded answers)
- Agents that can reason, plan, and execute multi-step tasks using tools
- Guardrails for responsible AI โ content filtering, PII redaction, topic blocking
What is Bedrock
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI companies available through a unified API. You don't train models, manage GPUs, or handle infrastructure โ you simply call an API and get intelligent responses.
๐ Think of Bedrock as: An AI vending machine โ choose a model, send a prompt, get a response. No ML expertise required.
Bedrock is the fastest way to add generative AI to your applications. It abstracts away all the complexity of hosting large language models โ you focus on your application logic and prompts.
Self-Hosting LLMs
- Need expensive GPU instances (p4d/p5 = $30+/hr)
- Complex model deployment and optimization
- Version management and updates
- Scaling under load is your problem
- Security, compliance โ all on you
Bedrock Solves
- Serverless โ no instances to manage
- Multiple models via single API
- AWS handles scaling, patching, updates
- Built-in security (VPC, encryption, IAM)
- Pay per token โ no idle cost
| Feature | Amazon Bedrock | Amazon SageMaker |
|---|---|---|
| Purpose | Use pre-trained foundation models | Train and deploy custom ML models |
| Infrastructure | Serverless โ fully managed | Managed instances โ you choose type/size |
| ML expertise | Not required โ prompt engineering | Required โ data science skills |
| Customization | Fine-tuning, RAG, prompt engineering | Full control โ any algorithm, any data |
| Cost model | Per token (input + output) | Per hour (instance runtime) |
| Best for | Generative AI apps (text, image, code) | Custom ML (fraud, forecasting, recommendation) |
Bedrock gives you serverless access to the world's best foundation models โ no ML expertise or infrastructure needed
Foundation Models
Bedrock provides access to foundation models from multiple providers โ choose based on capability, speed, and cost:
| Provider | Model | Strengths | Best For |
|---|---|---|---|
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus/Haiku | Reasoning, instruction following, safety | Complex tasks, analysis, coding, long context |
| Amazon | Titan Text, Titan Embeddings, Titan Image | AWS-native, cost-effective, embeddings | Embeddings for RAG, simple generation |
| Meta | Llama 3.1 (8B, 70B, 405B) | Open-weight, strong performance/cost | General-purpose, customization-friendly |
| Mistral AI | Mistral Large, Mixtral | Fast, multilingual, code | Low-latency apps, European compliance |
| Cohere | Command R, Embed | Enterprise search, RAG-optimized | Enterprise search, multilingual RAG |
| Stability AI | Stable Diffusion XL | Image generation | Marketing visuals, product images |
Best Quality
- Claude 3 Opus โ highest reasoning
- Complex analysis and research
- Long documents (200K context)
- Highest cost per token
Best Balance
- Claude 3.5 Sonnet โ quality + speed
- Production workloads
- Code generation, chat, RAG
- Best quality/cost ratio
Best Speed/Cost
- Claude 3 Haiku or Mistral
- High-throughput classification
- Simple extraction, routing
- Lowest latency and cost
Bedrock offers two approaches to customize models for your domain:
| Method | How It Works | When to Use |
|---|---|---|
| Fine-Tuning | Train on your labeled data (prompt/completion pairs) | Domain-specific tone, format, or knowledge |
| Continued Pre-Training | Train on unlabeled domain text (raw documents) | Teach model new domain vocabulary/concepts |
๐ Before fine-tuning, try RAG first. Most use cases are better solved with Knowledge Bases (retrieval-augmented generation) than fine-tuning. Fine-tune only when you need a specific output format or tone that prompting cannot achieve.
Choose your model based on the task: Opus for quality, Sonnet for balance, Haiku for speed โ and always try RAG before fine-tuning
Core Features
The fundamental Bedrock operation โ send a prompt, get a response:
- InvokeModel โ synchronous, single response (most common)
- InvokeModelWithResponseStream โ streaming tokens as they generate (chat UIs)
- Converse API โ unified multi-turn conversation API across all models
Converse API (Recommended)
- Unified interface across all models
- Handles message formatting differences
- Multi-turn conversation support
- Tool use / function calling built-in
- Switch models without code changes
InvokeModel (Direct)
- Model-specific request format
- Maximum control over parameters
- Slightly lower overhead
- Use for: batch processing, specific model features
- Different payload per model provider
| Feature | On-Demand | Provisioned Throughput |
|---|---|---|
| Pricing | Per token (input + output) | Per hour (reserved capacity) |
| Throttling | Shared limits โ can be throttled | Guaranteed โ your dedicated capacity |
| Latency | Variable under load | Consistent โ no noisy neighbors |
| Best for | Dev, variable traffic, experimentation | Production with SLAs, high-volume |
Guardrails apply safety controls across any model in Bedrock โ a centralized policy layer for responsible AI:
Content Filters
- Block hate, violence, sexual, insults
- Configurable thresholds (low/medium/high)
- Applied to both input and output
PII Redaction
- Detect and mask PII in responses
- SSN, email, phone, credit card
- Block or anonymize
Topic Blocking
- Deny specific topics (competitors, politics)
- Custom denied topics with examples
- Word/phrase filters
๐ Guardrails are independent of the model. You define them once and apply across all models and applications โ so you can switch models without rebuilding safety controls.
Bedrock Converse API unifies all models โ Guardrails add safety without changing application code
RAG & Knowledge Bases
Retrieval-Augmented Generation (RAG) is the technique of grounding AI responses in your actual data. Instead of relying solely on the model's training data, you retrieve relevant documents and include them in the prompt.
Without RAG
- Model hallucinates answers
- No access to your private data
- Knowledge cutoff date
- Can't cite sources
With RAG
- Grounded in your actual documents
- Access to private/current data
- Always up-to-date
- Citable โ shows source documents
Bedrock Knowledge Bases is a fully managed RAG service โ connect your data sources, Bedrock handles chunking, embedding, storage, and retrieval automatically.
Fixed-Size
- Split at N tokens/characters
- Simple and predictable
- May break mid-sentence
- Good for: uniform documents
Semantic
- Split by meaning boundaries
- Preserves context within chunks
- Uses embeddings to find breaks
- Good for: varied document types
Hierarchical
- Parent-child chunk relationships
- Retrieve small, return large context
- Best recall with full context
- Good for: complex documents
Knowledge Bases turn Bedrock from a generic AI into a grounded expert on your data โ with citations
Agents & Orchestration
Bedrock Agents are autonomous AI systems that can reason, plan, and execute multi-step tasks:
- Break complex tasks into steps
- Call external APIs (action groups) to get data or perform actions
- Query knowledge bases for information
- Maintain conversation state across turns
- Handle errors and retry with different approaches
Agents transform Bedrock from a text generator into an autonomous system that can reason and act
Cost & Security
Bedrock uses token-based pricing โ you pay for what you use:
| Model | Input (per 1K tokens) | Output (per 1K tokens) | Notes |
|---|---|---|---|
| Claude 3 Haiku | $0.00025 | $0.00125 | Cheapest โ high volume tasks |
| Claude 3.5 Sonnet | $0.003 | $0.015 | Best value for production |
| Claude 3 Opus | $0.015 | $0.075 | Premium โ complex reasoning |
| Llama 3.1 70B | $0.00099 | $0.00099 | Open model, good price |
| Titan Text Express | $0.0002 | $0.0006 | AWS native, lowest cost |
| Titan Embeddings | $0.0001 | โ | For RAG vector generation |
Model Selection
- Use Haiku for classification/routing
- Use Sonnet for complex generation
- Reserve Opus for highest-quality needs
- Use Titan Embeddings for RAG
Token Management
- Keep prompts concise
- Set max_tokens to limit output
- Use smaller context when possible
- Cache frequent prompts
Architecture
- Route simple queries to Haiku
- Route complex queries to Sonnet/Opus
- Use Provisioned Throughput for steady traffic
- Batch APIs for non-real-time (50% off)
Data Privacy
- Your data is never used to train models
- Data encrypted at rest (AWS KMS) and in transit (TLS 1.2+)
- VPC endpoints โ no internet traversal
- Data stays in your AWS region
- CloudTrail logs all API calls
Access Control
- IAM policies for model access
- Resource-based policies on models
- Guardrails for content safety
- Model access must be explicitly enabled per account
- SOC 2, HIPAA eligible, ISO 27001
๐ Critical for enterprise adoption: Bedrock data is never shared with model providers and never used for training. This is the #1 differentiator vs using model providers directly.
Smart model routing can reduce Bedrock costs by 90%+ โ and your data is never used for model training
Architecture Patterns
Architecture
- API Gateway โ Lambda โ Bedrock
- Converse API with streaming
- DynamoDB for conversation history
- Guardrails for safety
When to Use
- Customer-facing chatbot
- Internal Q&A assistant
- Simple query/response patterns
- No private data needed in answers
Full enterprise architecture with private data grounding:
- API Gateway + Lambda + Cognito โ authenticated API layer
- Bedrock Knowledge Base โ RAG over S3 documents (PDFs, policies, manuals)
- OpenSearch Serverless โ vector store for 1M+ document chunks
- Guardrails โ PII redaction + topic blocking for compliance
- DynamoDB โ conversation history for multi-turn context
- CloudWatch โ latency, token usage, error metrics
An agent that can perform actions on behalf of the user:
- Customer support agent โ look up orders, process refunds, update accounts
- Research agent โ search documents, summarize findings, create reports
- DevOps agent โ check service health, read CloudWatch logs, trigger remediation
| Use Case | Best Service | Why |
|---|---|---|
| Generative AI (text, chat, code) | Bedrock | Serverless, multiple models, built-in RAG |
| Custom ML model (train from scratch) | SageMaker | Full training control, custom algorithms |
| Image recognition (pre-trained) | Rekognition | No ML needed, simple API |
| Text analysis (sentiment, entities) | Comprehend | Pre-built NLP, no model selection |
| Self-hosted open models | SageMaker endpoints | Custom containers, dedicated GPUs |
| Mistake | Why It's Bad | Fix |
|---|---|---|
| Using Opus for everything | 5-10x more expensive than needed | Route by complexity โ Haiku for simple, Sonnet for complex |
| Fine-tuning before trying RAG | Expensive, slow, RAG works better | Start with Knowledge Bases first |
| No guardrails in production | Unfiltered content to users | Always enable Guardrails โ PII + content filters |
| Ignoring token costs | Long system prompts = waste | Use Prompt Caching, keep prompts concise |
| Not using VPC endpoints | Data traverses public internet | Enable VPC endpoints for Bedrock API calls |
Bedrock shines for generative AI โ combine Knowledge Bases + Agents + Guardrails for production-ready AI