LearningTree · AWS · AI & ML

Amazon Bedrock —
Foundation Models as a Service

Access leading foundation models through a single API. Build generative AI applications — chatbots, content generation, summarization, code assistants — without managing any ML infrastructure.

⚡ Bedrock in 30 Seconds

Serverless access to foundation models — Claude, Titan, Llama, Mistral, Stable Diffusion
No infrastructure to manage — pay per token (input/output)
Built-in RAG with Knowledge Bases (connect your documents, get grounded answers)
Agents that can reason, plan, and execute multi-step tasks using tools
Guardrails for responsible AI — content filtering, PII redaction, topic blocking

Chapter One

What is Bedrock

Introduction Introductory

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI companies available through a unified API. You don't train models, manage GPUs, or handle infrastructure — you simply call an API and get intelligent responses.

👉 Think of Bedrock as: An AI vending machine — choose a model, send a prompt, get a response. No ML expertise required.

Bedrock is the fastest way to add generative AI to your applications. It abstracts away all the complexity of hosting large language models — you focus on your application logic and prompts.

Why Bedrock Exists Introductory

⚠️

Self-Hosting LLMs

Need expensive GPU instances (p4d/p5 = $30+/hr)
Complex model deployment and optimization
Version management and updates
Scaling under load is your problem
Security, compliance — all on you

✅

Bedrock Solves

Serverless — no instances to manage
Multiple models via single API
AWS handles scaling, patching, updates
Built-in security (VPC, encryption, IAM)
Pay per token — no idle cost

Bedrock vs SageMaker Core

Feature	Amazon Bedrock	Amazon SageMaker
Purpose	Use pre-trained foundation models	Train and deploy custom ML models
Infrastructure	Serverless — fully managed	Managed instances — you choose type/size
ML expertise	Not required — prompt engineering	Required — data science skills
Customization	Fine-tuning, RAG, prompt engineering	Full control — any algorithm, any data
Cost model	Per token (input + output)	Per hour (instance runtime)
Best for	Generative AI apps (text, image, code)	Custom ML (fraud, forecasting, recommendation)

Concept Diagram Introductory

Amazon Bedrock — Application Connects to Foundation Models

👉 Key Takeaway

Bedrock gives you serverless access to the world's best foundation models — no ML expertise or infrastructure needed

Chapter Two

Foundation Models

Available Models Core

Bedrock provides access to foundation models from multiple providers — choose based on capability, speed, and cost:

Provider	Model	Strengths	Best For
Anthropic	Claude 3.5 Sonnet, Claude 3 Opus/Haiku	Reasoning, instruction following, safety	Complex tasks, analysis, coding, long context
Amazon	Titan Text, Titan Embeddings, Titan Image	AWS-native, cost-effective, embeddings	Embeddings for RAG, simple generation
Meta	Llama 3.1 (8B, 70B, 405B)	Open-weight, strong performance/cost	General-purpose, customization-friendly
Mistral AI	Mistral Large, Mixtral	Fast, multilingual, code	Low-latency apps, European compliance
Cohere	Command R, Embed	Enterprise search, RAG-optimized	Enterprise search, multilingual RAG
Stability AI	Stable Diffusion XL	Image generation	Marketing visuals, product images

Model Selection Guide Core

🧠

Best Quality

Claude 3 Opus — highest reasoning
Complex analysis and research
Long documents (200K context)
Highest cost per token

⚖️

Best Balance

Claude 3.5 Sonnet — quality + speed
Production workloads
Code generation, chat, RAG
Best quality/cost ratio

⚡

Best Speed/Cost

Claude 3 Haiku or Mistral
High-throughput classification
Simple extraction, routing
Lowest latency and cost

Model Customization In-Depth

Bedrock offers two approaches to customize models for your domain:

Method	How It Works	When to Use
Fine-Tuning	Train on your labeled data (prompt/completion pairs)	Domain-specific tone, format, or knowledge
Continued Pre-Training	Train on unlabeled domain text (raw documents)	Teach model new domain vocabulary/concepts

👉 Before fine-tuning, try RAG first. Most use cases are better solved with Knowledge Bases (retrieval-augmented generation) than fine-tuning. Fine-tune only when you need a specific output format or tone that prompting cannot achieve.

👉 Key Takeaway

Choose your model based on the task: Opus for quality, Sonnet for balance, Haiku for speed — and always try RAG before fine-tuning

Chapter Three

Core Features

Inference APIs Core

The fundamental Bedrock operation — send a prompt, get a response:

InvokeModel — synchronous, single response (most common)
InvokeModelWithResponseStream — streaming tokens as they generate (chat UIs)
Converse API — unified multi-turn conversation API across all models

📡

Converse API (Recommended)

Unified interface across all models
Handles message formatting differences
Multi-turn conversation support
Tool use / function calling built-in
Switch models without code changes

⚡

InvokeModel (Direct)

Model-specific request format
Maximum control over parameters
Slightly lower overhead
Use for: batch processing, specific model features
Different payload per model provider

Provisioned Throughput In-Depth

Feature	On-Demand	Provisioned Throughput
Pricing	Per token (input + output)	Per hour (reserved capacity)
Throttling	Shared limits — can be throttled	Guaranteed — your dedicated capacity
Latency	Variable under load	Consistent — no noisy neighbors
Best for	Dev, variable traffic, experimentation	Production with SLAs, high-volume

Guardrails Core

Guardrails apply safety controls across any model in Bedrock — a centralized policy layer for responsible AI:

🚫

Content Filters

Block hate, violence, sexual, insults
Configurable thresholds (low/medium/high)
Applied to both input and output

🔒

PII Redaction

Detect and mask PII in responses
SSN, email, phone, credit card
Block or anonymize

🎯

Topic Blocking

Deny specific topics (competitors, politics)
Custom denied topics with examples
Word/phrase filters

👉 Guardrails are independent of the model. You define them once and apply across all models and applications — so you can switch models without rebuilding safety controls.

👉 Key Takeaway

Bedrock Converse API unifies all models — Guardrails add safety without changing application code

Chapter Four

RAG & Knowledge Bases

What is RAG Core

Retrieval-Augmented Generation (RAG) is the technique of grounding AI responses in your actual data. Instead of relying solely on the model's training data, you retrieve relevant documents and include them in the prompt.

⚠️

Without RAG

Model hallucinates answers
No access to your private data
Knowledge cutoff date
Can't cite sources

✅

With RAG

Grounded in your actual documents
Access to private/current data
Always up-to-date
Citable — shows source documents

Knowledge Bases In-Depth

Bedrock Knowledge Bases is a fully managed RAG service — connect your data sources, Bedrock handles chunking, embedding, storage, and retrieval automatically.

Bedrock Knowledge Base — RAG Architecture

Chunking Strategies In-Depth

📄

Fixed-Size

Split at N tokens/characters
Simple and predictable
May break mid-sentence
Good for: uniform documents

📑

Semantic

Split by meaning boundaries
Preserves context within chunks
Uses embeddings to find breaks
Good for: varied document types

🏗️

Hierarchical

Parent-child chunk relationships
Retrieve small, return large context
Best recall with full context
Good for: complex documents

👉 Key Takeaway

Knowledge Bases turn Bedrock from a generic AI into a grounded expert on your data — with citations

Chapter Five

Agents & Orchestration

What Are Agents Core

Bedrock Agents are autonomous AI systems that can reason, plan, and execute multi-step tasks:

Break complex tasks into steps
Call external APIs (action groups) to get data or perform actions
Query knowledge bases for information
Maintain conversation state across turns
Handle errors and retry with different approaches

Agent Architecture In-Depth

Bedrock Agent — Reasoning & Action Loop

👉 Key Takeaway

Agents transform Bedrock from a text generator into an autonomous system that can reason and act

Chapter Six

Cost & Security

Pricing Model Core

Bedrock uses token-based pricing — you pay for what you use:

Model	Input (per 1K tokens)	Output (per 1K tokens)	Notes
Claude 3 Haiku	$0.00025	$0.00125	Cheapest — high volume tasks
Claude 3.5 Sonnet	$0.003	$0.015	Best value for production
Claude 3 Opus	$0.015	$0.075	Premium — complex reasoning
Llama 3.1 70B	$0.00099	$0.00099	Open model, good price
Titan Text Express	$0.0002	$0.0006	AWS native, lowest cost
Titan Embeddings	$0.0001	—	For RAG vector generation

Cost Optimization In-Depth

💰

Model Selection

Use Haiku for classification/routing
Use Sonnet for complex generation
Reserve Opus for highest-quality needs
Use Titan Embeddings for RAG

📏

Token Management

Keep prompts concise
Set max_tokens to limit output
Use smaller context when possible
Cache frequent prompts

🔧

Architecture

Route simple queries to Haiku
Route complex queries to Sonnet/Opus
Use Provisioned Throughput for steady traffic
Batch APIs for non-real-time (50% off)

Security & Compliance Core

🔒

Data Privacy

Your data is never used to train models
Data encrypted at rest (AWS KMS) and in transit (TLS 1.2+)
VPC endpoints — no internet traversal
Data stays in your AWS region
CloudTrail logs all API calls

🛡️

Access Control

IAM policies for model access
Resource-based policies on models
Guardrails for content safety
Model access must be explicitly enabled per account
SOC 2, HIPAA eligible, ISO 27001

👉 Critical for enterprise adoption: Bedrock data is never shared with model providers and never used for training. This is the #1 differentiator vs using model providers directly.

👉 Key Takeaway

Smart model routing can reduce Bedrock costs by 90%+ — and your data is never used for model training

Chapter Seven

Architecture Patterns

Pattern 1 — Simple Chat API Introductory

💬

Architecture

API Gateway → Lambda → Bedrock
Converse API with streaming
DynamoDB for conversation history
Guardrails for safety

✅

When to Use

Customer-facing chatbot
Internal Q&A assistant
Simple query/response patterns
No private data needed in answers

Pattern 2 — Enterprise RAG In-Depth

Full enterprise architecture with private data grounding:

API Gateway + Lambda + Cognito — authenticated API layer
Bedrock Knowledge Base — RAG over S3 documents (PDFs, policies, manuals)
OpenSearch Serverless — vector store for 1M+ document chunks
Guardrails — PII redaction + topic blocking for compliance
DynamoDB — conversation history for multi-turn context
CloudWatch — latency, token usage, error metrics

Pattern 3 — Autonomous Agent In-Depth

An agent that can perform actions on behalf of the user:

Customer support agent — look up orders, process refunds, update accounts
Research agent — search documents, summarize findings, create reports
DevOps agent — check service health, read CloudWatch logs, trigger remediation

When to Use Bedrock vs Alternatives Core

Use Case	Best Service	Why
Generative AI (text, chat, code)	Bedrock	Serverless, multiple models, built-in RAG
Custom ML model (train from scratch)	SageMaker	Full training control, custom algorithms
Image recognition (pre-trained)	Rekognition	No ML needed, simple API
Text analysis (sentiment, entities)	Comprehend	Pre-built NLP, no model selection
Self-hosted open models	SageMaker endpoints	Custom containers, dedicated GPUs

Common Mistakes Introductory

Mistake	Why It's Bad	Fix
Using Opus for everything	5-10x more expensive than needed	Route by complexity — Haiku for simple, Sonnet for complex
Fine-tuning before trying RAG	Expensive, slow, RAG works better	Start with Knowledge Bases first
No guardrails in production	Unfiltered content to users	Always enable Guardrails — PII + content filters
Ignoring token costs	Long system prompts = waste	Use Prompt Caching, keep prompts concise
Not using VPC endpoints	Data traverses public internet	Enable VPC endpoints for Bedrock API calls

👉 Key Takeaway

Bedrock shines for generative AI — combine Knowledge Bases + Agents + Guardrails for production-ready AI