Amazon Bedrock
LearningTree ยท AWS ยท AI & ML

Amazon Bedrock โ€”
Foundation Models as a Service

Access leading foundation models through a single API. Build generative AI applications โ€” chatbots, content generation, summarization, code assistants โ€” without managing any ML infrastructure.

โšก Bedrock in 30 Seconds

  • Serverless access to foundation models โ€” Claude, Titan, Llama, Mistral, Stable Diffusion
  • No infrastructure to manage โ€” pay per token (input/output)
  • Built-in RAG with Knowledge Bases (connect your documents, get grounded answers)
  • Agents that can reason, plan, and execute multi-step tasks using tools
  • Guardrails for responsible AI โ€” content filtering, PII redaction, topic blocking
01
Chapter One

What is Bedrock

Introduction Introductory

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI companies available through a unified API. You don't train models, manage GPUs, or handle infrastructure โ€” you simply call an API and get intelligent responses.

๐Ÿ‘‰ Think of Bedrock as: An AI vending machine โ€” choose a model, send a prompt, get a response. No ML expertise required.

Bedrock is the fastest way to add generative AI to your applications. It abstracts away all the complexity of hosting large language models โ€” you focus on your application logic and prompts.

Why Bedrock Exists Introductory
โš ๏ธ

Self-Hosting LLMs

  • Need expensive GPU instances (p4d/p5 = $30+/hr)
  • Complex model deployment and optimization
  • Version management and updates
  • Scaling under load is your problem
  • Security, compliance โ€” all on you
โœ…

Bedrock Solves

  • Serverless โ€” no instances to manage
  • Multiple models via single API
  • AWS handles scaling, patching, updates
  • Built-in security (VPC, encryption, IAM)
  • Pay per token โ€” no idle cost
Bedrock vs SageMaker Core
FeatureAmazon BedrockAmazon SageMaker
PurposeUse pre-trained foundation modelsTrain and deploy custom ML models
InfrastructureServerless โ€” fully managedManaged instances โ€” you choose type/size
ML expertiseNot required โ€” prompt engineeringRequired โ€” data science skills
CustomizationFine-tuning, RAG, prompt engineeringFull control โ€” any algorithm, any data
Cost modelPer token (input + output)Per hour (instance runtime)
Best forGenerative AI apps (text, image, code)Custom ML (fraud, forecasting, recommendation)
Concept Diagram Introductory
Amazon Bedrock โ€” Application Connects to Foundation Models
๐Ÿ“ฑ YOUR APP AWS CLOUD AMAZON BEDROCK CLAUDE Anthropic TITAN Amazon LLAMA Meta MISTRAL Mistral AI KNOWLEDGE BASES RAG over your data AGENTS Multi-step reasoning GUARDRAILS Safety & compliance
๐Ÿ‘‰ Key Takeaway

Bedrock gives you serverless access to the world's best foundation models โ€” no ML expertise or infrastructure needed

02
Chapter Two

Foundation Models

Available Models Core

Bedrock provides access to foundation models from multiple providers โ€” choose based on capability, speed, and cost:

ProviderModelStrengthsBest For
AnthropicClaude 3.5 Sonnet, Claude 3 Opus/HaikuReasoning, instruction following, safetyComplex tasks, analysis, coding, long context
AmazonTitan Text, Titan Embeddings, Titan ImageAWS-native, cost-effective, embeddingsEmbeddings for RAG, simple generation
MetaLlama 3.1 (8B, 70B, 405B)Open-weight, strong performance/costGeneral-purpose, customization-friendly
Mistral AIMistral Large, MixtralFast, multilingual, codeLow-latency apps, European compliance
CohereCommand R, EmbedEnterprise search, RAG-optimizedEnterprise search, multilingual RAG
Stability AIStable Diffusion XLImage generationMarketing visuals, product images
Model Selection Guide Core
๐Ÿง 

Best Quality

  • Claude 3 Opus โ€” highest reasoning
  • Complex analysis and research
  • Long documents (200K context)
  • Highest cost per token
โš–๏ธ

Best Balance

  • Claude 3.5 Sonnet โ€” quality + speed
  • Production workloads
  • Code generation, chat, RAG
  • Best quality/cost ratio
โšก

Best Speed/Cost

  • Claude 3 Haiku or Mistral
  • High-throughput classification
  • Simple extraction, routing
  • Lowest latency and cost
Model Customization In-Depth

Bedrock offers two approaches to customize models for your domain:

MethodHow It WorksWhen to Use
Fine-TuningTrain on your labeled data (prompt/completion pairs)Domain-specific tone, format, or knowledge
Continued Pre-TrainingTrain on unlabeled domain text (raw documents)Teach model new domain vocabulary/concepts

๐Ÿ‘‰ Before fine-tuning, try RAG first. Most use cases are better solved with Knowledge Bases (retrieval-augmented generation) than fine-tuning. Fine-tune only when you need a specific output format or tone that prompting cannot achieve.

๐Ÿ‘‰ Key Takeaway

Choose your model based on the task: Opus for quality, Sonnet for balance, Haiku for speed โ€” and always try RAG before fine-tuning

03
Chapter Three

Core Features

Inference APIs Core

The fundamental Bedrock operation โ€” send a prompt, get a response:

  • InvokeModel โ€” synchronous, single response (most common)
  • InvokeModelWithResponseStream โ€” streaming tokens as they generate (chat UIs)
  • Converse API โ€” unified multi-turn conversation API across all models
๐Ÿ“ก

Converse API (Recommended)

  • Unified interface across all models
  • Handles message formatting differences
  • Multi-turn conversation support
  • Tool use / function calling built-in
  • Switch models without code changes
โšก

InvokeModel (Direct)

  • Model-specific request format
  • Maximum control over parameters
  • Slightly lower overhead
  • Use for: batch processing, specific model features
  • Different payload per model provider
Provisioned Throughput In-Depth
FeatureOn-DemandProvisioned Throughput
PricingPer token (input + output)Per hour (reserved capacity)
ThrottlingShared limits โ€” can be throttledGuaranteed โ€” your dedicated capacity
LatencyVariable under loadConsistent โ€” no noisy neighbors
Best forDev, variable traffic, experimentationProduction with SLAs, high-volume
Guardrails Core

Guardrails apply safety controls across any model in Bedrock โ€” a centralized policy layer for responsible AI:

๐Ÿšซ

Content Filters

  • Block hate, violence, sexual, insults
  • Configurable thresholds (low/medium/high)
  • Applied to both input and output
๐Ÿ”’

PII Redaction

  • Detect and mask PII in responses
  • SSN, email, phone, credit card
  • Block or anonymize
๐ŸŽฏ

Topic Blocking

  • Deny specific topics (competitors, politics)
  • Custom denied topics with examples
  • Word/phrase filters

๐Ÿ‘‰ Guardrails are independent of the model. You define them once and apply across all models and applications โ€” so you can switch models without rebuilding safety controls.

๐Ÿ‘‰ Key Takeaway

Bedrock Converse API unifies all models โ€” Guardrails add safety without changing application code

04
Chapter Four

RAG & Knowledge Bases

What is RAG Core

Retrieval-Augmented Generation (RAG) is the technique of grounding AI responses in your actual data. Instead of relying solely on the model's training data, you retrieve relevant documents and include them in the prompt.

โš ๏ธ

Without RAG

  • Model hallucinates answers
  • No access to your private data
  • Knowledge cutoff date
  • Can't cite sources
โœ…

With RAG

  • Grounded in your actual documents
  • Access to private/current data
  • Always up-to-date
  • Citable โ€” shows source documents
Knowledge Bases In-Depth

Bedrock Knowledge Bases is a fully managed RAG service โ€” connect your data sources, Bedrock handles chunking, embedding, storage, and retrieval automatically.

Bedrock Knowledge Base โ€” RAG Architecture
DATA SOURCES S3 (PDFs) Confluence Web Crawler INGESTION 1. CHUNK 2. EMBED (Titan) 3. STORE VECTORS RETRIEVAL 1. EMBED QUERY 2. VECTOR SEARCH 3. AUGMENT PROMPT GENERATION CLAUDE / TITAN GROUNDED ANSWER + CITATIONS Data Sources โ†’ Chunk โ†’ Embed โ†’ Store โ†’ Query โ†’ Retrieve โ†’ Generate โ†’ Cite
Chunking Strategies In-Depth
๐Ÿ“„

Fixed-Size

  • Split at N tokens/characters
  • Simple and predictable
  • May break mid-sentence
  • Good for: uniform documents
๐Ÿ“‘

Semantic

  • Split by meaning boundaries
  • Preserves context within chunks
  • Uses embeddings to find breaks
  • Good for: varied document types
๐Ÿ—๏ธ

Hierarchical

  • Parent-child chunk relationships
  • Retrieve small, return large context
  • Best recall with full context
  • Good for: complex documents
๐Ÿ‘‰ Key Takeaway

Knowledge Bases turn Bedrock from a generic AI into a grounded expert on your data โ€” with citations

05
Chapter Five

Agents & Orchestration

What Are Agents Core

Bedrock Agents are autonomous AI systems that can reason, plan, and execute multi-step tasks:

  • Break complex tasks into steps
  • Call external APIs (action groups) to get data or perform actions
  • Query knowledge bases for information
  • Maintain conversation state across turns
  • Handle errors and retry with different approaches
Agent Architecture In-Depth
Bedrock Agent โ€” Reasoning & Action Loop
๐Ÿ‘ค USER BEDROCK AGENT 1. REASONING (LLM) 2. PLAN NEXT ACTION 3. EXECUTE & OBSERVE Loop until complete ACTION GROUPS Lambda / APIs KNOWLEDGE BASES Your documents EXTERNAL SYSTEMS ๐Ÿ—„๏ธ Databases ๐Ÿ“ง Email / CRM ๐Ÿ›’ Order Systems ๐Ÿ”ง Internal APIs Agent: Reason โ†’ Plan โ†’ Act โ†’ Observe โ†’ Repeat until complete
๐Ÿ‘‰ Key Takeaway

Agents transform Bedrock from a text generator into an autonomous system that can reason and act

06
Chapter Six

Cost & Security

Pricing Model Core

Bedrock uses token-based pricing โ€” you pay for what you use:

ModelInput (per 1K tokens)Output (per 1K tokens)Notes
Claude 3 Haiku$0.00025$0.00125Cheapest โ€” high volume tasks
Claude 3.5 Sonnet$0.003$0.015Best value for production
Claude 3 Opus$0.015$0.075Premium โ€” complex reasoning
Llama 3.1 70B$0.00099$0.00099Open model, good price
Titan Text Express$0.0002$0.0006AWS native, lowest cost
Titan Embeddings$0.0001โ€”For RAG vector generation
Cost Optimization In-Depth
๐Ÿ’ฐ

Model Selection

  • Use Haiku for classification/routing
  • Use Sonnet for complex generation
  • Reserve Opus for highest-quality needs
  • Use Titan Embeddings for RAG
๐Ÿ“

Token Management

  • Keep prompts concise
  • Set max_tokens to limit output
  • Use smaller context when possible
  • Cache frequent prompts
๐Ÿ”ง

Architecture

  • Route simple queries to Haiku
  • Route complex queries to Sonnet/Opus
  • Use Provisioned Throughput for steady traffic
  • Batch APIs for non-real-time (50% off)
Security & Compliance Core
๐Ÿ”’

Data Privacy

  • Your data is never used to train models
  • Data encrypted at rest (AWS KMS) and in transit (TLS 1.2+)
  • VPC endpoints โ€” no internet traversal
  • Data stays in your AWS region
  • CloudTrail logs all API calls
๐Ÿ›ก๏ธ

Access Control

  • IAM policies for model access
  • Resource-based policies on models
  • Guardrails for content safety
  • Model access must be explicitly enabled per account
  • SOC 2, HIPAA eligible, ISO 27001

๐Ÿ‘‰ Critical for enterprise adoption: Bedrock data is never shared with model providers and never used for training. This is the #1 differentiator vs using model providers directly.

๐Ÿ‘‰ Key Takeaway

Smart model routing can reduce Bedrock costs by 90%+ โ€” and your data is never used for model training

07
Chapter Seven

Architecture Patterns

Pattern 1 โ€” Simple Chat API Introductory
๐Ÿ’ฌ

Architecture

  • API Gateway โ†’ Lambda โ†’ Bedrock
  • Converse API with streaming
  • DynamoDB for conversation history
  • Guardrails for safety
โœ…

When to Use

  • Customer-facing chatbot
  • Internal Q&A assistant
  • Simple query/response patterns
  • No private data needed in answers
Pattern 2 โ€” Enterprise RAG In-Depth

Full enterprise architecture with private data grounding:

  • API Gateway + Lambda + Cognito โ€” authenticated API layer
  • Bedrock Knowledge Base โ€” RAG over S3 documents (PDFs, policies, manuals)
  • OpenSearch Serverless โ€” vector store for 1M+ document chunks
  • Guardrails โ€” PII redaction + topic blocking for compliance
  • DynamoDB โ€” conversation history for multi-turn context
  • CloudWatch โ€” latency, token usage, error metrics
Pattern 3 โ€” Autonomous Agent In-Depth

An agent that can perform actions on behalf of the user:

  • Customer support agent โ€” look up orders, process refunds, update accounts
  • Research agent โ€” search documents, summarize findings, create reports
  • DevOps agent โ€” check service health, read CloudWatch logs, trigger remediation
When to Use Bedrock vs Alternatives Core
Use CaseBest ServiceWhy
Generative AI (text, chat, code)BedrockServerless, multiple models, built-in RAG
Custom ML model (train from scratch)SageMakerFull training control, custom algorithms
Image recognition (pre-trained)RekognitionNo ML needed, simple API
Text analysis (sentiment, entities)ComprehendPre-built NLP, no model selection
Self-hosted open modelsSageMaker endpointsCustom containers, dedicated GPUs
Common Mistakes Introductory
MistakeWhy It's BadFix
Using Opus for everything5-10x more expensive than neededRoute by complexity โ€” Haiku for simple, Sonnet for complex
Fine-tuning before trying RAGExpensive, slow, RAG works betterStart with Knowledge Bases first
No guardrails in productionUnfiltered content to usersAlways enable Guardrails โ€” PII + content filters
Ignoring token costsLong system prompts = wasteUse Prompt Caching, keep prompts concise
Not using VPC endpointsData traverses public internetEnable VPC endpoints for Bedrock API calls
๐Ÿ‘‰ Key Takeaway

Bedrock shines for generative AI โ€” combine Knowledge Bases + Agents + Guardrails for production-ready AI