The Foundation covers AI breadth-first across 12 domains. Advanced goes depth-first: each topic is a complete practitioner's guide with real examples, code, benchmarks, and production patterns.
11 chapters All Levels Practitioner
From zero-shot basics to production-grade prompt systems: how LLMs actually process prompts,
chain-of-thought reasoning, structured outputs, injection defence, evaluation, and model-specific patterns.
01
How LLMs Process Prompts
Tokens, attention, probability: why framing changes everything
Core 02
Zero-Shot, Few-Shot & Roles
When examples help, persona vs instruction, label balance
Core 03
Chain-of-Thought & Reasoning
CoT, Self-Consistency, Tree-of-Thoughts, Least-to-Most
In-depth 04
Structured Outputs & Format Control
JSON mode, schema prompting, Pydantic validation patterns
Core 05
System Prompts & Instruction Hierarchy
System vs user roles, tone locking, model-specific differences
In-depth 06
RAG Prompting Patterns
Context injection, citation prompting, lost-in-the-middle
Core 07
Prompt Injection & Security
Direct & indirect injection, leaking, real attacks & defences
In-depth 08
Evaluation & Regression Testing
LLM-as-judge, golden sets, promptfoo, LangSmith evals
Core 09
Model-Specific Patterns
GPT-4o, Claude XML tags, Gemini grounding, Llama 3 templates
Core 10
Production Prompt Engineering
Versioning, cost optimisation, A/B testing, incident response
In-depth 11
Prompt Workflows & Iteration Patterns
Multi-step chains, self-critique, self-consistency, tool-oriented prompting, reliability vs quality
Practical
Systems 10 chapters All Levels Practitioner
Production RAG systems: chunking strategies, embedding models, vector databases, retrieval optimization, hybrid search, re-ranking, and advanced patterns like GraphRAG and agentic RAG.
01
RAG Mental Model
What is RAG, when to use it vs fine-tuning, the 6-stage pipeline
Core 02
Data Ingestion & Chunking
Loaders, parsers, semantic vs fixed-size chunking strategies
Core 03
Embeddings & Representation
OpenAI, Cohere, BGE, MTEB benchmarks, dimensionality tradeoffs
In-depth 04
Vector Storage & Indexing
Pinecone, Qdrant, pgvector, HNSW indexing, scaling
In-depth 05
Retrieval Strategies
Dense, sparse, hybrid search, query expansion, HyDE
Core 06
Ranking & Re-Ranking
Cross-encoder re-rankers, Cohere Rerank, ColBERT, fusion
In-depth 07
Context Construction
Context window management, citation prompting, compression
Core 08
Failure Modes & Evaluation
Retrieval metrics, RAGAS, faithfulness, end-to-end eval
Core 09
Advanced RAG Patterns
CRAG, Self-RAG, GraphRAG, agentic RAG, multimodal
In-depth 10
Production Systems
Caching, latency, cost, observability, A/B testing
In-depth
10 chapters All Levels Practitioner
From demos to deployment: building reliable, observable, cost-effective AI agents. Tool orchestration, memory systems, planning, multi-agent coordination, security, and production operations.
01
Agent Architecture
Components of a production agent: the reasoning loop, tools, and state
Core 02
Tool Orchestration
Function calling at scale: schemas, sandboxing, parallel tools, rate limits
Core 03
Memory Systems
Short-term, long-term, and episodic memory: context management strategies
In-depth 04
Planning & Task Decomposition
Multi-step agent behavior: ReAct, plan-then-execute, MCTS, re-planning
In-depth 05
Error Handling & Recovery
When agents fail: retry logic, graceful degradation, 5-layer defence
Core 06
Multi-Agent Systems
Collaboration and orchestration: supervisor, mesh, debate, shared memory
In-depth 07
Security & Guardrails
Protecting agent systems: prompt injection, tool abuse, output guardrails
In-depth 08
Observability
Tracing, logging, and debugging: LangSmith, Langfuse, OpenTelemetry
Core 09
Cost & Latency
Making agents affordable and fast: token budgets, caching, model routing
Core 10
Deployment
Running agents in production: scaling, versioning, A/B testing, incident response
In-depth
10 chapters All Levels Practitioner
Architecture patterns for LLM applications, from single-model APIs to multi-model orchestration. Scaling, caching, latency optimization, cost management, and production infrastructure.
01
Design Principles
LLM constraints, control layer pattern, system mental model
Core 02
Architecture Patterns
Single-model, multi-model, router, orchestrator, agent architectures
Core 03
Model Selection
Capability vs cost tradeoffs, model routing, fallback strategies
In-depth 04
API Design
Request/response schemas, streaming, rate limiting, error handling
Core 05
Caching
Semantic caching, exact-match caching, TTL strategies, invalidation
In-depth 06
Scaling
Horizontal scaling, load balancing, queue-based architectures
In-depth 07
Latency
Time-to-first-token, streaming, parallel execution, latency budgets
Core 08
Cost
Token budgets, cost monitoring, optimization, multi-model routing
Core 09
Infrastructure
GPU provisioning, Kubernetes, serverless, multi-region deployment
In-depth 10
Case Studies
Real-world architectures, scaling stories, failure post-mortems
In-depth
10 chapters All Levels Practitioner
Measuring what matters: benchmarks, LLM-as-judge, golden sets, regression testing, tracing, monitoring, and CI/CD integration for AI systems.
01
Why Eval Matters
Probabilistic systems, measurement gap, eval-driven development
Core 02
Benchmarks
Public benchmarks (MMLU, HumanEval), task-specific, contamination
Core 03
LLM-as-Judge
Judge prompt design, pairwise comparison, rubric scoring, calibration
In-depth 04
Golden Sets
Building golden datasets, annotation guidelines, versioning, coverage
Core 05
Regression Testing
Detecting regressions, test suite design, threshold setting, alerts
Core 06
Tracing
Distributed tracing, span design, trace IDs, context propagation
In-depth 07
Monitoring
Metrics selection, dashboards, alerting rules, SLOs/SLIs
Core 08
Debugging
Root cause analysis, replay attacks, prompt debugging, failure categorization
In-depth 09
CI/CD Integration
Eval in CI pipelines, gate criteria, automated regression, blocking
Core 10
Tooling
LangSmith, Langfuse, promptfoo, Weights & Biases, OpenTelemetry
In-depth
10 chapters All Levels Practitioner
From dataset curation to production deployment: LoRA, QLoRA, SFT, DPO, RLHF, evaluation, domain adaptation, and the complete MLOps pipeline for fine-tuned models.
01
Why Fine-Tune
Decision ladder, when to use, cost-benefit, vs prompting and RAG
Core 02
Data Preparation
Dataset formats, quality over quantity, deduplication, splits
Core 03
LoRA & PEFT
Low-rank adaptation, QLoRA, DoRA, merging adapters
In-depth 04
Full Fine-Tuning
When to use, compute requirements, learning rate, monitoring
In-depth 05
SFT vs DPO vs RLHF
Training objectives, preference pairs, RLHF complexity, decision guide
In-depth 06
Evaluation
Evaluation hierarchy, golden sets, LLM-as-judge, regression tests
Core 07
Instruction Tuning
LIMA insight, chat templates, multi-turn, system prompts
Core 08
Domain Adaptation
Two-stage approach, medical/legal/code/finance, forgetting prevention
In-depth 09
Serving
Deployment options, quantization, vLLM, Ollama, cloud options
Core 10
Production MLOps
Experiment tracking, model registry, A/B testing, monitoring, flywheel
In-depth
10 chapters All Levels Practitioner
Building and optimizing context windows for LLM applications: context construction, compression, windowing strategies, caching, and production patterns.
01
Context Fundamentals
Token limits, context decay, lost-in-the-middle problem
Core 02
Context Construction
Selection and ordering, relevance ranking, citation anchoring
Core 03
Context Compression
Summarization, token reduction, semantic compression
In-depth 04
Windowing Strategies
Sliding windows, hierarchical windowing, document splitting
Core 05
Information Density
Signal-to-noise ratio, quality assessment, noise injection
Core 06
Long Context Models
100K+ token windows, position embeddings, scaling laws
In-depth 07
Context Caching
Prefix caching, prompt caching, cost reduction
Core 08
Multi-Document Context
Document ranking, fusion strategies, conflict resolution
In-depth 09
Context Quality Metrics
Relevance scoring, faithfulness, coverage metrics
Core 10
Production Context Systems
Real-time construction, latency, cost, observability
In-depth
10 chapters All Levels Practitioner
Building multimodal AI systems: vision, audio, text fusion patterns, model selection, and production pipelines for vision-language models.
01
Multimodal Fundamentals
Vision, audio, text fusion, encoding, alignment
Core 02
Vision-Language Models
GPT-4o, Claude, Gemini capabilities and prompting
Core 03
Image Processing
Tokenization, resolution, compression, token budgets
In-depth 04
Audio Integration
Speech-to-text, audio embeddings, alignment
Core 05
Model Architectures
Encoder-decoder, vision transformers, attention mechanisms
In-depth 06
Fusion Strategies
Early fusion, late fusion, cross-modal attention
In-depth 07
Fine-Tuning Multimodal
Data preparation, adapter tuning, multimodal LoRA
Core 08
Evaluation Metrics
Vision benchmarks, alignment metrics, human assessment
In-depth 09
Deployment Pipeline
Input preprocessing, tokenization, batching, format handling
Core 10
Production Multimodal Systems
Latency, cost, caching, observability, scaling
In-depth