AI Foundation - Domain 07

Reinforcement Learning

MDPs, Q-learning, policy gradients, and RLHF: learning from reward signals, from games to LLM alignment.

Tier 3 - Advanced AI

Content Coming Soon

This domain is actively being written. All 6 chapters are planned and outlined. Check back soon, or explore Domains 01 and 02, which are fully available.

Planned Chapters

Ch 7.1 - MDPs & Bellman Equations - Coming Soon
Ch 7.2 - Q-Learning & DQN - Coming Soon
Ch 7.3 - Policy Gradient Methods - Coming Soon
Ch 7.4 - Actor-Critic & PPO - Coming Soon
Ch 7.5 - Model-Based RL - Coming Soon
Ch 7.6 - RLHF & LLM Alignment - Coming Soon