AI Foundation - Domain 07
Reinforcement Learning
MDPs, Q-learning, policy gradients, and RLHF: learning from reward signals, with applications ranging from games to LLM alignment.
Tier 3 - Advanced AI
Content Coming Soon
This domain is actively being written. All 6 chapters are planned and outlined. Check back soon, or explore Domains 01 and 02, which are fully available.
Planned Chapters
Ch 7.1 - MDPs & Bellman Equations (Coming Soon)
Ch 7.2 - Q-Learning & DQN (Coming Soon)
Ch 7.3 - Policy Gradient Methods (Coming Soon)
Ch 7.4 - Actor-Critic & PPO (Coming Soon)
Ch 7.5 - Model-Based RL (Coming Soon)
Ch 7.6 - RLHF & LLM Alignment (Coming Soon)