Sutton & Barto, 2nd Edition (2018)

Reinforcement Learning: An Introduction

The definitive RL textbook, rebuilt chapter by chapter as interactive lessons. From bandits to policy gradients, with live simulations at every step.

16 Chapters
50+ Simulations
100+ Quizzes
Part I: Tabular Solution Methods
Chapter 2

Multi-armed Bandits

Exploration vs exploitation, epsilon-greedy, UCB, gradient bandits.

Chapter 3

Finite Markov Decision Processes

Agent-environment interface, returns, value functions, Bellman equations.

Chapter 4

Dynamic Programming

Policy evaluation, policy improvement, value iteration, GPI.

Chapter 5

Monte Carlo Methods

MC prediction, MC control, importance sampling, off-policy learning.

Chapter 6

Temporal-Difference Learning

TD(0), SARSA, Q-learning, expected SARSA, double learning.

Chapter 7

n-step Bootstrapping

n-step TD, n-step SARSA, tree backup algorithm.

Chapter 8

Planning and Learning

Dyna, prioritized sweeping, MCTS, model-based RL.

Part II: Approximate Solution Methods
Chapter 9

On-policy Prediction with Approximation

SGD, linear methods, tile coding, neural networks, LSTD.

Chapter 10

On-policy Control with Approximation

Semi-gradient SARSA, average reward, continuing tasks.

Chapter 11

Off-policy Methods with Approximation

The deadly triad, Baird's counterexample, gradient-TD.

Chapter 12

Eligibility Traces

λ-return, TD(λ), true online TD(λ), SARSA(λ).

Chapter 13

Policy Gradient Methods

REINFORCE, baselines, actor-critic, policy gradient theorem.

Part III: Looking Deeper
Chapter 14

Psychology

Classical conditioning, instrumental conditioning, TD model.

Chapter 15

Neuroscience

Reward prediction error, dopamine, basal ganglia, habits.

Chapter 16

Applications and Case Studies

TD-Gammon, Atari DQN, AlphaGo, personalized web services.

Chapter 17

Frontiers

Options, temporal abstraction, reward design, future of RL.