Reinforcement Learning (MAP/INF641, M2 Artificial Intelligence and Advanced Visual Computing, Ecole Polytechnique 2021-2022)
Optimal control, stochastic and structured bandits, model-based MDP, planning, deep reinforcement learning.
Teaching assistant for practical sessions.
Policy gradient, REINFORCE, PPO, Unity.
Model-based MDP: value iteration when the transition probabilities and rewards are known, UCRL algorithms when they are estimated from observations.
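As an illustration of the known-model case, here is a minimal value iteration sketch. The array layout (`P[a, s, s']` for transition probabilities, `R[s, a]` for expected rewards), the discount factor `gamma`, the stopping tolerance, and the two-state toy MDP are all assumptions made for this example, not material taken from the course.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a finite MDP with known model.

    P[a, s, t] is the probability of moving from state s to state t
    under action a; R[s, a] is the expected immediate reward.
    (This layout is an assumption of the sketch.)
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup:
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
# Staying in state 1 pays reward 1; every other (state, action) pays 0.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 0.0],  # state 0
    [1.0, 0.0],  # state 1
])
V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy example the optimal behaviour is to switch out of state 0 and then stay in state 1, so the values approach 9 and 10 respectively for `gamma = 0.9`. When the model is estimated from observations, as in UCRL, the same backup is applied to optimistic estimates rather than to the true `P` and `R`; that variant is not shown here.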
Planning in bandits: pure exploration, best arm identification. Planning in MDPs: Monte Carlo Tree Search.
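To make best arm identification concrete, the following is a rough sketch of successive elimination in the fixed-confidence setting. It is one standard approach and not necessarily the one covered in the course. The function name, the `pull` callback, the Hoeffding-style confidence radius, the assumption that rewards lie in [0, 1], and the Bernoulli test arms are all choices made for this example.

```python
import math
import random

def successive_elimination(pull, n_arms, delta=0.05, max_rounds=100_000):
    """Return the index of the arm believed to have the highest mean.

    pull(i) must return a reward in [0, 1] for arm i (assumption).
    Each round samples every surviving arm once, then drops any arm
    whose upper confidence bound falls below the best lower bound.
    """
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    for t in range(1, max_rounds + 1):
        for i in active:
            sums[i] += pull(i)
        # Every active arm has been pulled exactly t times.
        means = {i: sums[i] / t for i in active}
        # Hoeffding radius with a union bound over arms and rounds.
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best_lcb = max(means.values()) - radius
        active = [i for i in active if means[i] + radius >= best_lcb]
        if len(active) == 1:
            break
    # If the round budget runs out, fall back to the empirical best.
    return max(active, key=lambda i: sums[i])

# Hypothetical Bernoulli bandit with a clear best arm (index 2).
rng = random.Random(0)
true_means = [0.2, 0.5, 0.8]
best_arm = successive_elimination(
    lambda i: float(rng.random() < true_means[i]), n_arms=3
)
```

With this radius, the returned arm is the true best one with probability at least `1 - delta`. The number of pulls grows as the gaps between arm means shrink, which is the quantity pure-exploration analyses aim to control.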