Reinforcement Learning (MAP/INF641, M2 Artificial Intelligence and Advanced Visual Computing, Ecole Polytechnique 2021-2022)
Overview
Optimal control, stochastic and structured bandits, model-based MDP, planning, deep reinforcement learning.
Teaching assistant for practical sessions.
-
Deep Reinforcement Learning
Policy gradient, REINFORCE, PPO, Unity.
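As an illustration of the policy-gradient idea behind REINFORCE, here is a minimal tabular sketch in NumPy. It is not the course's lab code. The 5-state "chain" environment (`step`, `GOAL`, `MAX_STEPS`), the softmax parameterization `theta[s, a]` and the hyperparameters are all hypothetical choices made for the example. Each episode is sampled with the current policy, the discounted return G_t is computed backward, and `theta` moves along the Monte Carlo gradient estimate sum_t gamma^t G_t grad log pi(A_t | S_t).

```python
import numpy as np

# Hypothetical toy environment: states 0..4, start at 0, reward 1 on reaching GOAL.
N_STATES, N_ACTIONS, GOAL, MAX_STEPS = 5, 2, 4, 20

def step(state, action):
    """Action 1 moves right, action 0 moves left (with a wall at state 0)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, float(done), done

def policy(theta, s):
    """Softmax policy pi(.|s) from action preferences theta[s]."""
    prefs = theta[s] - theta[s].max()  # subtract max for numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def run_episode(theta, rng):
    """Sample one episode; returns a list of (state, action, reward)."""
    s, traj = 0, []
    for _ in range(MAX_STEPS):
        a = rng.choice(N_ACTIONS, p=policy(theta, s))
        s2, r, done = step(s, a)
        traj.append((s, a, r))
        s = s2
        if done:
            break
    return traj

def reinforce(n_episodes=1000, alpha=0.1, gamma=0.99, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_episodes):
        traj = run_episode(theta, rng)
        grad = np.zeros_like(theta)
        G = 0.0
        # Walk the episode backward so G accumulates the discounted return G_t.
        for t in reversed(range(len(traj))):
            s, a, r = traj[t]
            G = r + gamma * G
            # Gradient of log softmax w.r.t. theta[s]: one_hot(a) - pi(.|s).
            grad_log = -policy(theta, s)
            grad_log[a] += 1.0
            grad[s] += (gamma ** t) * G * grad_log
        theta += alpha * grad  # one gradient-ascent step per episode
    return theta

theta = reinforce()
print([round(policy(theta, s)[1], 2) for s in range(GOAL)])  # P(move right) per state
```

The gradient is accumulated with `theta` held fixed during an episode and applied once at the end. PPO, also listed above, replaces this plain update with a clipped surrogate objective and reuses each batch of trajectories for several gradient steps.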
-
Model-based
Model-based MDP: value iteration when the transition probabilities and rewards are known, UCRL algorithms when they are estimated from observations.
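For the known-model case, value iteration repeatedly applies the Bellman optimality operator V(s) <- max_a [R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s')] until the values stop changing. A minimal NumPy sketch follows. The array layout (`P[a, s, s2]`, `R[s, a]`), the tolerance and the two-state example MDP are assumptions made for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """P[a, s, s2]: known transition probabilities; R[s, a]: known expected rewards."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_{s2} P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    greedy = (R + gamma * np.einsum("ast,t->sa", P, V)).argmax(axis=1)
    return V, greedy

# Hypothetical 2-state MDP: action 0 = stay, action 1 = switch state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = 1.0  # stay
P[1, 0, 1] = P[1, 1, 0] = 1.0  # switch
R = np.array([[0.0, 0.0],      # state 0: no reward
              [1.0, 0.0]])     # state 1: reward 1 for staying
V, pi = value_iteration(P, R)
print(V, pi)
```

Here the optimal values are V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 * 10 = 9, with the greedy policy "switch" in state 0 and "stay" in state 1. UCRL-style algorithms run a planning step of this kind on an optimistic model built from the estimated transitions and rewards.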
-
Planning
Planning in bandits: pure exploration, best arm identification. Planning in MDP: Monte Carlo Tree Search.
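One simple fixed-confidence approach to best arm identification is successive elimination: pull every remaining arm once per round, then discard any arm whose upper confidence bound falls below the best lower confidence bound. The sketch below is not necessarily the variant covered in the course. It assumes Bernoulli arms (the `means` are hypothetical) and a Hoeffding radius sqrt(log(4 K t^2 / delta) / (2 t)). A union bound over arms and rounds then keeps the total failure probability below `delta`.

```python
import numpy as np

def successive_elimination(means, delta=0.05, seed=0, max_rounds=100_000):
    """Return (estimated best arm, number of rounds) for hypothetical Bernoulli arms."""
    rng = np.random.default_rng(seed)
    K = len(means)
    active = list(range(K))
    sums = np.zeros(K)
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for k in active:  # every active arm has exactly t samples after this loop
            sums[k] += rng.random() < means[k]
        mu_hat = sums[active] / t
        radius = np.sqrt(np.log(4 * K * t**2 / delta) / (2 * t))
        best_lcb = mu_hat.max() - radius
        # Keep only arms whose upper bound still reaches the best lower bound.
        active = [k for k, m in zip(active, mu_hat) if m + radius >= best_lcb]
    mu_hat = sums[active] / max(t, 1)
    return active[int(np.argmax(mu_hat))], t

best, rounds = successive_elimination([0.2, 0.5, 0.8])
print(best, rounds)
```

The number of rounds grows roughly like log(1 / delta) divided by the squared gap between the best and second-best means. This sample complexity is what pure-exploration algorithms try to minimize.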