Reinforcement Learning (MAP/INF641, M2 Artificial Intelligence and Advanced Visual Computing, Ecole Polytechnique 2021-2022)

Topics: optimal control, stochastic and structured bandits, model-based MDPs, planning, deep reinforcement learning.

Teaching assistant for practical sessions.

  • Deep Reinforcement Learning

    Policy gradient, REINFORCE, PPO, Unity.

  • Model-based

    Model-based MDPs: value iteration when the transition probabilities and rewards are known, UCRL algorithms when they are estimated from observations.

  • Planning

    Planning in bandits: pure exploration, best arm identification. Planning in MDPs: Monte Carlo Tree Search.
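
As an illustration of the model-based item above, here is a minimal sketch of value iteration for a discounted MDP whose transition probabilities and rewards are known. It is not taken from the course material: the function names, the data layout (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as an expected reward), the discount factor and the two-state toy MDP are all assumptions made for the example.

```python
def q_values(P, R, V, gamma):
    """One Bellman backup: Q[s][a] = R[s][a] + gamma * E[V(next state)]."""
    return [
        [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
         for a in range(len(P[s]))]
        for s in range(len(P))
    ]


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate the Bellman optimality operator until the update is below tol."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = [max(q) for q in q_values(P, R, V, gamma)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimate.
    Q = q_values(P, R, V, gamma)
    policy = [max(range(len(q)), key=q.__getitem__) for q in Q]
    return V, policy


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
# Staying in state 1 yields reward 1; every other move yields reward 0.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # transitions from state 0
    [[(1, 1.0)], [(0, 1.0)]],  # transitions from state 1
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

In this toy problem the greedy policy moves from state 0 to state 1 and then stays there. When the model is unknown, UCRL-type algorithms replace `P` and `R` with estimates and confidence sets built from observations, which this sketch does not cover.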