Reinforcement Learning (MAP/INF641, M2 Artificial Intelligence and Advanced Visual Computing, École Polytechnique 2021-2022)

Overview

Optimal control, stochastic and structured bandits, model-based MDP, planning, deep reinforcement learning.
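As a small illustration of the stochastic bandit setting listed above, here is a minimal sketch of the classical UCB1 index policy on Bernoulli arms. The arm means, the horizon and the function name `ucb1` are hypothetical choices for illustration, not material taken from the course.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given (hypothetical) means.

    Returns the number of times each arm was pulled."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise its estimate
        else:
            # Index = empirical mean + exploration bonus sqrt(2 ln t / n_a).
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls)
```

The exploration bonus shrinks as an arm is pulled more often, so most of the budget ends up on the arm with the highest mean.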

Teaching assistant for practical sessions.

Deep Reinforcement Learning

Policy gradient, REINFORCE, PPO, Unity.
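To illustrate the policy-gradient idea behind REINFORCE, here is a minimal sketch assuming a hypothetical five-state chain environment (action 1 moves right, action 0 moves left, reward 1 on reaching the rightmost state) and a tabular softmax policy. The environment, step size, discount factor and function names are illustrative assumptions, not the code used in the practical sessions.

```python
import math
import random

# Hypothetical toy environment: a chain of N_STATES states, start in state 0,
# reward 1 (and episode end) on reaching the rightmost state, at most HORIZON steps.
N_STATES, HORIZON = 5, 20


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def run_episode(theta, rng):
    """Roll out one episode; returns a list of (state, action, reward, probs)."""
    s, traj = 0, []
    for _ in range(HORIZON):
        probs = softmax(theta[s])
        a = 0 if rng.random() < probs[0] else 1
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        traj.append((s, a, r, probs))
        s = s_next
        if r > 0:
            break
    return traj


def reinforce(episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """Monte Carlo policy gradient (REINFORCE) with a tabular softmax policy."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        traj = run_episode(theta, rng)
        g = 0.0
        # Walk backwards so that g is the discounted return G_t from step t.
        for t in reversed(range(len(traj))):
            s, a, r, probs = traj[t]
            g = r + gamma * g
            for b in range(2):
                # Gradient of log softmax w.r.t. theta[s][b]: 1{b == a} - pi(b|s),
                # evaluated with the probabilities used during the rollout.
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += alpha * (gamma ** t) * g * grad
    return theta


theta = reinforce()
print([round(softmax(theta[s])[1], 2) for s in range(N_STATES - 1)])
```

The printed list gives the learned probability of moving right in each non-terminal state. PPO keeps the same likelihood-ratio gradient but constrains how far each update can move the policy.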

Model-based

Model-based MDP: value iteration when the transition probabilities and rewards are known; UCRL algorithms when they must be estimated from observations.
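For the known-model case, a minimal sketch of value iteration is given below. It repeatedly applies the Bellman optimality update $V(s) \leftarrow \max_a \big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V(s') \big]$ until the values stop changing, then reads off a greedy policy. The two-state MDP, the discount $\gamma = 0.9$ and the data layout of `P` and `R` are hypothetical choices for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP with a known model.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    expected immediate reward for taking action a in state s."""
    def q(V, s, a):
        # One-step lookahead: r(s, a) + gamma * sum_{s'} p(s' | s, a) V(s').
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * len(P)
    while True:
        V_new = [max(q(V, s, a) for a in range(len(P[s]))) for s in range(len(P))]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the (near-)optimal value function.
    policy = [max(range(len(P[s])), key=lambda a: q(V, s, a)) for s in range(len(P))]
    return V, policy


# Hypothetical two-state MDP: in state 0, action 1 reaches state 1 with prob. 0.9;
# in state 1, action 0 stays there and earns reward 1 per step.
P = [
    [[(1.0, 0)], [(0.9, 1), (0.1, 0)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
V, policy = value_iteration(P, R)
print([round(v, 3) for v in V], policy)  # prints [8.901, 10.0] [1, 0]
```

The result can be checked by hand: staying in state 1 forever gives $V^*(1) = 1/(1-0.9) = 10$, and in state 0 the fixed point $V^*(0) = 0.9\,(0.9 \cdot 10 + 0.1\, V^*(0))$ gives $V^*(0) = 8.1/0.91 \approx 8.901$.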

Planning

Planning in bandits: pure exploration, best arm identification. Planning in MDPs: Monte Carlo Tree Search.
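As an example of pure exploration, here is a minimal sketch of fixed-confidence best arm identification with successive elimination on Bernoulli arms. The arm means, the confidence level `delta` and the particular Hoeffding-plus-union-bound radius are assumptions chosen for illustration. Other radii, and other algorithms such as LUCB or Track-and-Stop, are also common.

```python
import math
import random


def successive_elimination(means, delta=0.05, seed=0, max_rounds=100_000):
    """Fixed-confidence best-arm identification on Bernoulli arms.

    Every active arm is sampled once per round; an arm is eliminated when its
    upper confidence bound falls below the best lower confidence bound."""
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    sums = [0.0] * k
    r = 0
    while len(active) > 1 and r < max_rounds:
        r += 1
        for a in active:
            sums[a] += 1.0 if rng.random() < means[a] else 0.0
        # Hoeffding radius with a union bound over arms and rounds, so all
        # confidence intervals hold simultaneously with probability >= 1 - delta.
        radius = math.sqrt(math.log(4 * k * r * r / delta) / (2 * r))
        best_lcb = max(sums[a] / r for a in active) - radius
        active = [a for a in active if sums[a] / r + radius >= best_lcb]
    # All surviving arms have r samples each, so comparing sums compares means.
    return max(active, key=lambda a: sums[a]), r


arm, rounds = successive_elimination([0.3, 0.5, 0.7])
print(arm, rounds)
```

Unlike the regret-minimising setting, the algorithm is judged only by whether the recommended arm is correct and by how many samples it used before stopping.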

Last updated on Mar 15, 2022