Reinforcement Learning (CentraleSupelec M2 2020-2021)

3. Planning

Planning in bandits: pure exploration, best arm identification.
Planning in MDP: Monte Carlo Tree Search.

Practical session - Best arm identification
Forban (bandit library)
Practical session - MCTS on TicTacToe