Course Description
Learning and using models of the environment. Dyna architecture. World models: WM2, DreamerV3. MBPO (Model-Based Policy Optimization) and short model rollouts for sample efficiency. Neural network dynamics models: aleatoric vs. epistemic uncertainty. Planning in latent space. MuZero: planning with a learned model, without access to the environment's rules. RSSM (Recurrent State-Space Model) for partially observable environments. Students implement DreamerV3 on standard benchmarks.