LLM-410: Large-Scale Pretraining: Data, Compute, and Infrastructure

Course Description

Training a large language model from scratch requires coordination across data engineering, distributed systems, and optimization. This course covers the full pretraining workflow, from assembling a high-quality dataset, through designing the training run, to diagnosing instabilities. Students analyze real pretraining logs from open-source models and work through case studies of both successful and failed runs.