Course Description
Getting an LLM into production efficiently requires understanding the full inference stack, from quantization and caching to serving frameworks and cost management. This course equips students to make informed engineering decisions about LLM deployment: meeting latency and cost targets while keeping quality degradation within acceptable limits.