Program Overview
The MS in AI Engineering & MLOps addresses the gap between AI research and production systems. Many AI projects fail not because the models are bad, but because the infrastructure, monitoring, and organizational processes needed to support them are not in place. This program trains engineers who can close that gap by building the pipelines, platforms, and practices that turn research prototypes into reliable production AI.
Curriculum Highlights
- Data Engineering: Feature stores, data versioning (DVC, lakeFS), data quality, streaming data for ML
- Training Infrastructure: Distributed training, experiment tracking (MLflow, W&B), hyperparameter optimization, compute cost management
- Model Serving: Triton, vLLM, TGI, BentoML; latency optimization; autoscaling; A/B testing of AI features
- Monitoring & Observability: Data drift detection, model degradation, performance monitoring, alerting
- Platform Engineering: Kubernetes for ML, Kubeflow, Vertex AI, SageMaker; CI/CD for ML pipelines
Sample Courses
- MLE-401: ML Systems Design Fundamentals
- MLE-410: Data Engineering for ML
- MLE-420: Distributed Training and Compute Optimization
- MLE-430: Model Serving and Inference Infrastructure
- MLE-440: ML Monitoring, Observability, and Reliability
- MLE-450: Platform Engineering for AI: Kubernetes and Cloud
- MLE-490: Capstone: End-to-End ML Platform