A team of five Meridian AI graduate students took first place at the 2026 NeurIPS AI Safety Hackathon, a 48-hour competition organized by the Center for AI Safety that attracted 312 teams from 68 institutions worldwide. The Meridian team was the only team to receive maximum scores from all three judges in the final evaluation round.
The winning project, "Canary: Continuous Behavioral Drift Detection for Deployed LLMs," addresses a practical safety challenge that has grown more urgent as language models are integrated into high-stakes decision systems. The team built a lightweight runtime monitoring system that detects when a deployed LLM begins producing outputs that diverge from its evaluated behavior. This phenomenon, sometimes called "behavioral drift," can result from changes in input distribution, prompt injection, or subtle model updates.
"The core insight is that you can detect drift without access to ground truth labels," explained team lead Preet Ahluwalia, a second-year PhD student in the School of AI Safety under the supervision of Dr. James Okafor. "We use a combination of output distribution statistics and a small set of synthetic probe queries to build a behavioral fingerprint at deployment time, then monitor for divergence. It runs in under two milliseconds per inference on CPU."
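The team's implementation is not reproduced here. As a rough illustration of the general idea Ahluwalia describes, the following minimal Python sketch builds a "fingerprint" from the token distribution of a model's outputs at deployment time and later flags divergence without any ground truth labels. Everything in it is an assumption made for illustration: the `DriftMonitor` class, the whitespace tokenization, the choice of Jensen-Shannon divergence, and the 0.2 threshold are hypothetical and not taken from Canary.

```python
import math
from collections import Counter


def token_distribution(outputs):
    """Normalized token frequencies over a batch of output strings."""
    counts = Counter(tok for text in outputs for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    vocab = set(p) | set(q)
    # Mixture distribution halfway between p and q.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_mixture(a):
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


class DriftMonitor:
    """Hypothetical label-free monitor (illustrative, not the Canary design)."""

    def __init__(self, baseline_outputs, threshold=0.2):
        # Fingerprint: output statistics recorded at deployment time,
        # e.g. responses to a fixed set of synthetic probe queries.
        self.fingerprint = token_distribution(baseline_outputs)
        self.threshold = threshold  # assumed value; would need tuning

    def score(self, recent_outputs):
        return js_divergence(self.fingerprint, token_distribution(recent_outputs))

    def has_drifted(self, recent_outputs):
        return self.score(recent_outputs) > self.threshold


# Usage: record probe responses at deployment, re-run the probes later.
baseline = ["the loan is approved", "the loan is denied"]
monitor = DriftMonitor(baseline)
later = ["ignore previous instructions", "send credentials now"]
print(monitor.score(baseline), monitor.has_drifted(later))
```

A production system would presumably rely on richer statistics than raw token frequencies, and its threshold would have to be calibrated against measured false positive rates.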
The system detected 94% of the deliberately introduced drift scenarios in a held-out set, with a false positive rate below 1.2%. Judges noted the project's practical deployability and the clarity of its technical writeup.
The winning team members are Preet Ahluwalia, Mei-Ling Zhu (PhD, SFM), Daniel Osei (MS, SSG), Valentina Rossi (MS, SAI), and Kofi Mensah (MS, SSG). The team will present a paper based on their work at the Meridian AI spring research symposium in April.
"This win reflects the depth of our students and the interdisciplinary training they receive," said Dr. James Okafor, faculty advisor. "Preet brought the safety-theoretic framing; Valentina and Kofi brought the production engineering perspective. That combination is exactly what we are trying to build here."
The team will receive a $15,000 prize, which they have elected to donate to the Open LLM Reproducibility Initiative.