Sim-to-Real: Training Robots in Virtual Worlds

The Training Data Problem for Robots

Neural network training requires data, and lots of it. A robot learning to walk might need to attempt the task millions of times before it performs consistently. In the real world, a million attempts would take months, wear out hardware, and likely involve many destructive falls. Simulation addresses this: in a physics simulator, a million attempts might take hours, put no physical hardware at risk, and run in parallel across hundreds of virtual instances.

The catch is the "reality gap": simulated environments are approximations. Physics simulators don't perfectly model friction, contact dynamics, material deformation, or sensor noise. A policy trained in simulation may fail when deployed on real hardware because the simulator was wrong about something important.

Domain Randomization

The dominant technique for crossing the reality gap is domain randomization: vary the simulation parameters during training so the robot learns a policy that works across a wide distribution of environments, rather than one tuned to the specific parameters of a single simulator configuration. Typical targets are physics parameters (friction, mass, inertia), visual parameters (lighting, texture, camera noise), and dynamic parameters (actuator delays, gravity magnitude, object placement).

The intuition: if the distribution of simulated environments includes the real world (or comes close to it), a policy that performs well across that distribution should also perform well in reality. Put another way, the real environment becomes just one more sample from the distribution the policy has been trained to handle.
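The sampling step behind this idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular simulator's API: the parameter names, the ranges in RANGES, and the real_world estimate are all hypothetical values chosen for the example.

```python
import random

# Hypothetical randomization ranges, chosen only for illustration.
RANGES = {
    "friction": (0.4, 1.2),           # contact friction coefficient
    "mass_scale": (0.8, 1.2),         # multiplier on nominal link masses
    "actuator_delay_s": (0.0, 0.04),  # command latency in seconds
    "light_intensity": (0.5, 1.5),    # multiplier on nominal scene lighting
}

def sample_env_params(rng: random.Random) -> dict:
    """Draw one simulated environment from the randomization distribution."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

def covers(params: dict) -> bool:
    """Return True if every parameter lies inside its randomized range."""
    return all(RANGES[k][0] <= v <= RANGES[k][1] for k, v in params.items())

rng = random.Random(0)
# One fresh set of parameters per training episode.
episodes = [sample_env_params(rng) for _ in range(1000)]

# A hypothetical estimate of the real robot's parameters.
real_world = {
    "friction": 0.9,
    "mass_scale": 1.05,
    "actuator_delay_s": 0.02,
    "light_intensity": 1.0,
}
print("real world inside training distribution:", covers(real_world))
```

If covers(real_world) were False for some parameter, that would be a hint to widen the corresponding range, at the cost of a harder training problem.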

Dr. Chakraborty's lab at Meridian AI has developed adaptive domain randomization, which automatically adjusts which parameters to randomize, and by how much, based on the difficulty of the current training step. It prioritizes parameters that actually affect task performance.
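The lab's algorithm is not described here, so the following is only a generic sketch of one way a randomization range could adapt to difficulty: widen a parameter's range when the policy succeeds often, and narrow it when the policy struggles. The function name update_width and every threshold in it are assumptions made for illustration, and the sketch does not cover how parameters are prioritized.

```python
def update_width(width, success_rate, *, grow=1.1, shrink=0.9,
                 high=0.8, low=0.4, min_width=0.01, max_width=1.0):
    """Adapt the half-width of one parameter's randomization range.

    All thresholds are hypothetical. A high success rate suggests the
    task is too easy at this width, so the range is widened; a low
    success rate suggests it is too hard, so the range is narrowed.
    """
    if success_rate >= high:
        width *= grow
    elif success_rate <= low:
        width *= shrink
    # Keep the width inside sane bounds.
    return min(max(width, min_width), max_width)

# Example: friction randomized as nominal +/- width.
width = 0.1
for success_rate in [0.9, 0.85, 0.9, 0.3, 0.6]:
    width = update_width(width, success_rate)
    print(f"success={success_rate:.2f} -> width={width:.4f}")
```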

Isaac Lab and GPU-Accelerated Simulation

NVIDIA's Isaac Lab enables GPU-accelerated physics simulation, running thousands of robot environments in parallel on a single GPU such as an A100. This changes the economics: a sim-to-real training run that previously took weeks can now take hours. The quality of the physics simulation has also improved dramatically: contact dynamics, deformable objects, and fluid simulation are now fast enough to use in training loops.
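The weeks-to-hours shift is easy to check with back-of-the-envelope arithmetic. The numbers below (step budget, environment count, per-environment step rates) are hypothetical and not measurements of Isaac Lab or of any specific GPU; the per-environment rate is deliberately lower in the batched case, on the assumption that thousands of environments share one device.

```python
def wall_clock_hours(total_steps: int, num_envs: int,
                     steps_per_env_per_sec: float) -> float:
    """Hours needed to collect total_steps when num_envs step in lockstep."""
    return total_steps / (num_envs * steps_per_env_per_sec) / 3600.0

TOTAL_STEPS = 1_000_000_000  # hypothetical training budget

# One environment stepping quickly on its own.
single = wall_clock_hours(TOTAL_STEPS, num_envs=1, steps_per_env_per_sec=1000.0)

# Thousands of environments sharing a device, each stepping more slowly.
batched = wall_clock_hours(TOTAL_STEPS, num_envs=4096, steps_per_env_per_sec=50.0)

print(f"single environment: {single / 24:.1f} days")
print(f"4096 environments:  {batched:.1f} hours")
```

Under these assumptions the single environment needs roughly eleven and a half days, while the batched run finishes in under an hour and a half.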

Privileged Information and Asymmetric Actor-Critic

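A clever sim-to-real technique: during training, give a "teacher" policy access to information that is available in simulation (exact contact forces, precise state of hidden objects) but not in the real world. This "privileged information" makes the training problem easier and produces a stronger policy. Before deployment, train a "student" policy that maps only real-sensor observations to the actions the privileged teacher would take; the student is what runs on the robot. In effect, the student learns to infer from its sensors what the teacher observed directly. Asymmetric actor-critic is a related variant: the critic receives the privileged state during training, while the actor only ever sees observations available on the real robot.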
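The teacher-student step can be illustrated with a deliberately tiny example. Everything here is a toy assumption: the state is one-dimensional, the teacher is a fixed linear rule, the sensor adds Gaussian noise, and the student is a linear model fitted by stochastic gradient descent. Real systems typically use neural networks and a history of observations, but the data flow is the same: the teacher sees the true state, and the student sees only the sensor reading and regresses onto the teacher's action.

```python
import random

def teacher_policy(true_state: float) -> float:
    """Privileged teacher: acts on the exact simulator state (toy rule)."""
    return -2.0 * true_state

def sensor(true_state: float, rng: random.Random) -> float:
    """Real-sensor observation: the true state corrupted by noise."""
    return true_state + rng.gauss(0.0, 0.1)

def train_student(num_samples: int = 5000, lr: float = 0.05, seed: int = 0):
    """Fit student action = w * obs + b to imitate the teacher's actions."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(num_samples):
        state = rng.uniform(-1.0, 1.0)   # only the simulator knows this
        obs = sensor(state, rng)         # what the real robot would see
        target = teacher_policy(state)   # teacher uses privileged state
        error = (w * obs + b) - target   # student uses the observation only
        # Gradient step on the squared imitation error.
        w -= lr * error * obs
        b -= lr * error
    return w, b

w, b = train_student()
print(f"student: action = {w:.2f} * obs + {b:.2f}")
```

Because the student never sees the true state, its learned gain ends up slightly smaller in magnitude than the teacher's: it hedges against sensor noise rather than copying the teacher exactly.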

Results at Meridian AI

The Threshold Robotics Lab's work on indoor navigation (Project: Indoor Navigation with World Models) has achieved 87% success on novel layouts in sim-to-real transfer using this approach. For manipulation tasks, the lab's diffusion policy implementation achieves sub-centimeter grasping accuracy on objects seen only in simulation during training. These results suggest that the sim-to-real gap has narrowed to the point where policies for many practical tasks can be trained entirely in simulation and then deployed on real hardware.

# ============================================================ # SCHOOL OF FOUNDATIONS & MATHEMATICS # ============================================================