What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection
Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn’t just about collecting lidar and camera frames. It’s about making sure every point lands where it should: in the right frame, at the right time.
In this talk, we’ll break down what it actually takes to go from raw logs to a clean, usable AV dataset. We’ll walk through the practical process of validating transformations, aligning coordinate systems, checking camera intrinsics and extrinsics, and verifying that your projected lidar points land where they should on the camera images. Along the way, we’ll share a checklist of common failure points and hard-won debugging tips.
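To make the projection check concrete, here is a minimal sketch of the lidar-to-camera projection the talk validates. All numeric values (the intrinsic matrix `K`, the extrinsic transform `T_cam_from_lidar`, and the image size) are hypothetical placeholders, not values from any real sensor rig:

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy, cx, cy) for a 1280x720 camera.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

# Hypothetical lidar-to-camera extrinsic: identity rotation plus a small
# translation, expressed as a 4x4 homogeneous transform.
T_cam_from_lidar = np.eye(4)
T_cam_from_lidar[:3, 3] = [0.1, -0.2, 0.0]

def project_lidar_points(points_lidar, T_cam_from_lidar, K, image_size=(1280, 720)):
    """Project Nx3 lidar points to pixel coordinates, dropping points
    behind the camera or outside the image bounds."""
    n = points_lidar.shape[0]
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])   # Nx4 homogeneous
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]      # into camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]                 # keep points in front (z > 0)
    uvw = (K @ pts_cam.T).T                              # apply intrinsics
    uv = uvw[:, :2] / uvw[:, 2:3]                        # perspective divide
    w, h = image_size
    in_bounds = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[in_bounds]

# Sanity check: a point 10 m straight ahead should land near the image center.
pts = np.array([[0.0, 0.0, 10.0]])
print(project_lidar_points(pts, T_cam_from_lidar, K))  # → [[650. 340.]]
```

A quick sanity check like this, projecting a known point and confirming it lands where geometry says it should, is exactly the kind of validation step the checklist covers.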
Finally, we’ll show how getting this right unlocks downstream tools like Omniverse NuRec and Cosmos, enabling powerful workflows such as digital reconstruction, simulation, and large-scale synthetic data generation.