Enabling the AV Datasets of the Future with NVIDIA NuRec and FiftyOne
Aug 11, 2025
As autonomous vehicles (AV) and advanced driver assistance systems (ADAS) rapidly move from R&D to real-world deployment, the bar for dataset quality is higher than ever. Yet, teams struggle with fragmented, incomplete, and unvalidated data, which creates significant bottlenecks in the development pipeline. In addition, datasets of the future need to evolve from fixed events frozen in time to reactive, replayable scenes that can simulate completely new events.
AV and ADAS teams often work with multi-sensor datasets that combine sources such as LiDAR and radar. In these datasets, issues such as misaligned sensor calibrations, drifting ego-poses, and inconsistent timestamps often introduce downstream errors in simulation workflows.
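To make the timestamp issue concrete, here is a minimal sketch of a synchronization check between two sensor streams. It is plain Python, not part of FiftyOne or NuRec; the function names, the millisecond units, and the 50 ms tolerance are illustrative assumptions.

```python
from bisect import bisect_left


def max_sync_gap(reference_ts, other_ts):
    """Return the largest gap between each reference timestamp and its
    nearest neighbor in other_ts.

    Hypothetical helper: both lists use the same time unit, and
    other_ts must be sorted in ascending order.
    """
    if not reference_ts or not other_ts:
        raise ValueError("both timestamp lists must be non-empty")
    worst = 0
    for t in reference_ts:
        i = bisect_left(other_ts, t)
        # Only the neighbors just before and just after t can be nearest.
        candidates = other_ts[max(i - 1, 0): i + 1]
        nearest = min(abs(t - c) for c in candidates)
        worst = max(worst, nearest)
    return worst


def is_synchronized(reference_ts, other_ts, tolerance):
    """True when every reference sample has a partner within tolerance."""
    return max_sync_gap(reference_ts, other_ts) <= tolerance


# Timestamps in integer milliseconds (assumed unit for this example).
camera_ms = [0, 100, 200, 300]
lidar_ms = [10, 110, 260, 310]

print(max_sync_gap(camera_ms, lidar_ms))         # 60
print(is_synchronized(camera_ms, lidar_ms, 50))  # False
```

In this example the third LiDAR sweep arrives 60 ms after its camera frame, so the pair fails an assumed 50 ms tolerance and would be flagged for review before reconstruction.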
By integrating NVIDIA Omniverse NuRec neural reconstruction libraries with Voxel51’s data engine, we can address this bottleneck with a data ingestion and validation pipeline that delivers high-quality data ready for reconstruction and simulation workflows.

From raw data to future-ready outputs

Voxel51’s data engine for visual and multimodal AI, FiftyOne, provides the data preparation and evaluation capabilities for understanding visual data, identifying issues and outliers, and creating clean datasets that maximize model performance.
Omniverse NuRec is a set of libraries and AI models that enable AV developers to bring in their fleet sensor data and generate reconstructed interactive simulation environments that can be used for AV testing, replay, and closed-loop simulation.
The new integration provides a pipeline that converts users’ data in FiftyOne into NuRec’s data format to accelerate neural reconstruction and simulation.
With NuRec and FiftyOne, developers can:
  • Ingest datasets from various AV pipelines as well as from public datasets, such as nuScenes or Waymo Open Dataset.
  • Validate their datasets for consistency by visually inspecting the data.
  • Catch and correct misalignments before they propagate into model training.
  • Maintain full traceability of validated datasets across the entire dataset lifecycle.
Once data is ingested and inspected, the pipeline delivers a complete, high-integrity dataset ready for NuRec reconstruction, with no rework required.
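As one illustration of the kind of consistency check that can run before reconstruction, the sketch below flags implausible ego-pose jumps. It is a plain-Python example under stated assumptions, not the integration's actual validation logic: the pose layout (timestamp in seconds, then x, y, z in meters) and the 60 m/s speed ceiling are hypothetical.

```python
import math


def find_pose_jumps(poses, max_speed_mps=60.0):
    """Return indices of poses that look inconsistent with the previous one.

    poses: list of (timestamp_s, x_m, y_m, z_m) tuples in recording order
    (an assumed layout for this example). A pose is flagged when its
    timestamp does not advance, or when the implied speed since the
    previous pose exceeds max_speed_mps.
    """
    flagged = []
    for i in range(1, len(poses)):
        t0, *p0 = poses[i - 1]
        t1, *p1 = poses[i]
        dt = t1 - t0
        if dt <= 0:
            # Repeated or out-of-order timestamp.
            flagged.append(i)
            continue
        if math.dist(p0, p1) / dt > max_speed_mps:
            # Position changed faster than the vehicle plausibly could.
            flagged.append(i)
    return flagged


ego_poses = [
    (0.0, 0.0, 0.0, 0.0),
    (0.1, 2.0, 0.0, 0.0),   # 20 m/s, plausible
    (0.2, 4.0, 0.0, 0.0),   # 20 m/s, plausible
    (0.3, 30.0, 0.0, 0.0),  # 26 m in 0.1 s, implausible jump
    (0.3, 32.0, 0.0, 0.0),  # duplicate timestamp
]

print(find_pose_jumps(ego_poses))  # [3, 4]
```

A check like this only surfaces candidates. Whether a flagged pose is a real localization error is still a judgment made by inspecting the scene visually.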

Creating validation-ready datasets at scale

Instead of retrofitting fixes later, technical teams can now build quality into their datasets from the start. But this integration isn’t just about fixing data quality problems; it’s about creating datasets that improve and accelerate AV development.
The reconstruction-ready data delivered by the pipeline can be further amplified with NVIDIA Cosmos Transfer, a diffusion-based world foundation model that can add new variations, such as weather, lighting, and terrain, to driving scenarios.
In addition, NuRec-reconstructed scenes can be used for simulation in the CARLA open-source AV simulator for seamless and scalable workflows.
Listen to the latest NVIDIA AI Podcast episode, where Porsche and Voxel51 discuss what it takes to build safe, intelligent autonomous vehicles — and why data quality, simulation, and curation are the real bottlenecks.

Building the foundation for tomorrow’s AV data standards

AV and ADAS development demands uncompromising data quality and scalability. As the industry moves toward autonomous systems, the datasets powering these decisions will need to meet increasingly stringent validation standards across initial training, continuous learning, and regulatory compliance.
The AV datasets of the future need to be not only comprehensive but also clean, consistent, and export-ready by design. By combining NuRec with FiftyOne’s visual data engine, we’re helping AV developers speed up their neural reconstruction and simulation pipelines with frictionless data preparation, while maintaining the rigor required for safety-critical applications.
Join our upcoming webinar on Sept 4, 2025, at 9 a.m. PT to learn how companies such as Porsche are advancing AV development by leveraging tools like FiftyOne to build next-generation datasets and models.
