Autonomous vehicle (AV) technology is advancing rapidly but remains hindered by the limited availability of diverse, realistic driving data. Traditional data collection, which deploys sensor-equipped vehicles to capture real-world scenarios, is costly, time-consuming, and risk-prone, especially for rare but safety-critical edge cases.
We introduce the Autonomous Temporal Diffusion Model (AutoTDM), a foundation model that generates realistic, physics-consistent driving videos. Conditioned on natural language prompts and semantic sensory inputs such as depth maps, edge maps, segmentation maps, and camera positions, AutoTDM produces high-quality, temporally consistent driving scenes that are controllable and adaptable to varied simulation needs. This capability is crucial for developing robust autonomous navigation systems, as it enables the simulation of long-duration driving scenarios under diverse conditions.
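To make the conditioning interface concrete, the sketch below shows one way such multi-modal inputs (per-frame depth, edge, and segmentation maps, camera pose, and a pooled text-prompt embedding) could be fused into a single conditioning tensor for a temporal diffusion backbone. All names and shapes here (e.g., `ConditionEncoder`, `pose_dim`) are illustrative assumptions, not AutoTDM's actual API.

```python
# Illustrative sketch of multi-modal conditioning for a temporal video diffusion
# model. All class names, dimensions, and signatures are assumptions for
# exposition; they do not reflect AutoTDM's actual implementation.
import torch
import torch.nn as nn


class ConditionEncoder(nn.Module):
    """Fuses per-frame semantic maps (depth, edges, segmentation) with camera
    pose and a text-prompt embedding into one conditioning feature volume."""

    def __init__(self, text_dim: int = 512, pose_dim: int = 12, hidden: int = 64):
        super().__init__()
        # Depth (1) + edge map (1) + segmentation (1) = 3 input channels per frame.
        self.spatial = nn.Conv2d(3, hidden, kernel_size=3, padding=1)
        self.pose = nn.Linear(pose_dim, hidden)   # flattened 3x4 extrinsics per frame
        self.text = nn.Linear(text_dim, hidden)   # pooled prompt embedding

    def forward(self, depth, edges, seg, pose, text_emb):
        # depth/edges/seg: (B, T, 1, H, W); pose: (B, T, pose_dim); text_emb: (B, text_dim)
        b, t, _, h, w = depth.shape
        maps = torch.cat([depth, edges, seg], dim=2).view(b * t, 3, h, w)
        feat = self.spatial(maps).view(b, t, -1, h, w)             # (B, T, hidden, H, W)
        feat = feat + self.pose(pose)[..., None, None]             # broadcast pose per frame
        feat = feat + self.text(text_emb)[:, None, :, None, None]  # broadcast prompt over frames
        return feat


if __name__ == "__main__":
    enc = ConditionEncoder()
    B, T, H, W = 2, 8, 32, 32
    cond = enc(
        depth=torch.rand(B, T, 1, H, W),
        edges=torch.rand(B, T, 1, H, W),
        seg=torch.rand(B, T, 1, H, W),
        pose=torch.rand(B, T, 12),
        text_emb=torch.rand(B, 512),
    )
    print(cond.shape)  # torch.Size([2, 8, 64, 32, 32])
```

In such a setup, the resulting conditioning volume would typically be injected into the denoising network (for example, via concatenation or cross-attention) at each diffusion step; that wiring is omitted here.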
By simulating comprehensive driving scenarios in a controlled virtual environment, AutoTDM offers a scalable, cost-effective solution for training and validating autonomous systems, enhancing safety and accelerating progress across the industry. This marks a significant step forward in autonomous vehicle development.