Navigating the Road Ahead

A Deep Dive into the nuScenes Dataset

In the rapidly evolving landscape of autonomous vehicles, advancements in computer vision are steering us closer to a future of self-driving cars and enhanced mobility solutions. However, the development and validation of these autonomous systems heavily depend on the availability of high-quality, real-world data. Comprehensive datasets are the bedrock upon which cutting-edge computer vision algorithms are trained and refined, enabling vehicles to perceive, interpret, and react to their surroundings with unprecedented accuracy.

Enter the nuScenes dataset—an indispensable asset in the realm of computer vision for autonomous driving. 

nuScenes is a public large-scale dataset for autonomous driving. It enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car.

In this blog post, we’ll begin to unravel the layers of nuScenes, understanding its composition, the types of data it encompasses, and its pivotal role in training AI models using the open source FiftyOne computer vision toolset. We’ll explore how this dataset advances the capabilities of computer vision, fuels research, and accelerates the deployment of autonomous vehicles, propelling us into a future where the roads are safer and transportation is more efficient. If you are new to FiftyOne, check out this short tour.

The nuScenes dataset is not just a vast collection of images and sensor data; it’s a meticulously crafted resource that provides a rich tapestry of information for training and testing computer vision algorithms in the context of autonomous driving. Let’s delve into the essential aspects that make this dataset a cornerstone of computer vision research.

Comprehensive Data Composition

At its core, the nuScenes dataset is a treasure trove of multi-modal sensor data, comprising high-resolution camera images, LiDAR point clouds, RADAR sweeps, and more. This diverse range of data sources allows researchers and engineers to create holistic perception systems for autonomous vehicles, mimicking the real-world sensory inputs encountered on the road.

nuScenes Car Set Up

Precise Annotations

Annotating data is a crucial step in training computer vision models. nuScenes excels in this aspect with its meticulous labeling of objects, such as vehicles, pedestrians, and cyclists, in each frame. These annotations are essential for tasks like object detection, tracking, and semantic segmentation, enabling AI models to understand and interact with their environment accurately. The dataset contains over 1.4 million object bounding boxes across 40,000 keyframes, and all told it adds up to 3 million individual sensor inputs across camera, LiDAR, and RADAR samples!

Realistic Scenarios

nuScenes goes beyond just data collection; it provides a broad spectrum of real-world driving scenarios. From urban environments to highways, day and night scenes, and various weather conditions, it offers a comprehensive set of challenges that autonomous vehicles might encounter. This realism is vital for robust algorithm development, ensuring that AI systems can adapt to the unpredictability of the open road. Its 1,000 scenes were collected on the streets of Boston and Singapore and cover a variety of driving maneuvers, traffic situations, and unexpected behaviors.

High Quality and Quantity 

Boasting a vast volume of data, nuScenes offers researchers the luxury of training deep learning models on extensive datasets. The dataset’s high quality, combined with its quantity, allows for the development of highly accurate and reliable computer vision models that can operate seamlessly in the real world. nuScenes includes around five and a half hours’ worth of data and contains 7x more annotations than KITTI, the dataset that inspired nuScenes and pioneered the ADAS dataset category.

Dynamic and Evolving 

The nuScenes dataset is continuously evolving, with periodic updates and expansions. This ensures that it remains relevant and up-to-date, reflecting the ever-changing landscape of urban environments and traffic patterns.

Understanding the nuScenes dataset is not just about recognizing its data components but also appreciating its role as a catalyst for innovation in computer vision and autonomous driving. It serves as a benchmark for the development of state-of-the-art perception systems, pushing the boundaries of what’s possible in the world of AI-powered mobility.

In July 2020, nuScenes also released its lidarseg add-on for LiDAR segmentation, adding 1.4 billion annotated points across 40,000 point clouds. Each point in the LiDAR point clouds is assigned one of 32 possible semantic labels. With all this different data to take in, let’s see how we can get started with nuScenes in FiftyOne.

Exploring nuScenes in FiftyOne

Due to the multi-sensor structure of nuScenes, the dataset in FiftyOne will be a Grouped Dataset, with some Dynamic Group Views thrown in as well. At a high level, we group our samples by their associated scene in nuScenes. At each keyframe, which arrives approximately every 0.5 seconds (2 Hz), we incorporate data from every sensor type, along with their respective detections. This results in distinct groups, each representing the full sensor perspective at a given keyframe. Sensor data exists for every frame, but since only keyframes are annotated, we choose to load only those.
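To make the grouping concrete, here is a minimal sketch of the pattern we will use throughout the ingestion code below: a fo.Group ties together one sample per sensor slice for a single keyframe. The dataset name and filepaths here are hypothetical placeholders, not nuScenes files.

import fiftyone as fo

sketch = fo.Dataset("nuScenes_group_sketch", overwrite=True)
sketch.add_group_field("group", default="CAM_FRONT")

# One keyframe becomes one group, with one sample per sensor slice
group = fo.Group()
cam = fo.Sample(filepath="cam_front.jpg", group=group.element("CAM_FRONT"))
lidar = fo.Sample(filepath="lidar_top.pcd", group=group.element("LIDAR_TOP"))
sketch.add_samples([cam, lidar])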

Ingesting nuScenes

To get started with nuScenes in FiftyOne, we first need to set up our environment. This requires downloading the dataset (or a snippet of it) as well as installing the nuScenes Python SDK. Full installation steps can be found here.

Once nuScenes is installed on your machine, we can kick things off. Let’s start by initializing both nuScenes and our FiftyOne dataset. We define our dataset and add a group field to tell FiftyOne to expect grouped data.

import fiftyone as fo
from nuscenes.nuscenes import NuScenes


nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuScenes', verbose=True)


dataset = fo.Dataset("nuScenes_final", overwrite=True)
dataset.add_group_field("group", default="CAM_FRONT")

nuScenes comprises three main data types: camera, RADAR, and LiDAR. Camera data is made up of JPG images from the different camera angles on the car, while RADAR and LiDAR data are pcd files, or point clouds, captured by the car’s 3D sensors. We will break down how we ingest each form of media for nuScenes and bring it all together at the end. Before that, a quick intro to the hierarchy of nuScenes: the dataset is composed of scenes, scenes contain samples, and each sample contains tokens that link to both the sensor data and annotations. We will look at how to load data from a single sensor first.
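As a quick illustration of that hierarchy, here is a small example of walking from a scene to its first keyframe and one of its sensor readings with the devkit, assuming the nusc object initialized above:

# Walk the nuScenes hierarchy: scene -> sample (keyframe) -> sample_data (sensor reading)
my_scene = nusc.scene[0]
first_sample = nusc.get('sample', my_scene['first_sample_token'])

# Each keyframe links to one sample_data record per sensor via tokens
cam_front_data = nusc.get('sample_data', first_sample['data']['CAM_FRONT'])
print(cam_front_data['filename'])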

Loading LIDAR Data

Loading a LiDAR sample from nuScenes is composed of two steps: generating the point cloud and adding the detections. We must convert the binary point clouds to standard pcd format in order to ingest them. nuScenes also offers an optional LiDAR segmentation package that lets us color each point according to its class, which we will be utilizing. We start with our lidar token, load the color map and point cloud that correspond to the token, and save the point cloud back to file with the new coloring in standard pcd formatting.

from nuscenes.utils.color_map import get_colormap
from nuscenes.lidarseg.lidarseg_utils import paint_points_label
from nuscenes.utils.data_classes import LidarPointCloud
import open3d as o3d
import os


def load_lidar(lidar_token):

    # Grab and generate the colormap for the lidarseg labels
    gt_from = "lidarseg"
    lidarseg_filename = "/data/sets/nuScenes/" + nusc.get(gt_from, lidar_token)['filename']
    colormap = get_colormap()
    name2index = nusc.lidarseg_name2idx_mapping

    coloring = paint_points_label(lidarseg_filename, None, name2index, colormap=colormap)
    filepath = "/data/sets/nuScenes/" + nusc.get("sample_data", lidar_token)['filename']
    # nuScenes lidar files end in ".pcd.bin", so stripping the extension leaves a ".pcd" path
    root, extension = os.path.splitext(filepath)

    # Load the point cloud and color each point by its semantic class
    cloud = LidarPointCloud.from_file(filepath)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(cloud.points[:3, :].T)
    colors = coloring[:, :3]
    pcd.colors = o3d.utility.Vector3dVector(colors)

    # Save the colored point cloud back to disk as a standard pcd file
    o3d.io.write_point_cloud(root, pcd)

    # Return the filepath of the new point cloud
    return root

With our point cloud file now properly prepared for ingestion, we can move along to adding detections. To do so, we grab all the detections from the keyframe. We use the nuScenes SDK’s built-in box methods to retrieve the location, rotation, and dimensions of each box. To match FiftyOne’s 3D detection input, we take box.orientation.yaw_pitch_roll for rotation, box.wlh for width, length, and height, and box.center for location. Note too that fo.Sample(filepath=filepath, group=group.element(sensor)) will automatically detect the pcd file and ingest the sample as a point cloud! After the method is run and detections are added, we have our LiDAR sample with detections!

from nuscenes.utils.geometry_utils import BoxVisibility


def lidar_sample(group, filepath, sensor, lidar_token):

    # Grab all detections for this keyframe
    data_path, boxes, camera_intrinsic = nusc.get_sample_data(lidar_token, box_vis_level=BoxVisibility.NONE)
    sample = fo.Sample(filepath=filepath, group=group.element(sensor))
    detections = []

    for box in boxes:
        # Grab the values needed to place a box in a point cloud
        yaw, pitch, roll = box.orientation.yaw_pitch_roll
        w, l, h = box.wlh.tolist()

        detection = fo.Detection(
            label=box.name,
            location=box.center.tolist(),
            rotation=[roll, pitch, yaw],
            dimensions=[l, w, h],
        )
        detections.append(detection)

    # Add all of our new detections to the sample
    sample["ground_truth"] = fo.Detections(detections=detections)
    return sample

Loading Camera Data

Camera data is a bit more straightforward than the LiDAR data. There is no need to do any prep on the image or save it in a different format; we can go right ahead and grab the detections and add them. The only tricky part is that these are no ordinary bounding boxes: they are 3D bounding boxes given in global coordinates!

Luckily for us, nuScenes provides some easy ways to convert its bounding boxes into the pixel space of the camera the image came from. For camera data, we load all of the boxes for our sample, check which ones are in the frame of our camera, and then add the cuboids to the sample. In order to add a cuboid, or 3D bounding box, we use polylines and the from_cuboid() method. Let’s take a look at how it is done:

from nuscenes.utils.geometry_utils import box_in_image, view_points
from PIL import Image


def camera_sample(group, filepath, sensor, token):
    # Initialize our sample
    sample = fo.Sample(filepath=filepath, group=group.element(sensor))

    # Load our boxes
    data_path, boxes, camera_intrinsic = nusc.get_sample_data(token, box_vis_level=BoxVisibility.NONE)
    image = Image.open(data_path)
    width, height = image.size
    shape = (height, width)
    polylines = []
    for box in boxes:
        # Check to see if the box is in the image
        if box_in_image(box, camera_intrinsic, shape, vis_level=BoxVisibility.ALL):
            # Project the 3D corners into pixel space, then normalize to [0, 1]
            corners = view_points(box.corners(), camera_intrinsic, normalize=True)[:2, :]
            front = [(corners[0][0] / width, corners[1][0] / height),
                     (corners[0][1] / width, corners[1][1] / height),
                     (corners[0][2] / width, corners[1][2] / height),
                     (corners[0][3] / width, corners[1][3] / height)]
            back = [(corners[0][4] / width, corners[1][4] / height),
                    (corners[0][5] / width, corners[1][5] / height),
                    (corners[0][6] / width, corners[1][6] / height),
                    (corners[0][7] / width, corners[1][7] / height)]
            # Create a new cuboid and add it to the list
            polylines.append(fo.Polyline.from_cuboid(front + back, label=box.name))

    # Update our sample with its new cuboids
    sample["cuboids"] = fo.Polylines(polylines=polylines)
    return sample

Loading RADAR Data

RADAR is an interesting case. Since we have already stored our 3D detections on the LiDAR sample, and RADAR is laid on top of the LiDAR in the 3D visualizer, we don’t need to copy our detections to each point cloud. This simplifies loading RADAR to just:

sample = fo.Sample(filepath=filepath, group=group.element(sensor))

This can of course be changed by adding the same detection loop we used for LiDAR, especially if you are interested in object detection with only RADAR; a sketch of that variation is shown below. With examples of how to load each sensor input, we can take an initial look at how we bring it all together!
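If you do want boxes on the RADAR slices as well, a minimal sketch might look like the following. It simply reuses the same box fields we extracted in lidar_sample, along with the imports above; the function name radar_sample is our own choice here, not part of the devkit.

def radar_sample(group, filepath, sensor, radar_token):
    # Same pattern as lidar_sample: one point cloud sample plus its 3D boxes
    data_path, boxes, camera_intrinsic = nusc.get_sample_data(radar_token, box_vis_level=BoxVisibility.NONE)
    sample = fo.Sample(filepath=filepath, group=group.element(sensor))
    detections = []
    for box in boxes:
        yaw, pitch, roll = box.orientation.yaw_pitch_roll
        w, l, h = box.wlh.tolist()
        detections.append(fo.Detection(
            label=box.name,
            location=box.center.tolist(),
            rotation=[roll, pitch, yaw],
            dimensions=[l, w, h],
        ))
    sample["ground_truth"] = fo.Detections(detections=detections)
    return sample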

As we begin to form our ingestor, we start by defining our groups, one for each sensor on the car. We then set up a loop that advances from sample to sample within each scene until we reach the last one, denoted by an empty “next” token.

groups = ("CAM_FRONT", "CAM_FRONT_RIGHT", "CAM_BACK_RIGHT", "CAM_BACK", "CAM_BACK_LEFT", "CAM_FRONT_LEFT","LIDAR_TOP", "RADAR_FRONT", "RADAR_FRONT_LEFT", "RADAR_FRONT_RIGHT", "RADAR_BACK_LEFT", "RADAR_BACK_RIGHT")


samples = []
for scene in nusc.scene:
    token = scene['first_sample_token']
    my_sample = nusc.get('sample', token)

    while not my_sample["next"] == "":
        ...

After we start looping through our dataset, we begin to create a group for each keyframe, or nuScenes sample. Depending on the modality of the sensor, we take the appropriate action to load the sample. We also stash the scene token and keyframe timestamp on each sample, which we will use at the end to group samples by scene and order them in time. Once a sample has been created and added to our list, we move on to the next one.

        lidar_token = my_sample["data"]["LIDAR_TOP"]
        group = fo.Group()
        for sensor in groups:
            data = nusc.get('sample_data', my_sample['data'][sensor])
            filepath = "/data/sets/nuScenes/" + data["filename"]
            if data["sensor_modality"] == "lidar":
                filepath = load_lidar(lidar_token)
                sample = lidar_sample(group, filepath, sensor, lidar_token)

            elif data["sensor_modality"] == "camera":
                sample = camera_sample(group, filepath, sensor, my_sample['data'][sensor])

            else:
                sample = fo.Sample(filepath=filepath, group=group.element(sensor))

            # Stash the scene token and keyframe timestamp for the dynamic group view below
            sample["scene_token"] = my_sample["scene_token"]
            sample["timestamp"] = my_sample["timestamp"]
            samples.append(sample)

        token = my_sample["next"]
        my_sample = nusc.get('sample', token)

Finally, with our samples all prepared, we can take the last step and add them to our dataset. Grouping dynamically by scene_token and ordering by timestamp lets us step through each scene keyframe by keyframe in the FiftyOne App.

dataset.add_samples(samples)
view = dataset.group_by("scene_token", order_by="timestamp")
session = fo.launch_app(view)
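Once the App is open, you can also switch which sensor slice is shown in the grid. For example, assuming the dataset and session built above:

# List the available sensor slices and make LIDAR_TOP the active one
print(dataset.group_slices)
dataset.group_slice = "LIDAR_TOP"
session.refresh()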

Conclusion

The nuScenes dataset stands as a meticulously curated resource, offering a diverse range of sensor data crucial for training and testing autonomous driving computer vision algorithms. With detailed annotations and a realistic array of driving scenarios, it serves as a cornerstone for building accurate perception systems for autonomous vehicles. Continuously evolving and expanding, nuScenes remains a vital benchmark in AI-driven mobility, facilitating innovation and pushing the boundaries of computer vision research. Its integration into FiftyOne provides a structured approach for efficient dataset exploration and analysis, aligning with its rich multi-sensor structure. To get started exploring, check out nuScenes on try.fiftyone.ai! While there, feel free to browse our other AV datasets such as KITTI, BDD, and more!