Welcome to our weekly FiftyOne tips and tricks blog where we cover interesting workflows and features of FiftyOne! This week we are taking a look at 3D Detections. We aim to cover the basics of creating 3D detections and how they can be utilized in LIDAR or point cloud datasets.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community, we’re always happy to help.
What Makes a 3D Detection?
3D detection can mean a variety of things in computer vision depending on the context. This week we are looking at 3D detections as they pertain to 3D LIDAR or point cloud spaces. These kinds of detections are common in automotive and topography datasets, where the depth information in the scene provides valuable context.
In FiftyOne, a detection is a single label for either 2D or 3D; it is how we define it that makes the difference! Let's start with 2D and see how we build a detection label.
sample["ground_truth"] = fo.Detections(
    detections=[
        fo.Detection(
            label="cat",
            bounding_box=[0.480, 0.513, 0.397, 0.288],
        ),
    ]
)
We can see that all we need is the label and the bounding box to construct our label. In 3D detections, we also provide a label, but instead of a 2D bounding box, we now provide the details in order to build a 3D bounding box, accounting for the extra dimension.
# Object label
label = "vehicle"

# Object center `[x, y, z]` in scene coordinates
location = [0.47, 1.49, 69.44]

# Object dimensions `[x, y, z]` in scene units
dimensions = [2.85, 2.63, 12.34]

# Object rotation `[x, y, z]` around its center, in `[-pi, pi]`
rotation = [0, -1.56, 0]

# A 3D object detection
detection = fo.Detection(
    label=label,
    location=location,
    dimensions=dimensions,
    rotation=rotation,
)
If you are interested in placing 3D detections onto a 2D space, check out one of our previous Tips and Tricks on Polylines!
3D bounding box detections are defined in FiftyOne with three input parameters: location, rotation, and dimensions. Location is the absolute center of the bounding box in the point cloud coordinate system. Rotation is how far the bounding box is rotated around each of its axes, given as [x, y, z], where x is the rotation around the x axis in the range [-pi, pi]. Finally, dimensions are the width, length, and height of the bounding box.
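To make these parameters concrete, here is a minimal sketch (our own helper, not part of FiftyOne) that recovers the eight corner points of a box from its location, dimensions, and rotation. For brevity it only applies the yaw component (rotation around the z axis):

```python
import numpy as np

def box_corners(location, dimensions, rotation):
    """Return the 8 corners of a 3D box (applies yaw about z only, for brevity)."""
    dx, dy, dz = dimensions
    yaw = rotation[2]  # rotation around the z axis, in [-pi, pi]

    # Offsets of the 8 corners from the box center, before rotation
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    offsets = signs * np.array([dx, dy, dz]) / 2

    # Rotate the offsets around the z axis, then translate to the center
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return offsets @ R.T + np.array(location)

# The vehicle box from the snippet above, ignoring its pitch for simplicity
corners = box_corners([0.47, 1.49, 69.44], [2.85, 2.63, 12.34], [0, 0, 0])
```

Note that the mean of the eight corners is exactly the location, reflecting that location is the absolute center of the box.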
Let's try making our first 3D bounding box!
Creating a Basic 3D Bounding Box
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-groups")

session = fo.launch_app(dataset)
We start by loading in a dataset that contains point clouds. quickstart-groups is a subset of a KITTI-formatted dataset. Feel free to explore the point clouds by clicking on one of the groups and looking at the existing detections.
Next, we will add a very basic bounding box at the center of our point cloud. Switch to the pcd group slice, then add the detection to the point cloud sample. We create a view at the end to show only the sample we are interested in.
dataset.group_slice = "pcd"
sample = dataset.first()

bounding_box = fo.Detection(
    label="example",
    location=[0, 0, 0],
    rotation=[0, 0, 0],
    dimensions=[3, 3, 3],
)

sample["example"] = bounding_box
sample.save()
dataset.save()

view = dataset.filter_labels(
    "example", F("label").is_in(["example"])
)
session.view = view
Below we can see the results!
With our first box placed, we can play around with some of the input parameters to get comfortable with the FiftyOne 3D visualizer. Let's modify the location as well as the dimensions to get more of a grasp of the coordinate system.
sample = dataset.first()

bounding_box = fo.Detection(
    label="example",
    location=[0, 2, 4],
    rotation=[0, 0, 0],
    dimensions=[1, 2, 3],
)

sample["example"] = bounding_box
sample.save()
dataset.save()

view = dataset.filter_labels(
    "example", F("label").is_in(["example"])
)
session.view = view
We can see how the box has transformed and shifted along the coordinate plane. Feel free to tweak these numbers until you get a comfortable feel. A good foundation of the coordinate system will help troubleshoot any potential problems in the future!
Rotating the Bounding Box
Rotating the bounding box is simple: for each axis, define in radians how far you want your box to rotate around the circle. Play around with the rotation numbers to get a handle on situating your boxes.
💡 Pro Tip: save rotation for last if you are trying to align your boxes. Different standards exist for 3D detections, so make sure you haven't accidentally rotated so far that your width and length have flipped!
import numpy as np
from fiftyone import ViewField as F

# x = left/right = orange | y = forward/back = green | z = up/down = blue
bounding_box = fo.Detection(
    label="example",
    location=[0, 0, 0],
    rotation=[np.pi / 4, np.pi / 4, 0],
    dimensions=[3, 2, 1],
)

sample["example"] = bounding_box
sample.save()
dataset.save()

view = dataset.filter_labels(
    "example", F("label").is_in(["example"])
)
session.view = view
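To see numerically why the Pro Tip matters, a quick check in plain NumPy (no FiftyOne required; the helper name is our own) shows that a quarter-turn of yaw makes a box occupy the same space as an unrotated box with its width and length swapped, so rotation and dimensions can silently trade places between annotation standards:

```python
import numpy as np

def footprint(dimensions, yaw):
    """Axis-aligned x/y extent of a box's footprint after a yaw rotation."""
    dx, dy = dimensions[0] / 2, dimensions[1] / 2
    corners = np.array([[sx * dx, sy * dy]
                        for sx in (-1, 1) for sy in (-1, 1)])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    rotated = corners @ R.T
    return rotated.max(axis=0) - rotated.min(axis=0)

# A 3 x 1 box rotated a quarter turn about z has the same footprint
# as an unrotated 1 x 3 box
print(footprint([3, 1, 1], np.pi / 2))  # ~[1, 3]
print(footprint([1, 3, 1], 0.0))        # [1, 3]
```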
Creating 3D Polylines
Similar to the last example, we can define polygons with a set of points to create a 3D polyline. To create a 3D polyline, all you need to do is define its points in a list.
💡 To close a shape, specify the first vertex again.
We can add these lines to our sample and display our new shape!
label = "lane"

# A list of lists of `[x, y, z]` points in scene coordinates describing
# the vertices of each shape in the polyline
points3d = [
    [[1, 1, 1], [0, 2, 2]],
    [[0, 2, 2], [-1, 1, 1]],
    [[-1, 1, 1], [1, 1, 1]],
]

# A set of semantically related 3D polylines
polyline = fo.Polyline(
    label=label,
    points3d=points3d,
)

sample["polylines"] = polyline
sample.save()
dataset.save()

view = dataset.filter_labels(
    "polylines", F("label").is_in(["lane"])
)
session.view = view
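The points3d list above closes the triangle by ending each segment where the next one starts and looping back to the first vertex. A small helper (our own, not part of FiftyOne) can generate such a closed segment list from any loop of vertices:

```python
def closed_segments(vertices):
    """Build a list of [start, end] 3D segments that closes the vertex loop."""
    return [[vertices[i], vertices[(i + 1) % len(vertices)]]
            for i in range(len(vertices))]

# Reproduces the triangle from the snippet above
triangle = closed_segments([[1, 1, 1], [0, 2, 2], [-1, 1, 1]])
# → [[[1, 1, 1], [0, 2, 2]], [[0, 2, 2], [-1, 1, 1]], [[-1, 1, 1], [1, 1, 1]]]
```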
We end with a final tool you can use when working with 3D data: orthographic projections! You have probably noticed when running the app that the LIDAR samples have no thumbnails. We can compute orthographic projections of our LIDAR space to create thumbnails and add a new slice to our dataset to curate by! This gives you a bird's-eye view of your data to look through, with tons of options to change coloring or rendering.
import fiftyone.utils.utils3d as fou3d

min_bound = (0, -15, -2.73)
max_bound = (20, 15, 1.27)
size = (-1, 512)

fou3d.compute_orthographic_projection_images(
    dataset,
    size,
    "/tmp/proj",
    bounds=(min_bound, max_bound),
    shading_mode="height",
    out_group_slice="proj",
)

session.view = dataset.view()
Now our data is that much easier to curate and parse through!
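Conceptually, a bird's-eye projection amounts to binning points onto an x/y grid within the chosen bounds and shading each cell, for example by height. Here is a minimal NumPy sketch of the idea (our own simplification, not FiftyOne's actual implementation):

```python
import numpy as np

def birds_eye_image(points, min_bound, max_bound, size):
    """Shade each grid cell by the max z (height) of the points inside it.

    Simplification: empty cells stay at 0, so negative heights are clipped.
    """
    w, h = size
    img = np.zeros((h, w))

    # Normalize x/y coordinates into [0, 1) within the bounds
    xy = (points[:, :2] - np.array(min_bound[:2])) / (
        np.array(max_bound[:2]) - np.array(min_bound[:2]))

    # Keep only points inside the bounds
    mask = np.all((xy >= 0) & (xy < 1), axis=1)
    cols = (xy[mask, 0] * w).astype(int)
    rows = (xy[mask, 1] * h).astype(int)

    # Accumulate the maximum height into each occupied cell
    np.maximum.at(img, (rows, cols), points[mask, 2])
    return img

# Two points inside the bounds, one outside (x = 30 > 20 is dropped)
points = np.array([[5.0, 0.0, 1.0], [15.0, -10.0, 0.5], [30.0, 0.0, 2.0]])
img = birds_eye_image(points, (0, -15, -2.73), (20, 15, 1.27), (64, 96))
```

The bounds serve the same role as min_bound and max_bound in the snippet above: anything outside them is simply cropped out of the projection.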
See it in action!
In summary, 3D detections in point clouds are vital for advancing computer vision capabilities. FiftyOne lets you visualize and interact with this precise spatial data in its full three-dimensional context. 3D detections power applications such as autonomous driving, robotics, and augmented reality by providing critical depth information and object localization. Level up your three-dimensional data today by using FiftyOne!
Join the FiftyOne Community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 2,000+ FiftyOne Slack members
- 4,000+ stars on GitHub
- 5,000+ Meetup members
- Used by 370+ repositories
- 60+ contributors