A Better Way to Visualize 3D Point Clouds and Work with OpenAI’s Point-E

How to visualize point clouds, create orthographic projections, and evaluate detections with the latest release of FiftyOne

KITTI Multiview data and data generated with Point-E point cloud synthesis, viewed in the FiftyOne App

3D perception in computer vision enables computers and machines to understand the depth and structure of the 3D world around us, just as we do. Work in this field is exciting, with the potential to revolutionize the way we live and work across industries, from automotive to virtual reality.

At the heart of 3D imaging applications, point clouds are used to efficiently represent three-dimensional spatial data. That’s why recent years have seen a flood of algorithms for processing, understanding, and making predictions using point clouds. These point clouds can be generated via laser scanning techniques such as lidar, via photogrammetry, or even via generative models such as OpenAI’s recently released Point-E.

The latest release of the FiftyOne computer vision toolset, 0.20, includes enhanced point cloud support to deliver unprecedented access to and control over your 3D data.

FiftyOne 0.20 ships with the following 3D functionality:

  • Orthographic projections, including bird’s eye view (BEV)
  • Support for point cloud-only datasets in the FiftyOne App
  • Multiple point cloud slices in grouped datasets
  • Enhanced rendering and customization
  • Support for evaluating 3D object detection predictions

These features add to FiftyOne’s existing 3D capabilities for working with and visualizing point clouds.

Read on and learn how to harness FiftyOne to inspect, explore, and interact with your 3D data!

Preview 3D data with orthographic projections

Do you ever have a bunch of 3D samples that you want to rapidly peruse? Perhaps you want a bird’s eye view (BEV) of autonomous driving scenes from the KITTI Vision Benchmark Suite, nuScenes, or Waymo Open Dataset? Or perhaps you’re working with a dataset of indoor scenes such as the Stanford Large-Scale 3D Indoor Spaces Dataset, and you want an elevation view into the scene. 

Our new 3D utils integrate this functionality into the FiftyOne library and the FiftyOne App via the compute_orthographic_projection_images() method. 

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.utils3d as fou3d

dataset = foz.load_zoo_dataset("quickstart-groups")

min_bound = (0, -15, -2.73)
max_bound = (20, 15, 1.27)
size = (608, -1)

fou3d.compute_orthographic_projection_images(
    dataset,
    size,
    "bev_images",
    shading_mode="height",
    bounds=(min_bound, max_bound)
)

session = fo.launch_app(dataset)

By default, this method generates bird’s eye view projections of your point clouds, which then show up in the FiftyOne App as previews of point cloud samples (with filterable projections of polylines and bounding boxes as well). Also note that we’ve passed in bounds, telling FiftyOne where to crop the generated images, as well as a shading_mode of "height", specifying that each point’s height should be used to color the projection (as opposed to its intensity or RGB values).

Bird’s eye view orthographic projection images for point cloud samples in the Quickstart Groups Dataset.

If you’d like, you can also pass in a normal vector to specify the plane onto which the routine should project the points. For instance, try projection_normal=(0.5, 0.5, 0.)!
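
For example, here’s what such a call might look like, reusing the dataset and size from above; the output directory here is an arbitrary choice:

# project onto an oblique plane instead of the default bird's eye view
fou3d.compute_orthographic_projection_images(
    dataset,
    size,
    "/tmp/oblique_images",
    projection_normal=(0.5, 0.5, 0.0),
)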

Inside the 3D visualizer, you can also control a variety of characteristics of the look and feel, including setting the point size and toggling grid lines on or off.

Customize the look and feel of FiftyOne’s 3D visualizer

Group point cloud slices

In FiftyOne, grouped datasets allow you to combine samples – potentially with varied media types (image, video, and point cloud) – in groups, with samples occupying different slices. New in this release, FiftyOne has revamped grouped datasets so that groups can have multiple point cloud samples.

This can come in handy in a variety of scenarios, including:

  • Multiple point clouds for the same scene, coming from different sensors
  • Examining the effect of subsampling point clouds with millions of points
  • Transforming point clouds by rotation or scaling operations
  • Coloring points by cluster index or semantic segmentation label

Let’s see this in action by clustering our point clouds with DBSCAN. We’ll create a new pcd_cluster group slice and add a new sample to each group in that slice. See this gist for the corresponding code; a simplified sketch of the approach is shown below.
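
Here is a minimal sketch of that clustering step, assuming the quickstart-groups dataset loaded above (group field "group", point cloud slice "pcd"); the DBSCAN parameters, color palette, and output paths are illustrative choices, and the linked gist is the authoritative version:

import numpy as np
import open3d as o3d

import fiftyone as fo

rng = np.random.default_rng(51)

new_samples = []
for group in dataset.iter_groups():
    pcd_sample = group["pcd"]  # the existing point cloud slice
    pcd = o3d.io.read_point_cloud(pcd_sample.filepath)

    # DBSCAN assigns each point a cluster index (-1 for noise)
    labels = np.array(pcd.cluster_dbscan(eps=0.5, min_points=10))

    # encode cluster indices as RGB colors via a random palette
    palette = rng.random((labels.max() + 2, 3))
    pcd.colors = o3d.utility.Vector3dVector(palette[labels + 1])

    cluster_path = pcd_sample.filepath.replace(".pcd", "-cluster.pcd")
    o3d.io.write_point_cloud(cluster_path, pcd)

    # attach the clustered cloud to the same group, in a new slice
    new_samples.append(
        fo.Sample(
            filepath=cluster_path,
            group=pcd_sample.group.element("pcd_cluster"),
        )
    )

dataset.add_samples(new_samples)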

Then we compute orthographic projection images for these new point clouds. This time, we pass in shading_mode="rgb", because we’ve used the point clouds’ RGB channels to encode cluster indices, and we use slightly different bounds so that the clusters are easier to see.

min_bound = (0, -10, -2.73)
max_bound = (20, 10, 1.27)

fou3d.compute_orthographic_projection_images(
    dataset,
    size,
    "/tmp/bev_cluster_images",
    in_group_slice="pcd_cluster",
    shading_mode="rgb",
    bounds=(min_bound, max_bound)
)

Bird’s eye view images for a grouped dataset with multiple point cloud slices

Evaluate 3D object detections

If you’re familiar with FiftyOne’s Evaluation API, you’ll know that evaluate_detections() already supported 2D object detection bounding boxes in image and video datasets. Today’s release extends these capabilities to 3D bounding boxes with arbitrary rotation angles.

FiftyOne automatically recognizes when the bounding box is three dimensional, and applies the appropriate method to compute 3D intersection over union (IoU) scores, which are used to determine whether a prediction agrees with a ground truth object.
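
To build intuition for what’s being computed, here is a minimal axis-aligned 3D IoU sketch; FiftyOne’s actual routine also handles the rotated boxes mentioned above:

import numpy as np

def axis_aligned_iou3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (min_corner, max_corner)."""
    min_a, max_a = map(np.asarray, box_a)
    min_b, max_b = map(np.asarray, box_b)

    # per-axis overlap, clipped at zero when the boxes don't intersect
    overlap = np.clip(np.minimum(max_a, max_b) - np.maximum(min_a, min_b), 0, None)
    intersection = overlap.prod()

    vol_a = (max_a - min_a).prod()
    vol_b = (max_b - min_b).prod()
    return intersection / (vol_a + vol_b - intersection)

# two unit cubes offset by 0.5 along x: IoU = 0.5 / (2 - 0.5) = 1/3
print(axis_aligned_iou3d(((0, 0, 0), (1, 1, 1)), ((0.5, 0, 0), (1.5, 1, 1))))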

As an example, here is some sample code to generate a point cloud-only dataset with 50 samples and 10 ground truth bounding boxes per sample. To generate predictions, we randomly perturb some of the ground truth bounding boxes and omit others from our predicted detections. The code can be found in this gist.
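
For orientation, here is a simplified sketch of that setup; the box parameters, perturbation scale, and file paths are illustrative, and the gist remains the authoritative version:

import random

import fiftyone as fo
import numpy as np

def random_box():
    # FiftyOne 3D detections are defined by location, dimensions, and rotation
    return fo.Detection(
        label="dog",
        location=list(np.random.uniform(-10, 10, size=3)),
        dimensions=list(np.random.uniform(0.5, 2.0, size=3)),
        rotation=[0, 0, np.random.uniform(-np.pi, np.pi)],
    )

def perturb(gt, scale=0.25):
    # jitter a ground truth box to simulate an imperfect prediction
    pred = gt.copy()
    pred.location = list(np.array(gt.location) + np.random.normal(0, scale, size=3))
    pred.confidence = random.uniform(0.5, 1.0)
    return pred

samples = []
for i in range(50):
    gts = [random_box() for _ in range(10)]
    preds = [perturb(gt) for gt in gts if random.random() < 0.8]  # drop ~20%
    samples.append(
        fo.Sample(
            filepath=f"/tmp/scene-{i:03d}.pcd",  # point cloud files written separately
            ground_truth=fo.Detections(detections=gts),
            predictions=fo.Detections(detections=preds),
        )
    )

dataset = fo.Dataset("pcd-detection-eval")
dataset.add_samples(samples)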

Illustration of 3D ground truth and prediction bounding boxes in FiftyOne’s 3D visualizer

We can then evaluate our detection predictions with evaluate_detections():

results = dataset.evaluate_detections(
    "predictions", 
    eval_key="eval"
)

And print out a report on dataset-level evaluation metrics:

results.print_report()
             precision    recall  f1-score   support

         dog       0.83      0.66      0.74       500

   micro avg       0.83      0.66      0.74       500
   macro avg       0.83      0.66      0.74       500
weighted avg       0.83      0.66      0.74       500

As with 2D detection evaluations, you can specify what IoU threshold to use during evaluation by passing in the iou argument:

results_high_iou = dataset.evaluate_detections(
    "predictions",
    iou=0.75
)

results_high_iou.print_report()
             precision    recall  f1-score   support

         dog       0.12      0.09      0.10       500

   micro avg       0.12      0.09      0.10       500
   macro avg       0.12      0.09      0.10       500
weighted avg       0.12      0.09      0.10       500

Once you have evaluated your object detection predictions, you can also isolate evaluation patches containing, for instance, false positive predictions:

from fiftyone import ViewField as F

eval_patches = dataset.to_evaluation_patches("eval")
fp_patches = eval_patches.match(F("type") == "fp")

You can then sort by prediction confidence to identify your highest confidence false positive predictions:

high_conf_fp_view = fp_patches.sort_by("predictions.confidence", reverse=True)

Create point cloud-only datasets

Previously, FiftyOne supported point clouds only within grouped datasets, alongside other media. But point clouds are first-class citizens, and the FiftyOne App now supports point cloud-only datasets!

This can be useful, for instance, when you’re generating point clouds from scratch. Let’s see it in action, using OpenAI’s Point-E to turn text prompts into three-dimensional point cloud models.

We use the sampler from the Point-E text2pointcloud example notebook and convert the resulting point clouds using Open3D:

import numpy as np
import open3d as o3d

def generate_pcd_from_text(prompt):
    # `sampler` comes from the Point-E text2pointcloud example notebook
    samples = None
    for x in sampler.sample_batch_progressive(
        batch_size=1,
        model_kwargs=dict(texts=[prompt])
    ):
        samples = x

    # convert the Point-E output into an Open3D point cloud
    pointe_pcd = sampler.output_to_point_clouds(samples)[0]
    channels = pointe_pcd.channels
    r, g, b = channels["R"], channels["G"], channels["B"]
    colors = np.vstack((r, g, b)).T
    points = pointe_pcd.coords

    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.colors = o3d.utility.Vector3dVector(colors)
    return pcd

Then we generate an example dataset in FiftyOne, assigning each point cloud a random filename:

import random
import uuid

import fiftyone as fo
import open3d as o3d
from tqdm import tqdm

def generate_random_filename():
    rand_str = str(uuid.uuid1()).split("-")[0]
    return "pointe_vehicles/" + rand_str + ".pcd"

def generate_dataset(num_samples=100):
    vehicles = ["car", "bus", "bike", "motorcycle"]
    colors = ["red", "blue", "green", "yellow", "white"]

    samples = []
    for i in tqdm(range(num_samples)):
        vehicle = random.choice(vehicles)
        cols = random.choices(colors, k=2)
        prompt = f"a {cols[0]} {vehicle} with {cols[1]} wheels"
        pcd = generate_pcd_from_text(prompt)
        ofile = generate_random_filename()
        o3d.io.write_point_cloud(ofile, pcd)

        sample = fo.Sample(
            filepath=ofile,
            tags=cols,
            vehicle_type=fo.Classification(label=vehicle),
        )
        samples.append(sample)

    dataset = fo.Dataset("point-e-vehicles")
    dataset.add_samples(samples)
    return dataset
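
With these helpers in place, building and browsing the dataset takes just two more calls (the sample count here is arbitrary):

dataset = generate_dataset(num_samples=50)
session = fo.launch_app(dataset)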

All that is left to do is compute the orthographic projections. Here we will use a non-default projection_normal so that our preview image is not a bird’s eye view:

import fiftyone.utils.utils3d as fou3d

fou3d.compute_orthographic_projection_images(
    dataset,
    (-1, 608),
    "/tmp/side_images",
    shading_mode="rgb",
    projection_normal=(0, -1, 0)
)

3D point cloud models for vehicles, generated with Point-E point cloud synthesis

(3D point cloud) synthesis

If you aren’t working with your 3D point clouds in FiftyOne, you’re missing out. Visualize your point clouds in the same place that you visualize your images, videos, geo data, and more. 

Dive deeper and learn how to use Point-E point cloud synthesis with FiftyOne to generate your own 3D self-driving dataset!

3D self-driving road scenes generated with Point-E and FiftyOne

Join the FiftyOne community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!