A guide to using the integration between FiftyOne and Labelbox to build high-quality image and video datasets

Modern computer vision projects in the deep learning age always start with the same thing. LOTS OF DATA! For just about any task you can find countless models with open-source code ready for you to train them. The only thing that you need for your specific task is a sufficiently large, labeled dataset.

In this post, we show how to use the integration between the open-source dataset curation and model analysis tool, FiftyOne, and the widely popular annotation tool, Labelbox, in order to build a high-quality dataset for computer vision.

Follow along in Colab

You can follow along with the examples in this post directly in your browser through this Google Colab notebook!

Setup

To start, you need to install FiftyOne and Labelbox.

You also need to set up a Labelbox account. FiftyOne supports both standard Labelbox cloud accounts and Labelbox enterprise on-premises solutions.

The easiest way to get started is to use the default Labelbox server, which simply requires creating an account and then providing your API key as shown below.
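A minimal sketch of this setup from Python, assuming the packages are installed from PyPI under the names fiftyone and labelbox. The key value is a placeholder, and the variable should be set before FiftyOne is imported so that it is picked up when the annotation config is loaded.

```python
# Install both packages first, e.g. from a shell:
#   pip install fiftyone labelbox

import os

# Placeholder value: replace it with the API key generated in your Labelbox account.
# Set the variable before importing fiftyone so the key is available to the
# Labelbox backend.
os.environ["FIFTYONE_LABELBOX_API_KEY"] = "YOUR_LABELBOX_API_KEY"
```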

Alternatively, for a more permanent solution, you can store your credentials in your FiftyOne annotation config located at ~/.fiftyone/annotation_config.json:
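One way to create that file is sketched below. The helper name write_labelbox_config is hypothetical (it is not part of FiftyOne); it only writes the "backends" / "labelbox" / "api_key" entry and keeps whatever else is already in the file.

```python
import json
from pathlib import Path


def write_labelbox_config(api_key, config_path=None):
    """Store a Labelbox API key in a FiftyOne annotation config file.

    Hypothetical helper for illustration; not part of the FiftyOne API.
    """
    if config_path is None:
        config_path = Path.home() / ".fiftyone" / "annotation_config.json"
    config_path = Path(config_path)

    # Keep any settings that are already stored in the file
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("backends", {}).setdefault("labelbox", {})["api_key"] = api_key

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4))
    return config
```

Calling write_labelbox_config("YOUR_LABELBOX_API_KEY") with no path writes to the default location mentioned above.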

Raw Data

To start, you need to gather raw image or video data relevant to your task. The internet has a lot of places to look for free data. Assuming you have your raw data downloaded locally, you can easily load it into FiftyOne.
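As a rough sketch, a folder of images can be loaded with Dataset.from_dir(). The helper name, the directory path, and the dataset name here are placeholders chosen for illustration.

```python
def load_local_images(images_dir, name="labelbox_dataset"):
    """Load a directory of images into a new FiftyOne dataset."""
    import fiftyone as fo  # local import keeps this helper usable on its own

    return fo.Dataset.from_dir(
        dataset_dir=images_dir,
        dataset_type=fo.types.ImageDirectory,  # a plain folder of images, no labels
        name=name,
    )
```

For example, dataset = load_local_images("/path/to/images") would create a dataset from the images in that folder.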

Another method is to use publicly available datasets that may be relevant. For example, the Open Images dataset contains millions of images available for public use and can be accessed directly through the FiftyOne Dataset Zoo.
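A small sketch of pulling a sample of Open Images from the zoo follows. The split and sample count are illustrative choices, and passing an empty label_types list is assumed here to skip the existing labels so that only the images are loaded.

```python
def load_open_images_subset(max_samples=100):
    """Download a small slice of Open Images through the FiftyOne Dataset Zoo."""
    import fiftyone.zoo as foz  # local import keeps this helper usable on its own

    return foz.load_zoo_dataset(
        "open-images-v6",
        split="validation",
        max_samples=max_samples,
        label_types=[],  # assumption: an empty list loads the images without labels
    )
```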

Either way, once your data is in FiftyOne, we can visualize it in the FiftyOne App.
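Launching the App takes a single call to launch_app(). The wrapper name below is just for illustration.

```python
def open_in_app(dataset):
    """Open a dataset (or view) in the FiftyOne App and return the session."""
    import fiftyone as fo  # local import keeps this helper usable on its own

    return fo.launch_app(dataset)
```

Keeping a reference to the returned session is useful later, for instance to swap in a different view.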

FiftyOne provides a variety of methods that can help you understand the quality of the dataset and pick the best samples to annotate. For example, the compute_similarity() method can be used to find both the most similar and the most unique samples, which helps your dataset cover an even distribution of data.
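A minimal sketch of building the similarity index with the FiftyOne Brain is shown here. The brain key "img_sim" is an arbitrary name, and the call falls back to the Brain's default embedding settings, which may take a while on large datasets.

```python
def index_by_similarity(dataset, brain_key="img_sim"):
    """Index the samples of a dataset by visual similarity."""
    import fiftyone.brain as fob  # local import keeps this helper usable on its own

    # The returned results object is also stored on the dataset under brain_key
    return fob.compute_similarity(dataset, brain_key=brain_key)
```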

Now we can select only the slice of our dataset that contains the 10 most unique samples.
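Assuming a similarity index has already been computed with compute_similarity(), the selection could look roughly like this. The helper name is hypothetical; find_unique() and unique_ids come from the similarity results object.

```python
def select_most_unique(dataset, similarity_results, count=10):
    """Return a view containing the `count` most unique samples."""
    # Populates similarity_results.unique_ids with the chosen sample IDs
    similarity_results.find_unique(count)

    # Restrict the dataset to just those samples
    return dataset.select(similarity_results.unique_ids)
```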

Annotation

The integration between FiftyOne and Labelbox allows you to begin annotating your image or video data by calling a single method!
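That method is annotate(), available on datasets and views. The sketch below requests bounding boxes in a new field. The field name "detections", the annotation key, and the class list are example values, not requirements.

```python
def request_detections(view, anno_key, classes):
    """Send the samples in `view` to Labelbox for bounding box annotation."""
    view.annotate(
        anno_key,                  # unique name used to refer to this run later
        backend="labelbox",
        label_field="detections",  # example name of the field to populate
        label_type="detections",
        classes=classes,
        launch_editor=True,        # open the Labelbox editor once the upload is done
    )
```

For example, request_detections(unique_view, "labelbox_basic_recipe", ["vehicle", "animal", "plant"]), where unique_view is whichever view you want labeled.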

The annotations can then be loaded back into FiftyOne in just one more line.
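A short sketch of the import step, assuming the annotation run was created under the key passed in. The helper name is hypothetical.

```python
def import_annotations(dataset, anno_key):
    """Merge finished Labelbox annotations back into the dataset."""
    # Downloads the labels for this run and adds them to the dataset
    dataset.load_annotations(anno_key)

    # Return the samples that were sent for annotation so they can be inspected
    return dataset.load_annotation_view(anno_key)
```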

This API provides advanced customization options for your annotation tasks. For example, we can construct a sophisticated schema to define the annotations we want and even directly assign the annotators:
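The sketch below shows what such a request might look like. All field names, classes, attribute values, and email addresses are made up for illustration, and the attribute type and member roles shown should be checked against the options your Labelbox backend version supports.

```python
# Each key is a label field to create; each value describes what to annotate there
label_schema = {
    "new_classifications": {
        "type": "classifications",
        "classes": ["class_1", "class_2"],
    },
    "new_detections": {
        "type": "detections",
        "classes": ["road_sign"],
        "attributes": {
            # A single-choice attribute attached to every box
            "sign_type": {"type": "radio", "values": ["stop", "yield", "other"]},
        },
    },
}

# (email, role) pairs for the users to add to the Labelbox project
members = [
    ("user1@example.com", "LABELER"),
    ("user2@example.com", "REVIEWER"),
]


def request_custom_annotations(dataset, anno_key):
    """Start an annotation run with a custom schema and assigned members."""
    dataset.annotate(
        anno_key,
        backend="labelbox",
        label_schema=label_schema,
        members=members,
        launch_editor=True,
    )
```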

Next Steps

Now that you have a labeled dataset, you can go ahead and start training a model. FiftyOne lets you export your data to disk in a variety of formats (e.g., COCO and YOLO) expected by most training pipelines. It also provides workflows for using popular model training libraries like PyTorch, PyTorch Lightning Flash, and TensorFlow.
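As an example of the export step, the sketch below writes a dataset to disk in COCO format. The helper name, the output directory, and the "detections" field name are assumptions; swap in the dataset type that your training pipeline expects.

```python
def export_as_coco(dataset, export_dir, label_field="detections"):
    """Export images and the given detections field in COCO format."""
    import fiftyone as fo  # local import keeps this helper usable on its own

    dataset.export(
        export_dir=export_dir,
        dataset_type=fo.types.COCODetectionDataset,
        label_field=label_field,
    )
```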

Once the model is trained, the model predictions can be loaded back into FiftyOne. These predictions can then be evaluated against the ground truth annotations to find where the model is performing well, and where it is performing poorly. This provides insight into the type of samples that need to be added to the training set, as well as any annotation errors that may exist.
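For object detection, the evaluation could be sketched as follows. It assumes the predictions live in a field named "predictions" and the annotations in "ground_truth"; both names, and the evaluation key, are placeholders.

```python
def evaluate_predictions(dataset, pred_field="predictions",
                         gt_field="ground_truth", eval_key="eval"):
    """Compare predicted detections against the ground truth annotations."""
    results = dataset.evaluate_detections(
        pred_field,
        gt_field=gt_field,
        eval_key=eval_key,  # per-object TP/FP/FN results are stored under this key
    )

    # Print per-class precision, recall, and F1 score
    results.print_report()
    return results
```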

We can use the querying capabilities of the FiftyOne API to create a view that filters these model results down to high-confidence false positives, which often indicate an error in the ground truth annotation.
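A sketch of such a view, assuming detections were evaluated with evaluate_detections() under the key "eval" so that each prediction carries a "tp", "fp", or "fn" value in that attribute. The 0.8 confidence threshold is an arbitrary example.

```python
def high_confidence_false_positives(dataset, pred_field="predictions",
                                    eval_key="eval", min_confidence=0.8):
    """Keep only predictions that are confident false positives."""
    from fiftyone import ViewField as F  # local import keeps this helper usable on its own

    return dataset.filter_labels(
        pred_field,
        (F(eval_key) == "fp") & (F("confidence") > min_confidence),
    )
```

Passing the resulting view to the App makes it straightforward to scroll through the suspicious samples.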

This sample appears to be missing a ground truth annotation of skis. Let’s tag it in FiftyOne, and send it to Labelbox for reannotation.
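One way to script this is sketched below. The tag, the annotation key, and the "reannotations" field name are placeholders. The sketch collects the corrected labels in a new field rather than editing the existing ground truth in place, since whether existing labels can be uploaded depends on the backend.

```python
def send_for_reannotation(dataset, sample_id, anno_key, classes,
                          tag="requires_annotation"):
    """Tag a sample and send every sample with that tag to Labelbox."""
    # Tag the sample so it can be found again later
    sample = dataset[sample_id]
    sample.tags.append(tag)
    sample.save()

    # Build a view of all tagged samples and request new boxes for them
    view = dataset.match_tags(tag)
    view.annotate(
        anno_key,
        backend="labelbox",
        label_field="reannotations",  # hypothetical new field for the fixes
        label_type="detections",
        classes=classes,
        launch_editor=True,
    )
    return view
```

Samples can also be tagged interactively in the App, in which case only the match_tags() and annotate() steps are needed.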

Iterating over this process of training a model, evaluating its failure modes, and improving the dataset is a dependable way to produce high-quality datasets and, in turn, high-performing models.

Additional Utilities

You can perform additional Labelbox-specific operations to monitor the progress of an annotation project initiated through this integration with FiftyOne.

For example, you can view the status of an existing project:
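A minimal sketch of that call, assuming the annotation run was created under the key passed in (the helper name is hypothetical):

```python
def show_project_status(dataset, anno_key):
    """Print the status of the Labelbox project behind an annotation run."""
    # Load the stored results object for this annotation run
    results = dataset.load_annotation_results(anno_key)

    # Prints project details, members, and review counts
    results.print_status()
    return results
```

The printed summary looks along these lines: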

Project: FiftyOne_quickstart
ID: cktixtv70e8zm0yba501v0ltz
Created at: 2021-09-13 17:46:21+00:00
Updated at: 2021-09-13 17:46:24+00:00
Members:

    User: user1
    Role: Admin
    ID: ckl137jfiss1c07320dacd81l
    Nickname: user1
    Email: USER1_EMAIL@email.com

    User: user2
    Role: Labeler
    Name: FIRSTNAME LASTNAME
    ID: ckl137jfiss1c07320dacd82y
    Email: USER2_EMAIL@email.com

Reviews:
    Positive: 2
    Zero: 0
    Negative: 1

You can also delete projects associated with an annotation run directly through the FiftyOne API.
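A rough sketch of the cleanup, assuming the annotations you want to keep have already been loaded back into the dataset. The helper name is hypothetical; cleanup() and delete_annotation_run() are the FiftyOne calls doing the work.

```python
def delete_annotation_project(dataset, anno_key):
    """Remove an annotation run from Labelbox and from the dataset."""
    results = dataset.load_annotation_results(anno_key)

    # Delete the information that was uploaded to the backend for this run
    results.cleanup()

    # Delete the record of the run from the dataset (loaded labels are kept)
    dataset.delete_annotation_run(anno_key)
```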

Summary

No matter what computer vision project you are working on, you will need a dataset. FiftyOne makes it easy to curate and dig into your dataset to understand all aspects of it, including what needs to be annotated or reannotated. In turn, the integration with Labelbox makes this annotation process a breeze, resulting in a dataset that will lead to higher-quality models.
