A guide to using the integration between FiftyOne and Labelbox to build high-quality image and video datasets
Modern computer vision projects in the deep learning age always start with the same thing. LOTS OF DATA! For just about any task, you can find countless models with open-source code ready to train. What you still need for your specific task is a sufficiently large, labeled dataset.
In this post, we show how to use the integration between the open-source dataset curation and model analysis tool, FiftyOne, and the widely popular annotation tool, Labelbox, to build a high-quality dataset for computer vision.
Setup
Along with installing FiftyOne and the labelbox Python package, you need to create and set up a Labelbox account. FiftyOne supports both standard Labelbox cloud accounts and Labelbox enterprise on-premises solutions.
The easiest way to get started is to use the default Labelbox server, which only requires creating an account and then providing your API key, for example through the FIFTYONE_LABELBOX_API_KEY environment variable.
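A minimal sketch of the environment-variable approach, assuming FiftyOne and the Labelbox client were installed with pip install fiftyone labelbox. The key value is a hypothetical placeholder, and the on-premises URL line is an illustrative assumption that only applies if you run your own Labelbox server.

```python
import os

# Hypothetical placeholder: substitute the API key from your Labelbox account.
# Set this before importing fiftyone, since FiftyOne loads its annotation
# config when it is imported.
os.environ["FIFTYONE_LABELBOX_API_KEY"] = "YOUR_LABELBOX_API_KEY"

# Only for on-premises deployments (placeholder URL, an assumption):
# os.environ["FIFTYONE_LABELBOX_URL"] = "https://labelbox.your-company.com"
```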
Alternatively, for a more permanent solution, you can store your credentials in your FiftyOne annotation config located at ~/.fiftyone/annotation_config.json.
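The config file nests the key under backends, then labelbox, then api_key. The helper below is a hypothetical convenience (write_labelbox_config is not part of FiftyOne) that writes that structure while keeping any other settings already in the file; editing the JSON by hand works just as well.

```python
import json
from pathlib import Path


def write_labelbox_config(
    api_key: str,
    path: Path = Path.home() / ".fiftyone" / "annotation_config.json",
) -> Path:
    """Store a Labelbox API key in a FiftyOne annotation config file."""
    path = Path(path)
    # Preserve any settings that already exist in the config file
    config = json.loads(path.read_text()) if path.exists() else {}
    labelbox = config.setdefault("backends", {}).setdefault("labelbox", {})
    labelbox["api_key"] = api_key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return path


# Usage (placeholder key):
# write_labelbox_config("YOUR_LABELBOX_API_KEY")
```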
Raw Data
To start, you need to gather raw image or video data relevant to your task. The internet has many sources of free data. Assuming you have your raw data downloaded locally, you can easily load it into FiftyOne.
Another option is to use publicly available datasets that may be relevant. For example, the Open Images dataset contains millions of images available for public use and can be accessed directly through the FiftyOne Dataset Zoo.
FiftyOne provides a variety of methods that can help you understand the quality of the dataset and pick the best samples to annotate. For example, the FiftyOne Brain method compute_similarity() can be used to find both the most similar and the most unique samples, helping you avoid near-duplicates and build a more diverse dataset.
Now we can select only the slice of our dataset that contains the 10 most unique samples.
Annotation
FiftyOne's annotation API provides advanced customization options for your annotation tasks. For example, we can construct a sophisticated schema to define the annotations we want and even directly assign the annotators.
Next Steps
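Once the annotations are back in FiftyOne, you can train a model and inspect its failure modes. In our example, one sample appears to be missing a ground truth annotation of skis. Let's tag it in FiftyOne and send it to Labelbox for reannotation.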
Iterating on this process of training a model, evaluating its failure modes, and improving the dataset is a reliable way to produce high-quality datasets and, in turn, high-performing models.
Additional Utilities
For example, FiftyOne can print a summary of the Labelbox project behind an annotation run, including its members and reviews:

Project: FiftyOne_quickstart
ID: cktixtv70e8zm0yba501v0ltz
Created at: 2021-09-13 17:46:21+00:00
Updated at: 2021-09-13 17:46:24+00:00

Members:

    User: user1
    Role: Admin
    ID: ckl137jfiss1c07320dacd81l
    Nickname: user1
    Email: USER1_EMAIL@email.com

    User: user2
    Role: Labeler
    Name: FIRSTNAME LASTNAME
    ID: ckl137jfiss1c07320dacd82y
    Email: USER2_EMAIL@email.com

Reviews:
    Positive: 2
    Zero: 0
    Negative: 1
You can also delete projects associated with an annotation run directly through the FiftyOne API.
Summary
No matter what computer vision project you are working on, you will need a dataset.
FiftyOne makes it easy to curate and dig into your dataset to understand all aspects of it, including what needs to be annotated or reannotated. In turn, the integration with Labelbox makes this annotation process a breeze, resulting in a dataset that will lead to higher-quality models.