Welcome to our weekly FiftyOne tips and tricks blog where we give practical pointers for using FiftyOne on topics inspired by discussions in the open source community. This week we’ll cover importing and exporting datasets.

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

If you like what you see on GitHub, give the project a star.
Get started! We’ve made it easy to get up and running in a few minutes.
Join the FiftyOne Slack community. We’re always happy to help.

Ok, let’s dive into this week’s tips and tricks!

A primer on importing and exporting

FiftyOne Datasets allow you to easily load, modify, visualize, and evaluate your data along with any related labels (classifications, detections, etc.). They provide a consistent interface for loading images, videos, annotations, and model predictions into a format that can be visualized in the FiftyOne App, synced with your annotation source, and shared with others.

If you have your own collection of data, loading it as a Dataset will allow you to easily search and sort your samples. If you have created a custom dataset in FiftyOne with predictions, annotations, and embeddings, you can export this data for further processing.

Continue reading for some tips and tricks to help you master the importing and exporting of datasets in FiftyOne!

Always check the zoo

If you want to work with a relatively common computer vision dataset, and you do not yet have the dataset downloaded to disk, your first step should always be to check the FiftyOne Dataset Zoo.

The Zoo contains a variety of common datasets across multiple computer vision domains, and new datasets are constantly being added.

To load the desired dataset, find its name under the “Details” section of its entry in the Dataset Zoo documentation, and pass that name to the FiftyOne Zoo’s load_zoo_dataset() function.

For example, you can load the BDD100K dataset with:
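Here is a minimal sketch of that call. The helper name load_bdd100k_split and the path you pass in are placeholders of ours, not part of FiftyOne. One thing to keep in mind: BDD100K has to be downloaded manually, so the first load needs a source_dir pointing at the files you downloaded.

```python
def load_bdd100k_split(source_dir, split="validation"):
    """Load one split of BDD100K through the FiftyOne Dataset Zoo.

    `source_dir` is the folder holding the manually downloaded
    BDD100K files (a placeholder path in this sketch).
    """
    # Imported inside the function so the helper can be defined
    # even in an environment where FiftyOne is not installed yet
    import fiftyone.zoo as foz

    return foz.load_zoo_dataset(
        "bdd100k",
        split=split,
        source_dir=source_dir,
    )
```

Calling load_bdd100k_split("/path/to/dir-with-bdd100k-files") returns a FiftyOne Dataset that you can pass straight to the App.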

Learn more about load_zoo_dataset and the FiftyOne Dataset Zoo in the FiftyOne Docs.

Pattern match to common formats

Before building a custom data importer, it is also worth checking FiftyOne’s built-in supported data formats for loading data from disk. FiftyOne has basic importers for constructing datasets from an input directory for various media types and vision tasks, such as ImageSegmentationDirectoryImporter and VideoClassificationDirectoryImporter.

Additionally, FiftyOne provides import classes for various common dataset formats. If you have a copy of the BDD100K dataset already stored on disk, you can load it into FiftyOne with the BDDDatasetImporter class.
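As a rough sketch of what this can look like, the snippet below builds a BDDDatasetImporter and hands it to Dataset.from_importer(). The helper name import_bdd_from_disk and its arguments are our own placeholders, and we assume the directory follows the BDD layout the importer expects (a data/ folder next to a labels.json file).

```python
def import_bdd_from_disk(dataset_dir, name=None):
    """Create a FiftyOne dataset from a BDD-format directory on disk."""
    # Imported inside the function so the helper can be defined
    # even in an environment where FiftyOne is not installed yet
    import fiftyone as fo
    import fiftyone.utils.bdd as foub

    # The importer knows how to parse BDD-style images and labels
    importer = foub.BDDDatasetImporter(dataset_dir=dataset_dir)

    # Build the dataset by consuming the importer
    return fo.Dataset.from_importer(importer, name=name)
```

If you prefer not to instantiate the importer yourself, passing dataset_type=fo.types.BDDDataset to fo.Dataset.from_dir() should resolve to the same importer under the hood.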

Many datasets that do not have custom importers in FiftyOne are still structured using common formats. For instance, a new image detection dataset might have its class labels stored in MS COCO format. In cases like this, the COCODetectionDatasetImporter will work without modification!
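For a dataset like that, a sketch along these lines is usually enough. The helper name import_coco_style_detections and both paths are hypothetical; we assume the images live in one folder and the annotations in a single COCO-format JSON file.

```python
def import_coco_style_detections(data_path, labels_path, name=None):
    """Load images plus a COCO-format labels JSON into FiftyOne."""
    # Imported inside the function so the helper can be defined
    # even in an environment where FiftyOne is not installed yet
    import fiftyone as fo

    return fo.Dataset.from_dir(
        dataset_type=fo.types.COCODetectionDataset,
        data_path=data_path,      # folder containing the images
        labels_path=labels_path,  # COCO-format annotations JSON
        name=name,
    )
```

Specifying fo.types.COCODetectionDataset is what selects the COCODetectionDatasetImporter, so no custom importer code is needed.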

Learn more about the FiftyOne DatasetImporter class in the FiftyOne Docs.

Save space on export

When exporting a dataset in FiftyOne using the export() method, you can use the export_media argument to specify how the media files should be handled. With this parameter, you can choose whether to copy, move, symlink, or omit the media files from the export.

If you have limited space, you can pass in export_media=False to export the dataset without copying the media files.
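A minimal sketch of a space-saving export might look like this. The helper name export_without_media and the export directory are placeholders, and we chose the FiftyOneDataset format here because it records the original media paths; other formats may behave differently.

```python
def export_without_media(dataset, export_dir):
    """Export labels and metadata only, leaving media files in place."""
    # Imported inside the function so the helper can be defined
    # even in an environment where FiftyOne is not installed yet
    import fiftyone as fo

    dataset.export(
        export_dir=export_dir,
        dataset_type=fo.types.FiftyOneDataset,
        export_media=False,  # do not copy, move, or symlink the media
    )
```

Swapping False for "move" or "symlink" selects the other behaviors described above.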

Learn more about exporting datasets and the FiftyOne DatasetExporter class in the FiftyOne Docs.

Don’t skip class

Some media dataset formats, such as COCO and YOLO, require that a list of classes is stored for the label field. If you use the export() method without passing in a list of classes via the classes argument, the exported list will be generated based on the observed classes in the dataset.

While this can be convenient, if you are not careful, it can lead to unexpected behavior. In particular, if the collection of samples being exported does not have any detections or classifications for a certain class (that is present in the list of allowed classes), that class will not appear in the list of classes for the exported dataset.
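To see why this matters, here is a toy illustration in plain Python. It is not FiftyOne's implementation; it simply assumes that a format identifies each class by its position in the class list, as index-based formats such as YOLO do, and that an inferred list is built from the sorted set of observed labels. The class names are made up.

```python
def class_index_map(classes):
    """Map each class name to its position in the exported class list."""
    return {name: index for index, name in enumerate(classes)}


# The full list of allowed classes, and the labels that actually occur
allowed_classes = ["bird", "cat", "dog"]
observed_labels = ["dog", "bird", "dog"]  # no "cat" in this subset

# Class list inferred from the data vs. passed in explicitly
inferred = class_index_map(sorted(set(observed_labels)))
explicit = class_index_map(allowed_classes)

print(inferred)  # {'bird': 0, 'dog': 1}
print(explicit)  # {'bird': 0, 'cat': 1, 'dog': 2}
```

Under these assumptions, "dog" ends up with a different index depending on whether "cat" happened to appear in the exported samples, which is exactly the kind of mismatch that is painful to debug later.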

It is best practice to set the default_classes property for a dataset while it is in use, and to then pass the argument classes=dataset.default_classes into export(). For most datasets from the FiftyOne Dataset Zoo, the default_classes property is pre-populated.

As an example, suppose we create a dataset from COCO samples that contain “cat” or “dog” detections:
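One way to build such a dataset is sketched below. We assume the COCO-2017 validation split from the Dataset Zoo and an arbitrary cap of 50 samples; the helper name load_coco_cats_and_dogs is ours.

```python
def load_coco_cats_and_dogs(max_samples=50):
    """Load COCO samples that contain "cat" or "dog" detections."""
    # Imported inside the function so the helper can be defined
    # even in an environment where FiftyOne is not installed yet
    import fiftyone.zoo as foz

    return foz.load_zoo_dataset(
        "coco-2017",                 # assumed COCO version for this sketch
        split="validation",
        label_types=["detections"],
        classes=["cat", "dog"],      # only samples with these classes
        max_samples=max_samples,
    )
```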

The default_classes property for this dataset still contains all of the COCO classes, even though not every class will show up in the collection:
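A quick way to check this is to compare default_classes against the labels that actually occur. In the sketch below, the helper name unobserved_classes is hypothetical, and the default label path assumes the detections live in a field named "ground_truth"; adjust it to match your dataset.

```python
def unobserved_classes(dataset, label_path="ground_truth.detections.label"):
    """Return the default classes that never occur in the dataset's labels."""
    # distinct() collects the unique values found at the given field path
    observed = set(dataset.distinct(label_path))

    # Keep the original ordering of default_classes
    return [name for name in dataset.default_classes if name not in observed]
```

Every name this returns is a class that would silently disappear from an export whose class list is inferred from the data.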

We can then ensure that all of these classes are exported:
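A minimal sketch of such an export, assuming COCO format and detections stored in a field named "ground_truth" (both are our assumptions, as is the helper name export_with_all_classes):

```python
def export_with_all_classes(dataset, export_dir, label_field="ground_truth"):
    """Export in COCO format with the dataset's full default class list."""
    # Imported inside the function so the helper can be defined
    # even in an environment where FiftyOne is not installed yet
    import fiftyone as fo

    dataset.export(
        export_dir=export_dir,
        dataset_type=fo.types.COCODetectionDataset,
        label_field=label_field,
        # Pass the classes explicitly instead of inferring them from the data
        classes=dataset.default_classes,
    )
```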

Learn more about default_classes in the FiftyOne Docs.

Cloud-backed media with FiftyOne Teams

With FiftyOne Teams, you can import media files directly from S3 buckets and public or private clouds. Likewise, you can export datasets to the cloud! This can come in handy when dealing with very large datasets.

Learn more about FiftyOne Teams.

Join the FiftyOne community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!

1,200+ FiftyOne Slack members
2,300+ stars on GitHub
2,100+ Meetup members
Used by 210+ repositories
49+ contributors
