Welcome to our weekly FiftyOne tips and tricks blog where we give practical pointers for using FiftyOne on topics inspired by discussions in the open source community. This week we’ll cover importing and exporting datasets.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community, we’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
A primer on importing and exporting
FiftyOne Datasets allow you to easily load, modify, visualize, and evaluate your data along with any related labels (classifications, detections, etc.). They provide a consistent interface for loading images, videos, annotations, and model predictions into a format that can be visualized in the FiftyOne App, synced with your annotation source, and shared with others.
If you have your own collection of data, loading it as a Dataset will allow you to easily search and sort your samples. If you have created a custom dataset in FiftyOne with predictions, annotations, and embeddings, you can export this data for further processing.
Continue reading for some tips and tricks to help you master the importing and exporting of datasets in FiftyOne!
Always check the zoo
If you want to work with a relatively common computer vision dataset, and you do not yet have the dataset downloaded to disk, your first step should always be to check the FiftyOne Dataset Zoo.
The Zoo contains a variety of common datasets across multiple computer vision domains, and new datasets are constantly being added.
To load a dataset, find its name under the “Details” section of its entry in the Dataset Zoo documentation, and pass that name to the FiftyOne Zoo’s load_zoo_dataset() function.
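For example, you can load the BDD100K dataset with:
    import fiftyone as fo
    import fiftyone.zoo as foz

    # Note: BDD100K must be downloaded manually first; on the first load,
    # pass the location of the downloaded files via `source_dir`
    dataset = foz.load_zoo_dataset("bdd100k")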
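If you are not sure whether a dataset is in the Zoo, or what its exact name is, a quick substring search over the available names can help. The sketch below is a minimal, hypothetical helper (find_zoo_matches is our own name, not part of FiftyOne). It works on any list of names, and we assume you would feed it the output of foz.list_zoo_datasets().

```python
def find_zoo_matches(query, available_names):
    """Return the names that contain `query`, ignoring case."""
    needle = query.lower()
    return sorted(name for name in available_names if needle in name.lower())


# Example with a small, hand-written list of names. In practice you would
# pass in `foz.list_zoo_datasets()` after `import fiftyone.zoo as foz`.
example_names = ["bdd100k", "coco-2014", "coco-2017", "quickstart"]
print(find_zoo_matches("COCO", example_names))  # ['coco-2014', 'coco-2017']
```

Any name returned this way can then be passed to load_zoo_dataset().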
Learn more about load_zoo_dataset and the FiftyOne Dataset Zoo in the FiftyOne Docs.
Pattern match to common formats
Before building a custom data importer, it is also worth checking FiftyOne’s built-in supported data formats for loading data from disk. FiftyOne has basic importers for constructing datasets from an input directory for various media types and vision tasks, such as ImageSegmentationDirectoryImporter and VideoClassificationDirectoryTreeImporter.
Additionally, FiftyOne provides import classes for various common dataset formats. If you have a copy of the BDD100K dataset already stored on disk, you can load it into FiftyOne with the BDDDatasetImporter class.
Many datasets that do not have custom importers in FiftyOne are still structured using common formats. For instance, a new image detection dataset might have its class labels stored in MS COCO format. In cases like this, the COCODetectionDatasetImporter will work without modification!
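To make the COCO case concrete, here is a minimal sketch of how you might “pattern match” a labels file before loading it. The key check is a rough heuristic of our own, not something FiftyOne performs, and the helper names are hypothetical. The loader assumes the from_dir() factory accepts data_path and labels_path for COCO-style datasets, and it imports FiftyOne lazily so the check can be used on its own.

```python
import json

# Top-level keys we expect in a COCO-style detection labels file
COCO_REQUIRED_KEYS = {"images", "annotations", "categories"}


def looks_like_coco_detection(labels_path):
    """Rough heuristic: does this JSON file have the usual COCO top-level keys?"""
    with open(labels_path) as f:
        labels = json.load(f)
    return isinstance(labels, dict) and COCO_REQUIRED_KEYS.issubset(labels)


def load_coco_style_dataset(data_path, labels_path):
    """Load images plus COCO-format labels into a FiftyOne dataset."""
    import fiftyone as fo  # imported lazily; only needed when actually loading

    return fo.Dataset.from_dir(
        dataset_type=fo.types.COCODetectionDataset,
        data_path=data_path,      # directory containing the images
        labels_path=labels_path,  # COCO-format JSON file
    )
```

If looks_like_coco_detection() returns True for your labels file, calling load_coco_style_dataset() with your own image directory and labels path is a reasonable first thing to try before writing a custom importer.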
Learn more about the FiftyOne DatasetImporter class in the FiftyOne Docs.
Save space on export
When exporting a dataset in FiftyOne using the export() method, you can use the export_media argument to specify how the media files should be handled. With this parameter, you can choose whether to copy, move, symlink, or omit the media files from the export.
If you have limited space, you can pass in export_media=False to export the dataset without copying the media files, as in the following example.
    import fiftyone as fo

    export_dir = "/path/for/fiftyone-dataset"

    # The dataset or view to export
    dataset_or_view = fo.Dataset(...)

    # Export the dataset without copying the media files
    dataset_or_view.export(
        export_dir=export_dir,
        dataset_type=fo.types.FiftyOneDataset,
        export_media=False,
    )
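If you switch between these media-handling options often, a thin wrapper can catch typos before a long export starts. The sketch below is a hypothetical helper of our own (export_with_media_mode is not a FiftyOne function). It assumes the four behaviors described above map to the values True (copy), "move", "symlink", and False (omit), and it simply forwards everything to the collection's export() method.

```python
# Values assumed to correspond to copy, move, symlink, and omit
ALLOWED_STRING_MODES = ("move", "symlink")


def export_with_media_mode(collection, export_dir, dataset_type, media_mode=False):
    """Validate `media_mode`, then export `collection` with that setting."""
    is_bool = media_mode is True or media_mode is False
    if not is_bool and media_mode not in ALLOWED_STRING_MODES:
        raise ValueError(
            f"Unsupported media_mode {media_mode!r}; expected True, False, "
            f"or one of {ALLOWED_STRING_MODES}"
        )

    # Forward to the dataset's or view's own export() method
    collection.export(
        export_dir=export_dir,
        dataset_type=dataset_type,
        export_media=media_mode,
    )
```

For example, export_with_media_mode(dataset_or_view, export_dir, fo.types.FiftyOneDataset, "symlink") would link to the original media rather than duplicating it, which also saves space while keeping the export browsable.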
Learn more about exporting datasets and the FiftyOne DatasetExporter class in the FiftyOne Docs.
Don’t skip class
Some dataset formats, such as COCO and YOLO, require that a list of classes is stored for the label field. If you use the export() method without passing in a list of classes via the classes argument, the exported list will be generated based on the observed classes in the dataset.
While this can be convenient, if you are not careful, it can lead to unexpected behavior. In particular, if the collection of samples being exported does not have any detections or classifications for a certain class (that is present in the list of allowed classes), that class will not appear in the list of classes for the exported dataset.
It is best practice to set the default_classes property for a dataset while it is in use, and to then pass the argument classes=dataset.default_classes into export(). For most datasets from the FiftyOne Dataset Zoo, the default_classes property is pre-populated.
As an example, suppose we create a dataset from COCO samples that contain “cat” or “dog” detections:
    import fiftyone as fo
    import fiftyone.zoo as foz
    from fiftyone import ViewField as F

    # Load 10 samples containing cats and dogs (among other objects)
    dataset = foz.load_zoo_dataset(
        "coco-2017",
        split="validation",
        classes=["cat", "dog"],
        shuffle=True,
        max_samples=10,
    )
The default_classes property for this dataset still contains all of the COCO classes, even though not every class will show up in the collection:
    # Loading zoo datasets generally populates the `default_classes` attribute
    print(len(dataset.default_classes))  # 91
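We can then create a view that contains only the “cat” and “dog” detections, and ensure that all of the classes are still exported:
    # Create a view that only contains cats and dogs
    # (assumes the labels are stored in the zoo's default `ground_truth` field)
    view = dataset.filter_labels("ground_truth", F("label").is_in(["cat", "dog"]))

    # Explicitly provide the full list of classes when exporting
    view.export(
        labels_path="/path/to/labels.json",
        dataset_type=fo.types.COCODetectionDataset,
        classes=dataset.default_classes,
    )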
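To see why a missing class matters, consider a format that refers to classes by their position in the class list. The toy example below uses plain Python and made-up class names. It assumes, purely for illustration, that an automatically generated list is the sorted set of observed labels; the exact ordering FiftyOne uses may differ.

```python
def class_to_index(classes):
    """Map each class name to its position in the list."""
    return {name: index for index, name in enumerate(classes)}


# Hypothetical full list of allowed classes for a dataset
all_classes = ["bird", "cat", "dog", "horse"]

# Labels that actually occur in the samples being exported
labels_in_view = ["dog", "cat", "dog"]

# Assumption for illustration: observed classes are de-duplicated and sorted
observed_classes = sorted(set(labels_in_view))

print(class_to_index(observed_classes))  # {'cat': 0, 'dog': 1}
print(class_to_index(all_classes))  # {'bird': 0, 'cat': 1, 'dog': 2, 'horse': 3}
```

Here “dog” maps to 1 in one export and 2 in the other, so two exports of the same dataset could disagree about what a given integer means. Passing the same explicit classes list every time avoids this kind of drift.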
Learn more about default_classes in the FiftyOne Docs.
Cloud-backed media with FiftyOne Teams
With FiftyOne Teams, you can import media files directly from S3 buckets and public or private clouds. Likewise, you can export datasets to the cloud! This can come in handy when dealing with very large datasets.
Learn more about FiftyOne Teams.
Join the FiftyOne community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 1,200+ FiftyOne Slack members
- 2,300+ stars on GitHub
- 2,100+ Meetup members
- Used by 210+ repositories
- 49+ contributors
What’s next?
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community, we’re always happy to help.