Welcome to the weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Basic operations on your dataset using FiftyOne
Community Slack member Earl asked,
“I want to initially use the FiftyOne App to quickly look at my YOLOv5 dataset and annotations. I’d like to analyze my data and do some very basic tasks like remove duplicates, show distributions, etc. It isn’t clear what I can do with the App without writing Python scripts.”
Hey Earl! You’ll first need to load your dataset into FiftyOne, either through the Python SDK or the command-line interface. Loading a YOLOv5 dataset takes only a few lines of code:
    import fiftyone as fo

    name = "my-dataset"
    dataset_dir = "/path/to/yolov5-dataset"

    # The splits to load
    splits = ["train", "val"]

    # Load the dataset, using tags to mark the samples in each split
    dataset = fo.Dataset(name)
    for split in splits:
        dataset.add_dir(
            dataset_dir=dataset_dir,
            dataset_type=fo.types.YOLOv5Dataset,
            split=split,
            tags=split,
        )

    # View summary info about the dataset
    print(dataset)

    # Print the first few samples in the dataset
    print(dataset.head())
Once your dataset is in FiftyOne, you can visualize it in the App, view distributions, filter on your labels, and much more.
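As a rough sketch of what viewing distributions can look like in code, the helper below counts object classes per split. It assumes the labels landed in a `ground_truth` detections field and that each sample is tagged with its split name, as in the loading snippet above. `class_counts_by_split` is a hypothetical helper name, not part of FiftyOne; it only relies on the `match_tags()` and `count_values()` methods of a dataset or view.

```python
def class_counts_by_split(dataset, splits=("train", "val"), label_field="ground_truth"):
    """Return {split: {class label: count}} for a detections field.

    Assumes samples are tagged with their split name and that
    `label_field` holds the detections (both are assumptions).
    """
    # Path to the class label of every detection in the field
    path = f"{label_field}.detections.label"

    counts = {}
    for split in splits:
        # Restrict to the samples tagged with this split, then aggregate
        split_view = dataset.match_tags(split)
        counts[split] = split_view.count_values(path)

    return counts
```

Calling `class_counts_by_split(dataset)` on the dataset loaded above should return one class histogram per split. To browse the same data interactively, `fo.launch_app(dataset)` opens it in the App.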
Finding duplicate samples requires a couple more lines of code to compute similarity. For example, you can build a similarity index and then call its find_duplicates() method, which flags near-duplicate samples based on either a distance threshold or a target fraction of the dataset.
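    import fiftyone.brain as fob

    # Build a similarity index (the brain_key name is arbitrary)
    results = fob.compute_similarity(dataset, brain_key="img_sim")

    # Use the similarity index to flag roughly 1% of the images as
    # near-duplicates of other images
    results.find_duplicates(fraction=0.01)
    print(results.neighbors_map)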
More information on finding near duplicate images is available in the FiftyOne docs.
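Once duplicates have been flagged, actually removing them is a short follow-up step. The sketch below assumes `results` is a similarity index on which `find_duplicates()` has already been called, so that its `duplicate_ids` property is populated. `remove_flagged_duplicates` is a hypothetical helper name used for illustration.

```python
def remove_flagged_duplicates(dataset, results):
    """Delete the samples that a similarity index flagged as duplicates.

    Assumes `results.find_duplicates(...)` has already been run.
    Returns the list of sample IDs that were deleted.
    """
    # IDs of the samples flagged as near-duplicates of a retained sample
    duplicate_ids = list(results.duplicate_ids)

    # Permanently removes the samples from the dataset (not the media on disk)
    dataset.delete_samples(duplicate_ids)

    return duplicate_ids
```

Deleting samples is not reversible, so it may be worth reviewing the flagged samples in the App before calling this on a dataset you care about.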
Concatenating generated views such as patches in FiftyOne
Community Slack member Joy asked,
“I want to load the dataset object as patches directly. Is there a way to load the patch view directly in FiftyOne?”
Hi Joy! FiftyOne provides a convenient way to generate patches from your images using the to_patches() method. This method creates a PatchesView for a given SampleCollection, which allows you to view and manipulate patches in your dataset.
To concatenate generated patch views, first use the to_patches() method to generate a patch view of your desired SampleCollection, and then use the + operator to combine views taken from it. Here’s an example code snippet that demonstrates how to concatenate two patch views in FiftyOne:
    import fiftyone as fo
    import fiftyone.zoo as foz

    # Load a dataset
    dataset = foz.load_zoo_dataset("quickstart")

    # Generate a patches view
    patches_view = dataset.to_patches("ground_truth")

    # Get the first 10 patches
    patches1 = patches_view[:10]

    # Get the last 10 patches
    patches2 = patches_view[-10:]

    # Concatenate the patches
    patches = patches1 + patches2

    # Check that the length of the concatenated patches matches the sum
    # of the lengths of the individual patch views
    print(len(patches) == len(patches1) + len(patches2))
When the concatenation succeeds, the final line prints True.
For more information on patch views, see the FiftyOne docs.
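If you need to combine more than two views, the same idea can be written with the concat() method. The sketch below is a hypothetical helper, `concat_views`, that chains concat() calls from left to right. It assumes every view in the list was sliced or filtered from the same to_patches() result, since views from unrelated collections may not be compatible.

```python
from functools import reduce


def concat_views(views):
    """Concatenate a non-empty list of views, left to right.

    Each element only needs a `concat(other)` method. A single-element
    list is returned unchanged; an empty list raises TypeError.
    """
    return reduce(lambda left, right: left.concat(right), views)
```

For example, `concat_views([patches_view[:10], patches_view[100:110], patches_view[-10:]])` would return a single view containing all three slices.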
Exporting FiftyOne datasets
Community member Jack asked,
“I’m using .export() to export my dataset and create a custom coco json. The filenames aren’t absolute. How do I export the full path for my filename? It only exports the filename.”
Hi Jack! FiftyOne provides native support for exporting datasets to disk in a variety of common formats, and it can be easily extended to export datasets in custom formats. The export() method provides additional parameters that you can use to configure the export. For example, you can use the data_path and labels_path parameters to independently customize the location of the exported media and labels, including labels-only exports. To get full paths rather than bare filenames, pass abs_paths=True:
    # Export **only** labels in the `ground_truth` field in COCO format
    # with absolute image filepaths in the labels
    dataset_or_view.export(
        dataset_type=fo.types.COCODetectionDataset,
        labels_path="/path/for/export.json",
        label_field="ground_truth",
        abs_paths=True,
    )
Or you can use the export_media parameter to configure whether to copy, move, symlink, or omit the media files from the export:
    # Export the labels in the `ground_truth` field in COCO format, and
    # move (rather than copy) the source media to the output directory
    dataset_or_view.export(
        export_dir="/path/for/export",
        dataset_type=fo.types.COCODetectionDataset,
        label_field="ground_truth",
        export_media="move",
    )
For more information on customizing exports, refer to the FiftyOne docs.
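After exporting, a quick sanity check can confirm that the filenames really are absolute. The sketch below uses only the Python standard library and assumes the export is a standard COCO JSON file whose `images` entries each carry a `file_name` key. `relative_image_paths` is a hypothetical helper name.

```python
import json
import os


def relative_image_paths(coco_json_path):
    """Return the `file_name` values in a COCO JSON that are not absolute."""
    with open(coco_json_path) as f:
        coco = json.load(f)

    # Each entry in "images" describes one image in the export
    file_names = [image["file_name"] for image in coco.get("images", [])]

    return [name for name in file_names if not os.path.isabs(name)]
```

Running `relative_image_paths("/path/for/export.json")` on a labels file exported with abs_paths=True should return an empty list; any names it does return are still relative.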
Join the FiftyOne community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 1,500+ FiftyOne Slack members
- 2,800+ stars on GitHub
- 3,700+ Meetup members
- Used by 265+ repositories
- 59+ contributors