FiftyOne Computer Vision Tips and Tricks – April 21, 2023

Welcome to the weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high-quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

Ok, let’s dive into this week’s tips and tricks!

Basic operations on your dataset using FiftyOne

Community Slack member Earl asked,

“I want to initially use the FiftyOne App to quickly look at my YOLOv5 dataset and annotations. I’d like to analyze my data and do some very basic tasks like remove duplicates, show distributions, etc. It isn’t clear what I can do with the App without writing Python scripts.”

Hey Earl! You’ll first need to load your dataset into FiftyOne, either through the Python SDK or the command-line interface. Fortunately, loading a YOLOv5 dataset takes only a few lines of code:

import fiftyone as fo
name = "my-dataset"
dataset_dir = "/path/to/yolov5-dataset"

# The splits to load
splits = ["train", "val"]

# Load the dataset, using tags to mark the samples in each split
dataset = fo.Dataset(name)
for split in splits:
    dataset.add_dir(
        dataset_dir=dataset_dir,
        dataset_type=fo.types.YOLOv5Dataset,
        split=split,
        tags=split,
    )

# View summary info about the dataset
print(dataset)

# Print the first few samples in the dataset
print(dataset.head())

Once your dataset is in FiftyOne, you can visualize it in the App, view distributions, filter on your labels, and much more. 
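As a quick sanity check on the label distribution you see in the App, you can also count classes directly from the YOLOv5 label files. The sketch below is a minimal, standard-library-only example. It assumes the usual YOLO label layout, with one .txt file per image and one `class_id x_center y_center width height` row per object. The function name `count_yolo_classes` and the example path are hypothetical, not part of FiftyOne.

```python
from collections import Counter
from pathlib import Path


def count_yolo_classes(labels_dir):
    """Count how many boxes of each class ID appear in a folder of YOLO label files."""
    counts = Counter()
    for label_file in sorted(Path(labels_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                # The first field of each row is the integer class ID
                counts[int(fields[0])] += 1
    return counts


# Example usage (hypothetical path; point it at one split of your dataset):
# for class_id, num_boxes in sorted(count_yolo_classes("/path/to/yolov5-dataset/labels/train").items()):
#     print(class_id, num_boxes)
```

The counts are keyed by class ID rather than class name, so you would still need the `names` list from your dataset YAML to compare them against the labels shown in the App.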

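Finding duplicate samples requires a few more lines of code, because you first need to compute a similarity index over your images. You can then call the index’s find_duplicates() method to flag near-duplicate samples based on the parameters you provide.

import fiftyone.brain as fob

# Build a similarity index over the dataset's images
results = fob.compute_similarity(dataset, brain_key="img_sim")

# Use the similarity index to flag the 1% of images that are most
# similar to other images as near-duplicates
results.find_duplicates(fraction=0.01)
print(results.neighbors_map)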

More information on finding near duplicate images is available in the FiftyOne docs.
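To build some intuition for what a near-duplicate search does, here is a small conceptual sketch in plain Python. It is not FiftyOne's implementation. It assumes each sample already has an embedding vector, uses Euclidean distance, and greedily treats any sample within `thresh` of an already-kept sample as a duplicate. The function name, the toy embeddings, and the output shape are all made up for illustration.

```python
import math


def find_near_duplicates(embeddings, thresh):
    """Greedy near-duplicate search over a dict of {sample_id: vector}.

    Returns {kept_id: [(duplicate_id, distance), ...]} for kept samples
    that have at least one near-duplicate.
    """
    kept = {}
    for sample_id, vec in embeddings.items():
        # Find the closest sample among those already kept
        nearest_id, nearest_dist = None, math.inf
        for kept_id in kept:
            dist = math.dist(vec, embeddings[kept_id])
            if dist < nearest_dist:
                nearest_id, nearest_dist = kept_id, dist

        if nearest_dist <= thresh:
            # Close enough to a kept sample: record it as a duplicate
            kept[nearest_id].append((sample_id, nearest_dist))
        else:
            # Otherwise keep it as a new representative sample
            kept[sample_id] = []

    # Report only the kept samples that actually have duplicates
    return {k: dups for k, dups in kept.items() if dups}


# Toy 2D "embeddings"; real ones would come from a vision model
embeddings = {
    "a": (0.0, 0.0),
    "b": (0.0, 0.1),
    "c": (5.0, 5.0),
    "d": (5.0, 5.3),
    "e": (9.0, 0.0),
}
print(find_near_duplicates(embeddings, thresh=0.5))
```

With these toy values, "b" is grouped under "a" and "d" under "c", while "e" has no near-duplicate. Raising or lowering `thresh` changes how aggressively samples are grouped, which is the same tradeoff you control when you ask for a fixed fraction of duplicates.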

Concatenating generated views such as patches in FiftyOne

Community Slack member Joy asked,

“I want to load the dataset object as patches directly. Is there a way to load the patch view directly in FiftyOne?”

Hi Joy! FiftyOne provides a convenient way to generate patches from your images using the to_patches() method. This method creates a PatchesView for a given SampleCollection, which allows you to view and manipulate patches in your dataset.

To concatenate generated patch views, first use the to_patches() method to generate a patch view of your desired SampleCollection, and then combine multiple patch views with the + operator, much as you would concatenate lists. Here’s an example that concatenates two patch views in FiftyOne:

import fiftyone as fo
import fiftyone.zoo as foz

# Load a dataset
dataset = foz.load_zoo_dataset("quickstart")

# Generate a patches view
patches_view = dataset.to_patches("ground_truth")

# Get the first 10 patches
patches1 = patches_view[:10]

# Get the last 10 patches
patches2 = patches_view[-10:]

# Concatenate the patches
patches = patches1 + patches2

# Check that the length of the concatenated patches matches the sum
# of the lengths of the individual patch views
print(len(patches) == len(patches1) + len(patches2))

Here’s a snapshot of the expected output:

For more information on patch views, see the FiftyOne docs.

Exporting FiftyOne datasets

Community member Jack asked,

“I’m using .export() to export my dataset and create a custom coco json. The filenames aren’t absolute. How do I export the full path for my filename? It only exports the filename.”

Hi Jack! FiftyOne provides native support for exporting datasets to disk in a variety of common formats, and it can be easily extended to export datasets in custom formats. The export() method accepts additional parameters that configure the export. For example, the data_path and labels_path parameters independently customize the location of the exported media and labels, including labels-only exports, and abs_paths=True writes absolute image filepaths into the exported labels:

# Export **only** labels in the `ground_truth` field in COCO format
# with absolute image filepaths in the labels

dataset_or_view.export(
    dataset_type=fo.types.COCODetectionDataset,
    labels_path="/path/for/export.json",
    label_field="ground_truth",
    abs_paths=True,
)

Or you can use the export_media parameter to configure whether to copy, move, symlink, or omit the media files from the export:

# Export the labels in the `ground_truth` field in COCO format, and
# move (rather than copy) the source media to the output directory

dataset_or_view.export(
    export_dir="/path/for/export",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    export_media="move",
)
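Returning to the absolute-path question: if you already have a COCO JSON file whose `file_name` entries are relative, you can also patch it after the fact. The sketch below uses only the standard library and relies on the standard COCO layout, where each entry in the top-level `images` list has a `file_name` key. The function name `make_filenames_absolute` and the paths in the usage comment are hypothetical.

```python
import json
import os


def make_filenames_absolute(coco_json_path, images_dir, output_path):
    """Rewrite each image's `file_name` in a COCO JSON file as an absolute path."""
    with open(coco_json_path) as f:
        coco = json.load(f)

    images_dir = os.path.abspath(images_dir)
    for image in coco.get("images", []):
        # Leave entries that are already absolute untouched
        if not os.path.isabs(image["file_name"]):
            image["file_name"] = os.path.join(images_dir, image["file_name"])

    # Write to a separate file so the original export is preserved
    with open(output_path, "w") as f:
        json.dump(coco, f)
    return coco


# Example usage (hypothetical paths):
# make_filenames_absolute(
#     "/path/for/export.json", "/path/to/images", "/path/for/export_abs.json"
# )
```

This only rewrites the JSON and does not check that the image files exist at the resulting paths, so it is worth spot-checking a few entries afterwards.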

For more information on customizing exports, see the FiftyOne docs.

Join the FiftyOne community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!