
FiftyOne Computer Vision Tips and Tricks – Oct 27, 2023

Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.

The FiftyOne community is open source and open to all. Everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that helps data science teams improve the performance of their computer vision models by curating high-quality datasets, evaluating models, finding mistakes, visualizing embeddings, and getting to production faster.

Ok, let’s dive into this week’s tips and tricks!

Running the FiftyOne App inside a SageMaker instance

Community Slack member Nahid asked:

I have 500GB+ of data sitting on an EC2/SageMaker instance and I want to use the image quality plugin on that dataset. Is there a way to have the GUI view of FiftyOne when it is run inside an AWS EC2/SageMaker instance?

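Yes, you can use FiftyOne’s port-forwarding capabilities to accomplish this. Run the App as a remote session on the instance, then forward its port (5151 by default) to your local machine:

# manually on the local machine
ssh -N -L 5151:127.0.0.1:5151 [<username>@]<hostname>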

# via FiftyOne CLI
fiftyone app connect --destination [<username>@]<hostname>

Check out the FiftyOne Docs to learn more about how to configure port forwarding manually or via the FiftyOne CLI, connect to multiple remote sessions, and run multiple remote sessions from a single machine.

Leveraging GPUs to compute embeddings

Community Slack member Fredrik asked:

Can I use a GPU when computing embeddings?

Yes. This capability was added in this summer’s FiftyOne 0.21.5 release. Check out the FiftyOne Docs to learn more about the using_gpu parameter, which specifies whether a model should use a GPU.

Avoiding duplicates in train and test datasets

Community Slack member Samuel asked:

I’m trying to split my data between train and test, but use similarity to ensure there aren’t any pseudo-duplicates or similar images between these two sets. Is there a simple way to compute pairwise similarity between two FiftyOne samples? Or alternatively, is there a way to index a SimilarityIndex for two samples?

We recommend checking out FiftyOne’s Image Deduplication Plugin which streamlines image deduplication workflows!

With this plugin, you can:

  • Find exact duplicate images using a hash function
  • Find near duplicate images using an embedding model and similarity threshold
  • View and interact with duplicate images in the App
  • Remove all duplicates, or keep a representative image from each duplicate set

You can learn more about the plugin in this blog post: “Double Trouble: Eliminate Image Duplicates with FiftyOne.”
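If you would rather compare samples directly, the pairwise check Samuel describes can be sketched in a few lines. This is a minimal illustration, not the plugin's implementation: it assumes you already have one embedding vector per sample, and the function names, sample IDs, and the 0.95 threshold are all hypothetical choices made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar
        return 0.0
    return dot / (norm_a * norm_b)


def find_leaky_test_ids(train_embeddings, test_embeddings, threshold=0.95):
    """Return IDs of test samples that are at least `threshold`-similar
    to any train sample (hypothetical helper, threshold is an assumption)."""
    leaky = []
    for test_id, test_vec in test_embeddings.items():
        if any(
            cosine_similarity(test_vec, train_vec) >= threshold
            for train_vec in train_embeddings.values()
        ):
            leaky.append(test_id)
    return leaky


# Toy embeddings keyed by made-up sample IDs
train = {"train_1": [1.0, 0.0, 0.0], "train_2": [0.0, 1.0, 0.0]}
test = {"test_1": [0.99, 0.05, 0.0], "test_2": [0.0, 0.0, 1.0]}

leaky_ids = find_leaky_test_ids(train, test)
print(leaky_ids)
```

Any test sample flagged this way can be dropped or moved into the train split before you finalize the two sets. The brute-force loop is quadratic in the number of samples, so for large datasets a similarity index is the more practical route.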

Converting image-sized masks to masks relative to bounding boxes

Community Slack member George asked:

How do I convert from image sized boolean masks to masks that are relative to a bounding box?

Take a look at FiftyOne’s built-in factory methods for labels, which are described in the FiftyOne Docs.
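To make the conversion itself concrete, here is a rough sketch of the underlying arithmetic. It is not FiftyOne's code: the helper name is hypothetical, and it uses plain nested lists so it runs without dependencies (with NumPy you would typically slice the array instead). It relies on FiftyOne's convention that a detection's bounding box is [top-left-x, top-left-y, width, height] in relative [0, 1] coordinates and that the instance mask covers only that box.

```python
def crop_mask_to_box(full_mask):
    """Convert a full-image boolean mask (a list of rows) into a mask
    cropped to the object's bounding box, plus the box itself as
    [x, y, width, height] in relative [0, 1] coordinates.

    Hypothetical helper for illustration only.
    """
    height = len(full_mask)
    width = len(full_mask[0])

    # Indices of rows and columns that contain at least one True pixel
    rows = [r for r, row in enumerate(full_mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in full_mask)]
    if not rows:
        # Empty mask: there is no object to box
        return None, None

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]

    # Keep only the pixels inside the bounding rectangle
    cropped = [list(row[left:right + 1]) for row in full_mask[top:bottom + 1]]

    bounding_box = [
        left / width,
        top / height,
        (right - left + 1) / width,
        (bottom - top + 1) / height,
    ]
    return cropped, bounding_box


# A 4x4 image with a small object in the middle
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, True, False],
    [False, False, False, False],
]

box_mask, box = crop_mask_to_box(mask)
print(box_mask)
print(box)
```

The cropped mask and relative box are the two pieces a detection with an instance mask needs. In practice, prefer the built-in factory methods over hand-rolled code like this.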

Exporting FiftyOne datasets except for the images

Community Slack member Francesco asked:

How can I export or clone a FiftyOne dataset including all predictions, annotations and evaluations?  If possible, I’d like to not have to export the images. I’d rather just change the path to the folder containing the images.

Exporting FiftyOne datasets is discussed in detail in the FiftyOne Docs. You can export datasets in the FiftyOne format without copying the source media files by including export_media=False in your call to export().

import fiftyone as fo

export_dir = "/path/for/fiftyone-dataset"

# The dataset or view to export
dataset_or_view = fo.Dataset(...)

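# Export the dataset without copying the media files
dataset_or_view.export(
    export_dir=export_dir,
    dataset_type=fo.types.FiftyOneDataset,
    export_media=False,
)

# Export the dataset without media, including only the relative path of
# each image with respect to the given `rel_dir` so that the dataset
# can be imported with a different `rel_dir` prepended later
dataset_or_view.export(
    export_dir=export_dir,
    dataset_type=fo.types.FiftyOneDataset,
    export_media=False,
    rel_dir="/common/images/dir",  # placeholder path
)
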
Exporting in fiftyone.types.FiftyOneDataset format with the export_media=False and rel_dir parameters, as shown above, is a convenient way to transfer datasets between work environments. You can store the media files wherever you wish in each environment and simply provide the appropriate rel_dir value when importing the dataset into FiftyOne in a new environment.
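The path arithmetic behind rel_dir can be illustrated with a small standalone sketch. This is not FiftyOne's implementation, only a picture of the idea: the helper names and all directory paths are hypothetical, and posixpath is used so the example behaves the same on any operating system.

```python
import posixpath


def to_relative(filepath, rel_dir):
    """Strip `rel_dir` from a media path, mimicking what gets stored
    when a dataset is exported with a `rel_dir` (illustration only)."""
    return posixpath.relpath(filepath, start=rel_dir)


def to_absolute(relative_path, rel_dir):
    """Prepend a (possibly different) `rel_dir`, mimicking what happens
    when the dataset is imported in a new environment."""
    return posixpath.join(rel_dir, relative_path)


# Hypothetical locations of the same image in two environments
original_path = "/home/alice/data/images/train/0001.jpg"

stored_path = to_relative(original_path, "/home/alice/data/images")
restored_path = to_absolute(stored_path, "/mnt/shared/images")

print(stored_path)
print(restored_path)
```

Because only the part of each path below rel_dir is stored, the images can live under a different root in each environment as long as the folder layout beneath that root stays the same.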

Join the FiftyOne Community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!