Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.

As an open source community, the FiftyOne community is open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high-quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

If you like what you see on GitHub, give the project a star.
Get started! We’ve made it easy to get up and running in a few minutes.
Join the FiftyOne Slack community; we’re always happy to help.

Ok, let’s dive into this week’s tips and tricks!

Running the FiftyOne App inside a SageMaker instance

Community Slack member Nahid asked:

I have 500GB+ of data sitting on an EC2/SageMaker instance and I want to use the image quality plugin on that dataset. Is there a way to have the GUI view of FiftyOne when it is run inside an AWS EC2/SageMaker instance?

Yes. You can launch the App as a remote session on the instance and use port forwarding to view it in a browser on your local machine.

Check out the FiftyOne Docs to learn how to configure port forwarding manually or via the FiftyOne CLI, how to connect to multiple remote sessions, and how to run multiple remote sessions from a single machine.
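For the manual route, here is a minimal sketch. It assumes you can SSH into the instance and that the App uses its default port, 5151. On the instance you start the App with fo.launch_app(dataset, remote=True); on your own machine you forward the port and open http://localhost:5151. The helper port_forward_command is hypothetical (not part of FiftyOne) and only assembles the standard ssh -N -L invocation; the user and host names are placeholders. The FiftyOne CLI’s fiftyone app connect command is the built-in alternative to typing this yourself.

```python
import shlex

# On the remote EC2/SageMaker instance, start the App in remote mode, e.g.:
#
#   import fiftyone as fo
#   session = fo.launch_app(dataset, remote=True, port=5151)
#   session.wait()  # keep the process alive while you browse
#
# Then, on your local machine, run the ssh command built below and open
# http://localhost:5151 in a browser.


def port_forward_command(user, host, remote_port=5151, local_port=5151):
    """Build an ssh command that tunnels local_port to remote_port on host.

    Hypothetical helper (not part of FiftyOne).
    -N  forward ports only, without running a remote command
    -L  bind local_port on this machine and forward it to
        127.0.0.1:remote_port as seen from the remote host
    """
    return [
        "ssh",
        "-N",
        "-L",
        f"{local_port}:127.0.0.1:{remote_port}",
        f"{user}@{host}",
    ]


# Placeholder user and host; substitute your instance's values.
cmd = port_forward_command("ec2-user", "my-instance.example.com")
# Print a copy-pasteable command (or pass cmd to subprocess.run()).
print(shlex.join(cmd))
# ssh -N -L 5151:127.0.0.1:5151 ec2-user@my-instance.example.com
```

If the instance does not accept SSH connections, this tunnel will not work as written, and you will need another way to reach port 5151 on it.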

Leveraging GPUs to compute embeddings

Community Slack member Fredrik asked:

Can I use a GPU when computing embeddings?

Yes. This capability was added in FiftyOne 0.21.5, released this summer. Check out the FiftyOne Docs to learn more about the using_gpu parameter, which specifies whether a model should use a GPU.

Avoiding duplicates in train and test datasets

Community Slack member Samuel asked:

I'm trying to split my data between train and test, but use similarity to ensure there aren't any pseudo-duplicates or similar images between these two sets. Is there a simple way to compute pairwise similarity between two FiftyOne samples? Or alternatively, is there a way to index a SimilarityIndex for two samples?

We recommend checking out FiftyOne’s Image Deduplication Plugin, which streamlines image deduplication workflows!

With this plugin, you can:

Find exact duplicate images using a hash function
Find near duplicate images using an embedding model and similarity threshold
View and interact with duplicate images in the App
Remove all duplicates, or keep a representative image from each duplicate set
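The exact-duplicate idea in the first bullet is easy to sketch without FiftyOne: hash each file’s bytes, group paths that share a digest, and keep one representative per group. This is an illustration rather than the plugin’s code. SHA-256, the in-memory bytes (in practice you would read each file, e.g. with pathlib.Path.read_bytes()), and keeping the alphabetically first path are all assumptions.

```python
import hashlib
from collections import defaultdict


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files):
    """Group paths whose contents hash identically.

    files: dict mapping path -> raw bytes.
    Returns only groups with 2+ paths, each sorted so that the first
    entry can serve as the representative to keep.
    """
    groups = defaultdict(list)
    for path, data in files.items():
        groups[file_hash(data)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def paths_to_remove(duplicate_groups):
    """Keep the first path of each group; return the others for removal."""
    return [path for group in duplicate_groups for path in group[1:]]


# Toy in-memory "files"; copy_of_a.jpg is byte-identical to a.jpg
files = {
    "a.jpg": b"\xff\xd8 cat",
    "b.jpg": b"\xff\xd8 dog",
    "copy_of_a.jpg": b"\xff\xd8 cat",
}
dups = find_exact_duplicates(files)
print(dups)                   # [['a.jpg', 'copy_of_a.jpg']]
print(paths_to_remove(dups))  # ['copy_of_a.jpg']
```

Byte-level hashing only catches identical files; a resized or re-encoded copy hashes differently, which is why the plugin also offers embedding-based near-duplicate search.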

You can learn more about the plugin in this blog post: “Double Trouble: Eliminate Image Duplicates with FiftyOne.”
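Samuel’s split question can also be tackled directly with embeddings. A minimal sketch, assuming you already have one embedding vector per image (for example from compute_embeddings()): link every pair whose cosine similarity meets a threshold, merge linked images into groups with union-find, and assign whole groups to either train or test so that no near-duplicate pair straddles the split. The 0.95 threshold, the toy 2-D vectors, and the test-filling rule are illustrative assumptions, and the all-pairs loop is O(n²), so for large datasets a similarity index is the better way to find neighbors.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def near_duplicate_groups(embeddings, threshold=0.95):
    """Group indices whose embeddings are linked by similarity >= threshold.

    Uses union-find, so links are transitive. Singletons are included.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # O(n^2) pairwise comparison: fine for a sketch, too slow at scale
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def group_split(groups, test_fraction=0.25, seed=51):
    """Assign whole groups to train or test; returns sorted index lists.

    Groups are shuffled, then added to test until it holds at least
    test_fraction of all items; the remaining groups go to train.
    """
    rng = random.Random(seed)
    shuffled = list(groups)
    rng.shuffle(shuffled)
    total = sum(len(g) for g in shuffled)
    train, test = [], []
    for g in shuffled:
        target = test if len(test) < test_fraction * total else train
        target.extend(g)
    return sorted(train), sorted(test)


# Toy 2-D "embeddings": images 0 and 1 are near duplicates
embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
groups = near_duplicate_groups(embeddings)
print(groups)  # [[0, 1], [2], [3], [4]]
train, test = group_split(groups)
print(train, test)
```

Because union-find is transitive, a chain of near duplicates (A resembles B, B resembles C) lands in a single group even if A and C fall below the threshold, which keeps the whole chain on one side of the split.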

Converting image-sized masks to masks relative to bounding boxes

Community Slack member George asked:

How do I convert from image sized boolean masks to masks that are relative to a bounding box?

Take a look at some of FiftyOne’s built-in factory methods:

Detection.from_mask(): Creates a Detection instance with its mask attribute populated from the given full image mask.
Segmentation(mask=mask).to_detections(): Returns a Detections representation of this instance with instance masks populated.
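To see what the conversion involves, here is a pure-Python sketch of the idea behind Detection.from_mask(): find the tight bounding rectangle of the True pixels, express it as [top-left-x, top-left-y, width, height] relative to the image size (FiftyOne’s bounding box convention), and crop the mask to that rectangle. It is an illustration, not FiftyOne’s implementation; the helper mask_to_box_and_crop is hypothetical, and in FiftyOne the masks are NumPy arrays rather than nested lists.

```python
def mask_to_box_and_crop(mask):
    """Convert a full-image boolean mask into (relative_box, cropped_mask).

    Hypothetical helper. mask: non-empty list of rows (height x width) of bools.
    relative_box: [top-left-x, top-left-y, width, height], each in [0, 1].
    cropped_mask: the part of mask inside the tight bounding rectangle.
    Returns None when the mask contains no True pixels.
    """
    height, width = len(mask), len(mask[0])
    rows = [r for r in range(height) if any(mask[r])]
    if not rows:
        return None
    cols = [c for c in range(width) if any(mask[r][c] for r in range(height))]

    # Inclusive pixel bounds of the foreground region
    y0, y1 = rows[0], rows[-1]
    x0, x1 = cols[0], cols[-1]

    # Normalize by image size to get relative coordinates
    box = [
        x0 / width,
        y0 / height,
        (x1 - x0 + 1) / width,
        (y1 - y0 + 1) / height,
    ]
    # Keep only the pixels inside the box: this is the box-relative mask
    crop = [row[x0 : x1 + 1] for row in mask[y0 : y1 + 1]]
    return box, crop


full_mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, True, False],
    [False, False, False, False],
]
box, crop = mask_to_box_and_crop(full_mask)
print(box)   # [0.25, 0.25, 0.5, 0.5]
print(crop)  # [[True, True], [False, True]]
```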

Exporting FiftyOne datasets except for the images

Community Slack member Francesco asked:

How can I export or clone a FiftyOne dataset including all predictions, annotations and evaluations? If possible, I'd like to not have to export the images. I’d rather just change the path to the folder containing the images.

Exporting FiftyOne datasets is discussed in detail in the FiftyOne Docs. You can export a dataset in the FiftyOne format without copying the source media files by passing export_media=False in your call to export().

Exporting in fiftyone.types.FiftyOneDataset format with export_media=False and the rel_dir parameter is a convenient way to transfer datasets between work environments. On export, rel_dir is stripped from each media path so that the paths are stored relative to it; when importing the dataset into FiftyOne in a new environment, you simply provide the rel_dir where the media files live there. This lets you store the media files wherever you wish in each environment.
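In code, the workflow might look like the commented FiftyOne calls below (all paths are placeholders). The runnable part is a hypothetical sketch of the path bookkeeping that rel_dir implies, stripping a common prefix on export and prepending a new one on import; it is not FiftyOne’s implementation, and it uses posixpath so the example behaves the same on every OS.

```python
import posixpath

# The FiftyOne calls for this workflow look roughly like (placeholder paths):
#
#   import fiftyone as fo
#
#   # Old environment: write labels/metadata only, storing media paths
#   # relative to rel_dir
#   dataset.export(
#       export_dir="/tmp/my-export",
#       dataset_type=fo.types.FiftyOneDataset,
#       export_media=False,
#       rel_dir="/data/images",
#   )
#
#   # New environment: prepend the new media location to those paths
#   dataset = fo.Dataset.from_dir(
#       dataset_dir="/tmp/my-export",
#       dataset_type=fo.types.FiftyOneDataset,
#       rel_dir="/mnt/shared/images",
#   )
#
# The helpers below are hypothetical and only illustrate that bookkeeping.


def make_relative(filepath, rel_dir):
    """Strip rel_dir from filepath when filepath lies under it; else keep it."""
    rel = posixpath.relpath(filepath, rel_dir)
    if rel == ".." or rel.startswith("../"):
        return filepath  # outside rel_dir, so leave the path absolute
    return rel


def make_absolute(filepath, rel_dir):
    """Prepend rel_dir to a relative filepath; absolute paths pass through."""
    if posixpath.isabs(filepath):
        return filepath
    return posixpath.join(rel_dir, filepath)


original = ["/data/images/cats/1.jpg", "/data/images/dogs/2.jpg"]
exported = [make_relative(p, "/data/images") for p in original]
print(exported)  # ['cats/1.jpg', 'dogs/2.jpg']
imported = [make_absolute(p, "/mnt/shared/images") for p in exported]
print(imported)  # ['/mnt/shared/images/cats/1.jpg', '/mnt/shared/images/dogs/2.jpg']
```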

Join the FiftyOne Community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!

2,000+ FiftyOne Slack members
4,000+ stars on GitHub
5,000+ Meetup members
Used by 370+ repositories
60+ contributors
