Welcome to our weekly FiftyOne tips and tricks blog, where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
The FiftyOne community is open source and open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
Ok, let’s dive into this week’s tips and tricks!
Running the FiftyOne App inside a SageMaker instance
Community Slack member Nahid asked:
I have 500GB+ of data sitting on an EC2/SageMaker instance and I want to use the image quality plugin on that dataset. Is there a way to have the GUI view of FiftyOne when it is run inside an AWS EC2/SageMaker instance?
Yes, you can employ FiftyOne’s port-forwarding capabilities to accomplish this.
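As a rough sketch of that workflow, assuming you have SSH access to the instance and that the App is served on FiftyOne's default port 5151: the remote side launches the App with remote=True, and your local machine opens an SSH tunnel to that port. The helper names, dataset name, host, and user below are placeholders for illustration and are not part of FiftyOne.

```python
def serve_app_on_remote(dataset_name, port=5151):
    """Run on the EC2/SageMaker instance: serve the App without opening a browser."""
    # Imported lazily so the tunnel helper below can also be used on a
    # local machine that does not have FiftyOne installed.
    import fiftyone as fo

    dataset = fo.load_dataset(dataset_name)
    session = fo.launch_app(dataset, remote=True, port=port)
    session.wait()  # block so the App server keeps running


def ssh_tunnel_command(host, user=None, port=5151):
    """Build the SSH command to run locally to forward the App's port."""
    target = f"{user}@{host}" if user else host
    return ["ssh", "-N", "-L", f"{port}:127.0.0.1:{port}", target]


# Hypothetical host and user; substitute your own instance details.
command = ssh_tunnel_command("ec2-host.example.com", user="ubuntu")
print(" ".join(command))
```

With the tunnel open, the App should be reachable in a local browser at http://localhost:5151. The 500GB of data never leaves the instance; only the App traffic is forwarded.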
Leveraging GPUs to compute embeddings
Community Slack member Fredrik asked:
Can I use a GPU when computing embeddings?
Avoiding duplicates in train and test datasets
Community Slack member Samuel asked:
I'm trying to split my data between train and test, but use similarity to ensure there aren't any pseudo-duplicates or similar images between these two sets. Is there a simple way to compute pairwise similarity between two FiftyOne samples? Or alternatively, is there a way to index a SimilarityIndex for two samples?
With this plugin, you can:
- Find exact duplicate images using a hash function
- Find near duplicate images using an embedding model and similarity threshold
- View and interact with duplicate images in the App
- Remove all duplicates, or keep a representative image from each duplicate set
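To make the near-duplicate idea concrete, here is a minimal sketch of a pairwise check between two splits. It is not the plugin's implementation. It assumes you already have one embedding vector per sample (for example, from an embedding model of your choice), uses cosine similarity as the metric, and picks 0.95 as an arbitrary threshold. The function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def find_cross_split_duplicates(train, test, threshold=0.95):
    """Return (train_id, test_id, similarity) for pairs at or above threshold."""
    pairs = []
    for train_id, train_vec in train.items():
        for test_id, test_vec in test.items():
            sim = cosine_similarity(train_vec, test_vec)
            if sim >= threshold:
                pairs.append((train_id, test_id, sim))
    return pairs


# Toy embeddings keyed by sample ID (illustrative values only)
train = {"train_a": [1.0, 0.0, 0.0], "train_b": [0.0, 1.0, 0.0]}
test = {"test_a": [0.99, 0.05, 0.0], "test_b": [0.0, 0.0, 1.0]}

leaks = find_cross_split_duplicates(train, test)
for train_id, test_id, sim in leaks:
    print(f"{test_id} is a near duplicate of {train_id} (similarity {sim:.3f})")
```

Any test sample that shows up in a flagged pair can then be moved or dropped before finalizing the split. The nested loop is quadratic in the number of samples, so for large datasets a vectorized or index-based approach is the better choice.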
Converting image sized masks to masks relative to bounding boxes
Community Slack member George asked:
How do I convert from image sized boolean masks to masks that are relative to a bounding box?
Take a look at some of FiftyOne’s built-in factory methods:
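For intuition, the conversion itself can be sketched in plain Python: find the tight bounding rectangle of the foreground pixels, crop the mask to it, and express the box as [top-left-x, top-left-y, width, height] relative to the image size, which is the convention FiftyOne uses for bounding boxes. This is an illustration, not FiftyOne's implementation, and the helper name is hypothetical. In practice, `fo.Detection.from_mask()` performs this step for you on a full-image mask.

```python
def crop_mask_to_bounding_box(full_mask):
    """Convert an image-sized boolean mask (list of rows) into a relative
    bounding box and a mask cropped to that box."""
    if not full_mask or not any(any(row) for row in full_mask):
        raise ValueError("mask contains no foreground pixels")

    height = len(full_mask)
    width = len(full_mask[0])

    # Rows and columns that contain at least one foreground pixel
    rows = [y for y, row in enumerate(full_mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in full_mask)]

    y0, y1 = rows[0], rows[-1] + 1
    x0, x1 = cols[0], cols[-1] + 1

    # [x, y, w, h] as fractions of the image dimensions
    bounding_box = [x0 / width, y0 / height, (x1 - x0) / width, (y1 - y0) / height]
    instance_mask = [row[x0:x1] for row in full_mask[y0:y1]]
    return bounding_box, instance_mask


# A 4x5 (rows x columns) image with a small object near the top left
F, T = False, True
full_mask = [
    [F, F, F, F, F],
    [F, T, T, F, F],
    [F, F, T, F, F],
    [F, F, F, F, F],
]

bounding_box, instance_mask = crop_mask_to_bounding_box(full_mask)
print(bounding_box)
print(instance_mask)
```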
Exporting FiftyOne datasets except for the images
Community Slack member Francesco asked:
How can I export or clone a FiftyOne dataset including all predictions, annotations and evaluations? If possible, I'd like to not have to export the images. I’d rather just change the path to the folder containing the images.
Exporting FiftyOne datasets is discussed in detail in the FiftyOne Docs. You can export datasets in the FiftyOne format without copying the source media files by including export_media=False in your call to export().
Exporting in fiftyone.types.FiftyOneDataset format with the export_media=False and rel_dir parameters is a convenient way to transfer datasets between work environments. It lets you store the media files wherever you wish in each environment and then simply provide the appropriate rel_dir value when importing the dataset into FiftyOne in a new environment.
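A minimal sketch of that round trip might look like the following. The two helper names and all paths are hypothetical; the calls to `export()` and `fo.Dataset.from_dir()` use the export_media and rel_dir parameters described above.

```python
def export_without_media(dataset, export_dir, rel_dir):
    """Export labels and metadata only, storing media paths relative to rel_dir."""
    # Imported inside the function so the helpers can be defined even
    # where FiftyOne is not installed.
    import fiftyone as fo

    dataset.export(
        export_dir=export_dir,
        dataset_type=fo.types.FiftyOneDataset,
        export_media=False,  # do not copy the images
        rel_dir=rel_dir,     # common prefix stripped from each media path
    )


def import_with_new_media_root(export_dir, rel_dir, name=None):
    """Load the export in another environment, prepending rel_dir to media paths."""
    import fiftyone as fo

    return fo.Dataset.from_dir(
        dataset_dir=export_dir,
        dataset_type=fo.types.FiftyOneDataset,
        rel_dir=rel_dir,
        name=name,
    )


# Example usage (hypothetical paths):
#   export_without_media(dataset, "/tmp/my-export", "/data/images")
#   dataset2 = import_with_new_media_root("/tmp/my-export", "/mnt/shared/images")
```

The rel_dir passed at import time does not need to match the one used at export time, which is what allows the images to live in a different folder in each environment.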
Join the FiftyOne Community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!