Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
The FiftyOne community is open to all: everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Running the FiftyOne App inside a SageMaker instance
Community Slack member Nahid asked:
I have 500GB+ of data sitting on an EC2/SageMaker instance and I want to use the image quality plugin on that dataset. Is there a way to have the GUI view of FiftyOne when it is run inside an AWS EC2/SageMaker instance?
Yes, you can employ FiftyOne’s port-forwarding capabilities to accomplish this.
    # manually on the local machine
    ssh -N -L 5151:127.0.0.1:XXXX [<username>@]<hostname>

    # via FiftyOne CLI
    fiftyone app connect --destination [<username>@]<hostname>
Check out the FiftyOne Docs to learn more about how to configure port forwarding manually or via the FiftyOne CLI, how to connect to multiple remote sessions, and how to run multiple remote sessions from a single machine.
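If you would rather script the manual tunnel than type it each time, the snippet below is a minimal sketch that assembles the same `ssh` arguments in Python. The helper name `build_tunnel_command`, the example host, and the example user are hypothetical, and the remote port defaults to 5151 only as an assumption; substitute whichever port your remote session actually uses (the XXXX placeholder in the command above).

```python
import shlex


def build_tunnel_command(hostname, username=None, local_port=5151, remote_port=5151):
    """Return the ssh argument list that forwards local_port to remote_port."""
    destination = f"{username}@{hostname}" if username else hostname
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L",
        f"{local_port}:127.0.0.1:{remote_port}",
        destination,
    ]


# Hypothetical host and user, for illustration only
command = build_tunnel_command("ec2-host.example.com", username="ubuntu")

# Print a copy-pasteable command line; pass `command` to subprocess.run()
# instead if you want Python to open the tunnel itself
print(shlex.join(command))
```

Once the tunnel is open, the App should be reachable in a local browser on the chosen local port.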
Leveraging GPUs to compute embeddings
Community Slack member Fredrik asked:
Can I use a GPU when computing embeddings?
Yes, this capability was added in this summer’s FiftyOne 0.21.5 release. Check out the FiftyOne Docs to learn more about the using_gpu parameter, which specifies whether a model should use a GPU.
Avoiding duplicates in train and test datasets
Community Slack member Samuel asked:
I’m trying to split my data between train and test, but use similarity to ensure there aren’t any pseudo-duplicates or similar images between these two sets. Is there a simple way to compute pairwise similarity between two FiftyOne samples? Or alternatively, is there a way to index a SimilarityIndex for two samples?
We recommend checking out FiftyOne’s Image Deduplication Plugin which streamlines image deduplication workflows!
With this plugin, you can:
- Find exact duplicate images using a hash function
- Find near duplicate images using an embedding model and similarity threshold
- View and interact with duplicate images in the App
- Remove all duplicates, or keep a representative image from each duplicate set
You can learn more about the plugin in this blog post: “Double Trouble: Eliminate Image Duplicates with FiftyOne.”
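To make the "embedding model and similarity threshold" idea concrete for the train/test case, here is a rough sketch in plain Python. It is not the plugin's implementation: the embeddings are toy vectors, the sample IDs are made up, and the 0.95 threshold is an arbitrary assumption you would tune for your own data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_split_near_duplicates(train, test, threshold=0.95):
    """Return (train_id, test_id, similarity) for pairs at or above threshold."""
    pairs = []
    for train_id, train_vec in train.items():
        for test_id, test_vec in test.items():
            sim = cosine_similarity(train_vec, test_vec)
            if sim >= threshold:
                pairs.append((train_id, test_id, sim))
    return pairs


# Toy embeddings keyed by hypothetical sample IDs
train_embeddings = {"train_a": [1.0, 0.0, 0.0], "train_b": [0.0, 1.0, 0.0]}
test_embeddings = {"test_a": [0.99, 0.05, 0.0], "test_b": [0.0, 0.0, 1.0]}

# Pairs that look like the same image leaking across the split
leaks = cross_split_near_duplicates(train_embeddings, test_embeddings)
for train_id, test_id, sim in leaks:
    print(f"{train_id} ~ {test_id}: {sim:.4f}")
```

Any pair flagged this way is a candidate to move into the same split or drop. The brute-force double loop is fine for a sketch but scales quadratically, so a real dataset would want a vectorized or indexed search.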
Converting image sized masks to masks relative to bounding boxes
Community Slack member George asked:
How do I convert from image sized boolean masks to masks that are relative to a bounding box?
Take a look at some of FiftyOne’s built-in factory methods:
- Detection.from_mask(): Creates a Detection instance with its mask attribute populated from the given full image mask.
- Segmentation(mask=mask).to_detections(): Returns a Detections representation of this instance with instance masks populated.
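If it helps to see what "relative to a bounding box" means geometrically, the sketch below crops a full-image boolean mask down to its tight box. It is an illustration of the idea in plain Python, not FiftyOne's implementation: the helper name `crop_mask_to_box` is hypothetical, and nested lists stand in for the NumPy arrays you would normally use. It follows FiftyOne's convention of bounding boxes as `[x, y, width, height]` in relative `[0, 1]` coordinates.

```python
def crop_mask_to_box(full_mask):
    """Crop a full-image boolean mask to its tight bounding box.

    Returns (bounding_box, box_mask), where bounding_box is
    [x, y, width, height] in relative [0, 1] coordinates and box_mask
    covers only the pixels inside that box.
    """
    height = len(full_mask)
    width = len(full_mask[0])

    # Rows and columns that contain at least one foreground pixel
    rows = [r for r, row in enumerate(full_mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in full_mask)]
    if not rows:
        return None, None  # empty mask: there is no object to box

    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1

    bounding_box = [
        left / width,
        top / height,
        (right - left) / width,
        (bottom - top) / height,
    ]
    box_mask = [row[left:right] for row in full_mask[top:bottom]]
    return bounding_box, box_mask


# A 4-row by 5-column image containing an object whose tight box is 2x2
full_mask = [
    [False, False, False, False, False],
    [False, True, True, False, False],
    [False, True, False, False, False],
    [False, False, False, False, False],
]
bounding_box, box_mask = crop_mask_to_box(full_mask)
print(bounding_box)
print(box_mask)
```

The cropped mask keeps only the pixels inside the box, so it is much smaller than the full-image mask when objects are small.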
Exporting FiftyOne datasets except for the images
Community Slack member Francesco asked:
How can I export or clone a FiftyOne dataset including all predictions, annotations and evaluations? If possible, I’d like to not have to export the images. I’d rather just change the path to the folder containing the images.
Exporting FiftyOne datasets is discussed in detail in the FiftyOne Docs. You can export datasets in the FiftyOne format without copying the source media files by including export_media=False in your call to export().
    import fiftyone as fo

    export_dir = "/path/for/fiftyone-dataset"

    # The dataset or view to export
    dataset_or_view = fo.Dataset(...)

    # Export the dataset without copying the media files
    dataset_or_view.export(
        export_dir=export_dir,
        dataset_type=fo.types.FiftyOneDataset,
        export_media=False,
    )

    # Export the dataset without media, including only the relative path of
    # each image with respect to the given `rel_dir` so that the dataset
    # can be imported with a different `rel_dir` prepended later
    dataset_or_view.export(
        export_dir=export_dir,
        dataset_type=fo.types.FiftyOneDataset,
        export_media=False,
        rel_dir="/common/images/dir",
    )
Exporting in fiftyone.types.FiftyOneDataset format as shown above using the export_media=False and rel_dir parameters is a convenient way to transfer datasets between work environments. You can store the media files wherever you wish in each environment and then simply provide the appropriate rel_dir value when importing the dataset into FiftyOne in a new environment.
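The effect of `rel_dir` on media paths can be pictured with a small path-manipulation sketch. This only illustrates the idea and is not FiftyOne's internal code: the helper names `to_relative` and `to_absolute` and the `/mnt/shared/images` directory are hypothetical.

```python
import posixpath


def to_relative(filepath, rel_dir):
    """Strip rel_dir from an absolute media path, as happens conceptually on export."""
    return posixpath.relpath(filepath, start=rel_dir)


def to_absolute(relative_path, rel_dir):
    """Prepend a (possibly different) rel_dir, as happens conceptually on import."""
    return posixpath.join(rel_dir, relative_path)


# Environment A: images live under /common/images/dir
exported = to_relative("/common/images/dir/train/0001.jpg", "/common/images/dir")

# Environment B: the same images live under a different (hypothetical) root
imported = to_absolute(exported, "/mnt/shared/images")

print(exported)
print(imported)
```

Only the directory structure below `rel_dir` has to match between environments; the root itself can differ.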
Join the FiftyOne Community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 2,000+ FiftyOne Slack members
- 4,000+ stars on GitHub
- 5,000+ Meetup members
- Used by 370+ repositories
- 60+ contributors