Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
As an open source community, the FiftyOne community is open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
Ok, let’s dive into this week’s tips and tricks!
How to compute distinct values of a field in a collection
Community Slack member ZKW asked:
Are there any methods that allow me to list out all the distinct values for labels in a dataset?
Yes, you’ll want to make use of distinct() in your query. A few snippets to get you started:
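Here is a minimal sketch of what that can look like. It assumes your labels live in a Detections field named ground_truth and uses the zoo "quickstart" dataset for the demo; both are illustrative choices, and list_distinct_labels is a hypothetical helper rather than part of FiftyOne. distinct() itself accepts a field path on a dataset or view.

```python
def list_distinct_labels(collection, label_field="ground_truth"):
    """Return the distinct class labels stored in a Detections field.

    `collection` is a FiftyOne dataset or view. The default field name
    is an assumption; replace it with the label field on your dataset.
    """
    # For a Detections field, class names live at `<field>.detections.label`
    return collection.distinct(f"{label_field}.detections.label")


def demo():
    # Assumes FiftyOne is installed and the zoo dataset can be downloaded
    import fiftyone.zoo as foz

    dataset = foz.load_zoo_dataset("quickstart")

    # Distinct ground truth labels across the whole dataset
    print(list_distinct_labels(dataset, "ground_truth"))

    # distinct() also works on views and on other field paths
    print(dataset.take(10).distinct("predictions.detections.label"))
```

Calling demo() prints the label lists; for a Classification field the path would be `<field>.label` instead.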
Finding dissimilar images using the “similarity” feature
Community Slack member Mareeta asked:
I am trying to find dissimilar images. Should I use compute_similarity() or compute_uniqueness() ?
To start, uniqueness in an image dataset is a value assigned to each image based on how far away the image is (in embedding space) from its nearest neighbors. Meanwhile, similarity creates an index that lets you query how the samples relate to one another, so you can find the samples that are most similar or most dissimilar to a given sample.
The basic steps are:
- Compute similarity on your dataset to create an index
- Sort by similarity in the App as seen in the gif, and sort in reverse
- Or, use “sort by similarity” in the SDK as shown in the tutorial linked to previously
Finally, when sorting with the SDK, you can store each sample's distance to the query in a field by passing the dist_field argument.
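Putting the steps above together, a rough sketch might look like the following. The brain key img_sim, the field name dissimilarity, and the sort_most_dissimilar helper are assumptions made for illustration; compute_similarity() and sort_by_similarity() are the FiftyOne calls doing the work.

```python
def sort_most_dissimilar(collection, query_id, brain_key="img_sim",
                         dist_field="dissimilarity", k=None):
    """Return a view ordered from least to most similar to `query_id`.

    Each sample's distance to the query is written to `dist_field`.
    The brain key and field name defaults are illustrative assumptions.
    """
    return collection.sort_by_similarity(
        query_id,
        k=k,                    # optionally keep only the first k results
        reverse=True,           # least similar samples come first
        dist_field=dist_field,  # store the distance to the query here
        brain_key=brain_key,    # which similarity index to use
    )


def demo():
    # Assumes FiftyOne and FiftyOne Brain are installed, and that the
    # default embedding model can be downloaded
    import fiftyone.brain as fob
    import fiftyone.zoo as foz

    dataset = foz.load_zoo_dataset("quickstart")

    # Step 1: build a similarity index on the dataset
    fob.compute_similarity(dataset, brain_key="img_sim")

    # Step 2: sort in reverse relative to a query sample
    query_id = dataset.first().id
    view = sort_most_dissimilar(dataset, query_id)

    # The stored distances can then be read like any other field
    print(view.values("dissimilarity"))
```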
That being said, while similarity for images is a fairly well defined concept, the notion of dissimilarity is not always so stable or meaningful.
Exporting labels into a YOLO format
Community Slack member sytkrc asked:
I want to convert my FiftyOne label file to the YOLO format. How can I accomplish this?
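One way to approach this is to export the dataset using one of FiftyOne's YOLO dataset types. The sketch below assumes the YOLOv5 layout, a Detections field named ground_truth, and a hypothetical output directory; export_to_yolo is an illustrative helper, not a FiftyOne function.

```python
def export_to_yolo(collection, export_dir, label_field="ground_truth",
                   split=None, classes=None, dataset_type=None):
    """Export the detections in `label_field` to a YOLO-style directory.

    Defaults to the YOLOv5 layout. The field name and the helper itself
    are assumptions for illustration.
    """
    if dataset_type is None:
        # Imported lazily so the helper can be defined without FiftyOne loaded
        import fiftyone as fo

        dataset_type = fo.types.YOLOv5Dataset

    extra = {}
    if split is not None:
        extra["split"] = split      # e.g. "train" or "val" (YOLOv5 layout)
    if classes is not None:
        extra["classes"] = classes  # fixes the class-to-index order

    collection.export(
        export_dir=export_dir,
        dataset_type=dataset_type,
        label_field=label_field,
        **extra,
    )


def demo():
    # Assumes FiftyOne is installed; "/tmp/yolo-export" is a placeholder path
    import fiftyone.zoo as foz

    dataset = foz.load_zoo_dataset("quickstart")
    classes = dataset.distinct("ground_truth.detections.label")
    export_to_yolo(dataset, "/tmp/yolo-export", split="train", classes=classes)
```

Passing an explicit classes list keeps the class indices consistent if you export several splits into the same directory.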
How to clean up previous evaluation runs
Community Slack member Joy asked:
Is there a built-in function that allows me to remove evaluation runs and their fields so I can clean up my dataset?
Yes, just use the delete_evaluation() and delete_evaluations() methods.
When you run an evaluation with an eval_key argument, the evaluation is recorded on the dataset and you can retrieve information about it later, rename it, delete it (along with any modifications to your dataset that were performed by it), and retrieve the view that you evaluated on. The relevant snippet in this case:
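A minimal sketch of that cleanup is shown here. It assumes an evaluation was previously run with eval_key="eval" on the zoo "quickstart" dataset; the key, the dataset, and the cleanup_evaluations helper are illustrative, while list_evaluations(), delete_evaluation(), and delete_evaluations() are the dataset methods involved.

```python
def cleanup_evaluations(dataset, eval_key=None):
    """Delete one evaluation run, or all of them when no key is given.

    Returns the evaluation keys that remain on the dataset.
    """
    if eval_key is None:
        # Remove every evaluation run and the fields they populated
        dataset.delete_evaluations()
    elif eval_key in dataset.list_evaluations():
        # Remove a single run, skipping keys that do not exist
        dataset.delete_evaluation(eval_key)

    return dataset.list_evaluations()


def demo():
    # Assumes FiftyOne is installed and the zoo dataset can be downloaded
    import fiftyone.zoo as foz

    dataset = foz.load_zoo_dataset("quickstart")

    # Record an evaluation run under the key "eval"
    dataset.evaluate_detections(
        "predictions", gt_field="ground_truth", eval_key="eval"
    )
    print(dataset.list_evaluations())

    # Delete that run along with the fields it added to the dataset
    print(cleanup_evaluations(dataset, "eval"))
```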
A shortcut to reload a dataset in the FiftyOne App
Community Slack member Matěj asked:
How can I reload a dataset in the FiftyOne App after making a variety of changes? What’s the fastest shortcut to refresh the UI?
Clicking the FiftyOne logo will refresh the UI and reload all the data. Cmd+r also works!