Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
As an open source community, the FiftyOne community is open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community; we’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
How to compute distinct values of a field in a collection
Community Slack member ZKW asked:
Are there any methods that allow me to list out all the distinct values for labels in a dataset?
Yes, you’ll want to use the distinct() aggregation. A few snippets to get you started:
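```python
import fiftyone.zoo as foz
from fiftyone import ViewField as F

# Any dataset works; the quickstart zoo dataset has `tags` and a `predictions` Detections field
dataset = foz.load_zoo_dataset("quickstart")

# Get the distinct tags in a dataset
values = dataset.distinct("tags")
print(values)  # list of distinct values

# Get the distinct predicted labels in a dataset
values = dataset.distinct("predictions.detections.label")
print(values)  # list of distinct values

# Get the distinct predicted labels after some normalization
values = dataset.distinct(
    F("predictions.detections.label").map_values(
        {"cat": "pet", "dog": "pet"}
    ).upper()
)
print(values)  # list of distinct values
```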
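If it helps to see what these calls compute, here is a plain-Python sketch of the same semantics over a hypothetical list of samples stored as dicts rather than FiftyOne objects. It is only a mental model, not how FiftyOne implements distinct() (which runs as a database aggregation), and it assumes distinct() returns a sorted list of unique values with missing (None) values skipped. The helpers distinct() and flatten() below are hypothetical names for illustration.

```python
from typing import Callable, Iterable, Optional

# Hypothetical samples: each has tags and a list of predicted detection labels
samples = [
    {"tags": ["validation"], "predictions": ["cat", "dog", "person"]},
    {"tags": ["validation", "reviewed"], "predictions": ["dog", "car"]},
    {"tags": [], "predictions": None},  # a sample with no predictions
]


def distinct(values: Iterable, transform: Optional[Callable[[str], str]] = None) -> list:
    """Hypothetical helper: sorted unique values, skipping None, with an optional transform."""
    seen = set()
    for value in values:
        if value is None:
            continue
        seen.add(transform(value) if transform else value)
    return sorted(seen)


def flatten(field: str) -> list:
    """Hypothetical helper: unwind a list-valued field across all samples."""
    out = []
    for sample in samples:
        out.extend(sample[field] or [])
    return out


# Analogue of dataset.distinct("tags")
print(distinct(flatten("tags")))  # ['reviewed', 'validation']

# Analogue of dataset.distinct("predictions.detections.label")
print(distinct(flatten("predictions")))  # ['car', 'cat', 'dog', 'person']

# Analogue of map_values({"cat": "pet", "dog": "pet"}).upper()
mapping = {"cat": "pet", "dog": "pet"}
print(distinct(flatten("predictions"), lambda label: mapping.get(label, label).upper()))
# ['CAR', 'PERSON', 'PET']
```

Note how mapping "cat" and "dog" to the same value collapses them into a single distinct entry, which is why the normalized query returns fewer labels.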
Finding dissimilar images using the “similarity” feature
Community Slack member Mareeta asked:
I am trying to find dissimilar images. Should I use compute_similarity() or compute_uniqueness()?
To start, uniqueness in an image dataset is a score assigned to each image based on how far away the image is (in embedding space) from its nearest neighbors. Similarity, meanwhile, builds an index that lets you query how similar or dissimilar samples are to one another, for example to find the images closest to, or farthest from, a given image.
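To make the uniqueness idea concrete, here is a toy sketch with made-up 2-D "embeddings" (real ones come from a model and have many more dimensions). The score below is a simplified stand-in, the mean distance to the k nearest neighbors scaled so the largest score is 1, and is not FiftyOne's exact uniqueness formula; the uniqueness() helper and the sample ids are hypothetical.

```python
import math

# Hypothetical 2-D embeddings keyed by sample id
embeddings = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.0),
    "c": (0.0, 0.1),
    "d": (5.0, 5.0),  # far away from every other sample
}


def uniqueness(embs: dict, k: int = 2) -> dict:
    """Toy uniqueness: mean distance to the k nearest neighbors, scaled so the max is 1."""
    raw = {}
    for sid, vec in embs.items():
        dists = sorted(math.dist(vec, other) for oid, other in embs.items() if oid != sid)
        raw[sid] = sum(dists[:k]) / k
    top = max(raw.values())
    return {sid: score / top for sid, score in raw.items()}


scores = uniqueness(embeddings)
print(max(scores, key=scores.get))  # d
```

The key difference from similarity is that this produces one global score per sample, with no query image involved.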
To find dissimilar images, check out the “Similarity” section in the Docs.
The basic steps are:
- Compute similarity on your dataset to create an index
- Sort by similarity in the App, as seen in the gif, and enable the reverse option
- Or use sort_by_similarity() in the SDK, as shown in the tutorial linked above
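```python
import fiftyone as fo
import fiftyone.brain as fob

# Build the similarity index once, using the same brain_key as below
fob.compute_similarity(dataset, brain_key="img_sim")

# Choose a random image from the dataset
query_id = dataset.take(1).first().id

# Programmatically construct a view containing the 15 least similar images
view = dataset.sort_by_similarity(query_id, k=15, brain_key="img_sim", reverse=True)
session = fo.launch_app(dataset)
session.view = view
```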
Finally, when sorting with the SDK, you can store each sample's distance from the query in a field by passing dist_field:
```python
view = dataset.sort_by_similarity(
    query_id, k=15, brain_key="img_sim", reverse=True, dist_field="distance"
)
```
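A plain-Python sketch of what a reverse similarity sort returns may help. It assumes hypothetical 2-D embeddings and plain Euclidean distance (the metric FiftyOne actually uses depends on how the similarity index was configured), and the sort_by_distance() helper is a hypothetical name. It ranks samples by distance to the query, keeps the k farthest when reverse=True, and records each distance on the sample, mirroring what dist_field does.

```python
import math

# Hypothetical embeddings keyed by sample id
embeddings = {
    "query": (0.0, 0.0),
    "near": (0.5, 0.0),
    "mid": (2.0, 0.0),
    "far": (0.0, 6.0),
}


def sort_by_distance(embs: dict, query_id: str, k: int, reverse: bool = False) -> list:
    """Return up to k (sample_id, distance) pairs, closest first (or farthest first if reverse)."""
    q = embs[query_id]
    ranked = sorted(
        # This sketch excludes the query sample itself from the results
        ((sid, math.dist(q, vec)) for sid, vec in embs.items() if sid != query_id),
        key=lambda pair: pair[1],
        reverse=reverse,
    )
    return ranked[:k]


# Analogue of reverse=True with k=2: the two least similar samples
least_similar = sort_by_distance(embeddings, "query", k=2, reverse=True)

# Analogue of dist_field="distance": store each distance on its sample record
samples = {sid: {"id": sid} for sid in embeddings}
for sid, dist in least_similar:
    samples[sid]["distance"] = dist

print(least_similar)  # [('far', 6.0), ('mid', 2.0)]
```

Only the returned samples receive a distance, which is also why the farthest results are worth inspecting by eye: they are simply whatever lies farthest from the query.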
That being said, while similarity between images is a fairly well-defined concept, the notion of dissimilarity is not always as stable or meaningful.
Exporting labels into a YOLO format
Community Slack member sytkrc asked:
I want to convert my FiftyOne label file to the YOLO format. How can I accomplish this?
Check out the “Exporting FiftyOne Datasets” section in the Docs. The relevant code snippet would look something like this:
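```python
import fiftyone as fo

# export_dir and label_field are placeholders; use your own path and Detections field
dataset.export(
    export_dir="/path/for/yolov5-dataset",
    dataset_type=fo.types.YOLOv5Dataset,
    label_field="ground_truth",
)
```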
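The core of any YOLO conversion is the box format. FiftyOne stores a detection's bounding_box as [top-left-x, top-left-y, width, height] in relative coordinates in [0, 1], while a YOLO label file has one line per object of the form "class_index x_center y_center width height", also normalized. The minimal sketch below shows only that conversion, with a hypothetical class list and a hypothetical to_yolo_line() helper; the exporter above handles this for you, along with copying images and writing the dataset metadata.

```python
# Hypothetical class list; index positions become YOLO class ids
classes = ["cat", "dog"]


def to_yolo_line(label: str, bounding_box: list) -> str:
    """Convert a relative [x, y, w, h] box with a top-left origin into a YOLO label line."""
    x, y, w, h = bounding_box
    cx = x + w / 2  # YOLO stores the box center, not the top-left corner
    cy = y + h / 2
    return f"{classes.index(label)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


print(to_yolo_line("dog", [0.1, 0.2, 0.4, 0.5]))
# 1 0.300000 0.450000 0.400000 0.500000
```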
How to clean up previous evaluation runs
Community Slack member Joy asked:
Is there a built-in function that allows me to remove evaluation runs and their fields so I can clean up my dataset?
Yes, just use the delete_evaluation() and delete_evaluations() methods.
When you run an evaluation with an eval_key argument, the evaluation is recorded on the dataset and you can retrieve information about it later, rename it, delete it (along with any modifications to your dataset that were performed by it), and retrieve the view that you evaluated on. The relevant snippet in this case:
```python
# Delete the evaluation
# This will remove any evaluation data that was populated on your dataset for
# that specific evaluation run
dataset.delete_evaluation("eval")

# Delete ALL evaluation runs on the dataset
dataset.delete_evaluations()
```
You can find additional info in the Docs.
A shortcut to reload a dataset in the FiftyOne App
Community Slack member Matěj asked:
How can I reload a dataset in the FiftyOne App after making a variety of changes? What’s the fastest shortcut to refresh the UI?
Clicking the FiftyOne logo will refresh the UI and reload all the data. Cmd+R also works!