Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
The FiftyOne community is open source and open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Filtering samples that match a criterion
Community Slack member imyhxy asked:
I have a detection dataset, which contains a ground_truth field and a predictions field. What expression should I enter for the “Match” operation of the FiftyOne App’s Viewbar to filter out the samples that contain predictions?
For this operation, we’d recommend using the FiftyOne SDK and creating a saved view.
Saved views are a convenient way to record semantically relevant subsets of a dataset, such as:
- Samples in a particular state, e.g., with certain tag(s)
- A subset of a dataset that was used for a task, e.g., training a model
- Samples that contain content of interest, e.g., object types or image characteristics
For your match query, the view might look like this:
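import fiftyone as fo
from fiftyone import ViewField as F  # F is used to build the match expression
dataset = fo.load_dataset("...")
# Match samples whose predictions are empty or missing
view = dataset.match(
    (F("predictions.detections").length() == 0) | ~F("predictions").exists()
)
session = fo.launch_app(view)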
Also, check out this Filtering Cheat Sheet in the FiftyOne Docs for more filtering tips and tricks.
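The snippet above builds the view but does not save it. The following is a minimal sketch of the saving step, assuming `dataset` is a FiftyOne dataset and `view` is a view of it. The helper name `save_and_reload` and the view name "no-predictions" are made up for illustration; `save_view()` and `load_saved_view()` are the dataset methods being called.

```python
def save_and_reload(dataset, view, name="no-predictions"):
    """Save `view` on `dataset` under `name`, then load it back.

    `dataset` is expected to be a FiftyOne dataset and `view` a view of it.
    The helper name and the default view name are illustrative only.
    """
    # overwrite=True replaces an existing saved view with the same name
    dataset.save_view(name, view, overwrite=True)

    # Saved views are stored on the dataset, so this also works in a
    # later Python session after reloading the dataset
    return dataset.load_saved_view(name)


# Hypothetical usage, with `dataset` and `view` built as in the snippet above:
# saved = save_and_reload(dataset, view)
# session = fo.launch_app(saved)
```

Once saved, the view should also be selectable by name from the saved views menu in the App.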
Deduplicating data
Community Slack member Joy asked:
Is there a way to find and delete duplicates based on a field or filepath instead of having to calculate the embeddings?
Yes! This task is relatively simple when you use the Image Deduplication plugin for FiftyOne.
This plugin allows you to:
- Find both exact and approximate duplicate images in your dataset
- Visualize these groups of duplicates
- Delete all duplicates, or keep one representative from each set of duplicates
You can learn more about this plugin in the blog post “Double Trouble: Eliminate Image Duplicates with FiftyOne”, and about image deduplication in the FiftyOne Docs.
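If you only need exact matches on a field value (such as filepath) and want to avoid embeddings altogether, the idea can also be sketched with plain dataset methods. This is not how the plugin works internally; it is a hedged illustration that assumes the field holds hashable values such as strings. The helper names `find_duplicate_ids` and `delete_exact_duplicates` are made up for this example.

```python
def find_duplicate_ids(ids, values):
    """Return the IDs of every sample whose value was already seen.

    The first sample with a given value is kept as the representative.
    """
    seen = set()
    duplicate_ids = []
    for sample_id, value in zip(ids, values):
        if value in seen:
            duplicate_ids.append(sample_id)
        else:
            seen.add(value)
    return duplicate_ids


def delete_exact_duplicates(dataset, field="filepath"):
    """Delete samples that repeat an earlier sample's `field` value.

    `dataset` is expected to be a FiftyOne dataset; values() returns one
    entry per sample, in the same order for both calls.
    """
    ids = dataset.values("id")
    values = dataset.values(field)
    duplicate_ids = find_duplicate_ids(ids, values)
    if duplicate_ids:
        dataset.delete_samples(duplicate_ids)
    return duplicate_ids
```

Note that comparing filepaths only catches samples that point at the same file. Identical images stored under different paths still require a content-based approach such as the plugin.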
Deleting samples with specific tags
Community Slack member Andrea asked:
I want to remove/delete samples from my dataset that I labeled with FiftyOne. How can I accomplish this?
You can delete samples by their tags. The example below deletes the samples tagged “delete”:
print(len(dataset))  # 100
del_view = dataset.match_tags("delete")
dataset.delete_samples(del_view)
print(len(dataset))  # 99
Also, any view of a dataset can be used to delete samples. Learn more about deleting samples and creating views in the FiftyOne Docs.
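Because deletion is permanent, it can help to check how many samples a view matches before removing them. The sketch below wraps the same `match_tags()` and `delete_samples()` calls in a hypothetical helper named `delete_tagged` with a made-up `dry_run` flag; neither is part of FiftyOne.

```python
def delete_tagged(dataset, tag="delete", dry_run=True):
    """Count the samples tagged `tag` and delete them unless `dry_run` is set.

    `dataset` is expected to be a FiftyOne dataset. Returns the number of
    samples that matched the tag.
    """
    view = dataset.match_tags(tag)
    num_matched = len(view)

    # Only touch the dataset when explicitly asked to
    if not dry_run and num_matched > 0:
        dataset.delete_samples(view)

    return num_matched


# Hypothetical usage:
# print(delete_tagged(dataset))                 # report only, nothing deleted
# print(delete_tagged(dataset, dry_run=False))  # actually delete
```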
Resolving not being able to see image thumbnails from a saved view
Community Slack member Barak asked:
I have a dataset with rather large images, for which I created thumbnails. In the FiftyOne App, the thumbnails load correctly into the grid and I can see the large images when I click on them.
However, once I create a view and try to “open” it, no images are rendered in the grid (exclamation marks are shown instead). Only when I click on a (missing) image does it open with the large image.
Here’s the code I am using to create my view:
person_view = (
    dataset.select_fields("Inference")
    .filter_labels("Inference", fo.ViewField("label") == "PERSON")
    .sort_by(fo.ViewField("Inference.detections").length())
)
dataset.save_view("person-view", person_view)
The issue with this snippet is that select_fields() keeps only the fields you name (plus default fields such as filepath), so the field containing your thumbnail paths is excluded from the view. Include that field in the selection and you should see your thumbnails:
from fiftyone import ViewField as F
# Keep the thumbnail field alongside the label field
person_view = (
    dataset.select_fields(["Inference", "MY_THUMBNAIL_FIELD"])
    .filter_labels("Inference", F("label") == "PERSON")
    .sort_by(F("Inference.detections").length())
)
dataset.save_view("person-view", person_view)
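To avoid hard-coding the thumbnail field name, one option is to read it from the dataset's App config. The sketch below assumes the thumbnail field was registered in `dataset.app_config.media_fields`, which is how the FiftyOne Docs set up multiple media fields; if your dataset was configured differently, this list may not contain it. The helper name `fields_with_media` is made up for this example.

```python
def fields_with_media(dataset, label_fields):
    """Return `label_fields` plus any extra media fields the App is set up to use.

    `dataset` is expected to be a FiftyOne dataset whose thumbnail field
    is listed in `dataset.app_config.media_fields` (an assumption).
    """
    selected = list(label_fields)
    for media_field in dataset.app_config.media_fields:
        # `filepath` is a default field that select_fields() always keeps
        if media_field != "filepath" and media_field not in selected:
            selected.append(media_field)
    return selected


# Hypothetical usage:
# person_view = dataset.select_fields(fields_with_media(dataset, ["Inference"]))
```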
Performing evaluations on a subset of data
Community Slack member Ido asked:
I’m trying to use the dataset.evaluate_detections method to evaluate my object detection model. The result I get is a fiftyone.utils.eval.coco.COCODetectionResults object that I can use to plot the confusion matrix, PR curve, etc. Is there a way to use this object to only show the results of a subset of the data? For example, I have a label named “type” which indicates if the sample belongs to train, validation, or test. Is there a way to show the results of just those labeled “test”?
Yes! You can create a view (a subset of your dataset) and perform evaluation on just that subset. In the example below, we run an evaluation on a view named high_conf_view to assess the quality of only the high confidence predictions in our dataset.
# Evaluate the predictions in the `faster_rcnn` field of our `high_conf_view`
# with respect to the objects in the `ground_truth` field
results = high_conf_view.evaluate_detections(
    "faster_rcnn",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)
Evaluating detections...
 100% |███████████████████| 96/96 [1.8s elapsed, 0s remaining, 50.7 samples/s]
Performing IoU sweep...
 100% |███████████████████| 96/96 [1.7s elapsed, 0s remaining, 56.6 samples/s]
For more tips and tricks, check out the “Evaluating Object Detections with FiftyOne” tutorial in the FiftyOne Docs.
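The example above uses high_conf_view, but the same pattern applies to the “test” split from the question. The sketch below is a hedged illustration: the helper name `evaluate_subset`, the `predictions` field name, and the `eval_test` key are assumptions, and the filter assumes “type” is a plain string field. If “type” is stored as a Classification label, the filter would likely need to be `F("type.label") == "test"` instead.

```python
def evaluate_subset(dataset, subset_filter, pred_field,
                    gt_field="ground_truth", eval_key=None):
    """Evaluate detections on only the samples matching `subset_filter`.

    `dataset` is expected to be a FiftyOne dataset and `subset_filter` a
    match expression, e.g. one built with fiftyone.ViewField.
    Returns the matched view and the evaluation results.
    """
    # Restrict the dataset to the samples of interest
    subset = dataset.match(subset_filter)

    # Run the evaluation on the view rather than the full dataset
    results = subset.evaluate_detections(
        pred_field,
        gt_field=gt_field,
        eval_key=eval_key,
        compute_mAP=True,
    )
    return subset, results


# Hypothetical usage (field names and eval key are assumptions):
# from fiftyone import ViewField as F
# test_view, results = evaluate_subset(
#     dataset, F("type") == "test", "predictions", eval_key="eval_test"
# )
# results.print_report()
# print(results.mAP())
```

The returned results object then covers only the matched samples, so reports and plots generated from it reflect the “test” subset alone.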
Join the FiftyOne Community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 2,000+ FiftyOne Slack members
- 4,000+ stars on GitHub
- 5,000+ Meetup members
- Used by 370+ repositories
- 60+ contributors