Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Omitting classes with few detection instances
Community Slack member Sylvia Schmitt asked,
“When grouping samples by values in a specific field, I would like to omit samples which have values that rarely occur in the dataset. How can this be done?”
One way to accomplish this is to use count_values() to count the occurrences of each unique value in the given field across the entire Dataset or DatasetView object, take the values that occur more frequently than your desired cutoff, and use the match() method to get the samples that contain these values.
For instance, if you want to get samples from the test split of the Families in the Wild dataset with name values that occur more than ten times in the dataset, you can do so as follows:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

## load the dataset
dataset = foz.load_zoo_dataset("fiw", split="test")

counts = dataset.count_values("name")
keep_names = [name for name, count in counts.items() if count > 10]

## filter for samples with these names
view = dataset.match(F("name").is_in(keep_names))

session = fo.launch_app(view)
You could then pass the resulting view into group_by() to group by the values in the field, or apply any other Aggregation you’d like to it.
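The count-then-filter step does not depend on FiftyOne itself. As a minimal sketch, assuming a plain Python list of hypothetical name values in place of a dataset field, collections.Counter stands in for count_values() and a list comprehension stands in for match():

```python
from collections import Counter

# Hypothetical field values, one per sample (stand-in for a dataset field)
names = ["ana", "ben", "ana", "cy", "ana", "ben", "dee"]

# Stand-in for dataset.count_values("name"): value -> number of occurrences
counts = Counter(names)

# Keep only the values that occur more often than the cutoff
min_count = 1
keep_names = {name for name, count in counts.items() if count > min_count}

# Stand-in for dataset.match(F("name").is_in(keep_names))
kept = [name for name in names if name in keep_names]

print(sorted(keep_names))
print(len(kept))
```

The cutoff is a strict "greater than" comparison, matching the count > 10 test in the FiftyOne snippet, so values that occur exactly min_count times are dropped.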
Learn more about count_values(), is_in(), and using aggregations in the FiftyOne Docs.
Saving changes to sample fields
Community Slack member Sylvia Schmitt asked,
“When adding sample fields and later on changing these values within a view, do the changes have to be made persistent by calling save() on the `Dataset` object, or will these changes be saved if the dataset is already persistent?”
Great question, Sylvia! In general, when changes are made to an individual sample in a Dataset or DatasetView, the changes need to be saved by calling save() on the sample, not the dataset. This is the case even if the dataset is persistent, i.e. if dataset.persistent = True.
As an example, you could change the class label for the first detection of the first sample in the Quickstart dataset as follows:
import fiftyone as fo
import fiftyone.zoo as foz

## load dataset
dataset = foz.load_zoo_dataset("quickstart")

## get sample
sample = dataset.first()

## change label
sample.ground_truth.detections[0].label = "bear"

## save changes to dataset
sample.save()
Using the save() method on a dataset is only necessary when editing dataset-level metadata such as dataset.info.
There are a few cases, however, in which it is not necessary to explicitly run sample.save() to propagate changes back to the dataset. These include the view.set_values(field_name, field_vals) method, which takes in a list of values, field_vals, and writes these to the field field_name for the samples in the view, as well as the view.tag_samples(tags) method, which adds the tags tags to all samples in the view.
If you know that you need to iterate through a Dataset or DatasetView and make changes to each sample, it is more efficient to pass autosave=True into iter_samples(), which batches the save operations, than to call save() on each sample. For example, to populate a field named random with a random number for each sample in our dataset, we can run:
import random

import fiftyone as fo
import fiftyone.zoo as foz

## load dataset
dataset = foz.load_zoo_dataset("quickstart")

## Automatically saves sample edits in efficient batches
for sample in dataset.select_fields().iter_samples(autosave=True):
    sample["random"] = random.random()
Learn more about set_values() and tagging samples in the FiftyOne Docs.
Predicting class labels in homogeneous images
Community Slack member George Pearse asked,
“What would be the best way to handle an application where the label for an object is deeply intertwined with the labels of the other objects in the sample? For instance, I might have images that are typically either crowds of all cats, or crowds of all dogs, but not crowds containing both cats and dogs.”
Great question, George! There are many ways to deal with data like this. One approach would be to accumulate a lot of examples like this and train a model on this data. Given enough high quality examples, the model should (theoretically) be able to learn these relationships.
As an alternative approach using just your existing data, you could post-process the labels in your samples based on your model’s predictions. For instance, if your model’s predictions are stored in a model_raw field on your samples, you can create a new label field model_processed and populate it based on the contents of model_raw for that sample.
For each sample, check if there are, say, three or more objects with the same class label. For the sake of simplicity, we’ll assume that dog is this class. If there are, then for every object that is not labeled dog in model_raw, if its class confidence score is below some threshold, set its class label to dog in model_processed.
Here’s what this might look like:
import numpy as np

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

## create or load your dataset
dataset = fo.Dataset(...)

## clone predictions into new field
dataset.clone_sample_field("model_raw", "model_processed")

## set a class confidence threshold
conf_thresh = 0.3

## iterate through samples in dataset
for sample in dataset.iter_samples(autosave=True):
    ## skip samples without predictions
    if sample.model_processed is None or not sample.model_processed.detections:
        continue

    dets = sample.model_processed.detections
    labels = [det.label for det in dets]
    unique_labels, label_counts = np.unique(labels, return_counts=True)

    ## find samples with at least 3 labels of same class
    if max(label_counts) > 2:
        crowd_label = str(unique_labels[np.argmax(label_counts)])
        changed = False
        for det in dets:
            low_conf = det.confidence is not None and det.confidence < conf_thresh
            if det.label != crowd_label and low_conf:
                det.label = crowd_label
                det.confidence = None
                changed = True

        ## tag modified samples to look at later
        if changed:
            sample.tags.append("possible homogeneous crowd")
You can then isolate the tagged samples, compare their processed predictions against the raw predictions, and inspect them in the FiftyOne App.
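To sanity-check the relabeling rule before running it on a dataset, the same logic can be sketched on plain dictionaries. This is only an illustration: the relabel_crowd helper, its parameters, and the example detections are hypothetical and not part of FiftyOne.

```python
from collections import Counter


def relabel_crowd(detections, conf_thresh=0.3, min_count=3):
    """Return (relabeled copies of the detections, whether any label changed).

    Each detection is a dict with "label" and "confidence" keys.
    """
    if not detections:
        return [], False

    # Most frequent label and how many times it occurs
    counts = Counter(det["label"] for det in detections)
    crowd_label, crowd_count = counts.most_common(1)[0]
    if crowd_count < min_count:
        return [dict(det) for det in detections], False

    changed = False
    processed = []
    for det in detections:
        det = dict(det)  # copy so the raw predictions stay untouched
        conf = det["confidence"]
        if det["label"] != crowd_label and conf is not None and conf < conf_thresh:
            det["label"] = crowd_label
            det["confidence"] = None  # the original score no longer applies
            changed = True
        processed.append(det)
    return processed, changed


raw = [
    {"label": "dog", "confidence": 0.9},
    {"label": "dog", "confidence": 0.8},
    {"label": "dog", "confidence": 0.7},
    {"label": "cat", "confidence": 0.2},  # low confidence: relabeled
    {"label": "cat", "confidence": 0.6},  # confident: left alone
]
processed, changed = relabel_crowd(raw)
print([det["label"] for det in processed])
```

Returning copies rather than editing in place mirrors the model_raw / model_processed split: the raw predictions remain available for comparison.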
Learn more about saving, keeping, and cloning sample fields in the FiftyOne Docs.
Matching classification results
Community Slack member Nadav asked,
“I have a dataset with two kinds of classification. What is the best way to create a view, in code or in the app, that only contains samples on which the two classifications agree?”
One way to do this in code is to use FiftyOne’s built-in filtering and matching capabilities. The dataset.match(my_condition) method will return a view consisting of all samples on which the condition my_condition is true.
In your case, you can use the ViewField to create the agreement condition between the two classifications. Here is what it could look like:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

# create or load your dataset with
# classifications in field1 and field2
dataset = fo.Dataset(...)

view = dataset.match(
    F("field1.label") == F("field2.label")
)

session = fo.launch_app(view)
If you instead wanted a view containing all samples where the two classifications did not line up, you could replace the equality operator == with the inequality operator !=.
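The condition is evaluated once per sample, so the two views split the dataset between them. A minimal sketch of that behavior on plain dictionaries, assuming hypothetical samples in which both labels are always present:

```python
# Hypothetical samples, each with two classification labels
samples = [
    {"id": 1, "field1": "cat", "field2": "cat"},
    {"id": 2, "field1": "cat", "field2": "dog"},
    {"id": 3, "field1": "dog", "field2": "dog"},
]

# Analogue of dataset.match(F("field1.label") == F("field2.label"))
agree = [s["id"] for s in samples if s["field1"] == s["field2"]]

# Analogue of the same match with != in place of ==
disagree = [s["id"] for s in samples if s["field1"] != s["field2"]]

print(agree, disagree)
```

Samples with a missing classification are not covered by this sketch; how they compare in a real dataset is worth checking on your own data.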
Learn more about filtering in the FiftyOne Docs.
Shutting down a session
Community Slack member Scott asked,
“How can I disconnect the launched session?”
In FiftyOne, a Session is an instance of the FiftyOne App connected to a specific Dataset or DatasetView. You can launch a session for a particular dataset or view with the launch_app() method:
import fiftyone as fo
import fiftyone.zoo as foz

## load dataset
dataset = foz.load_zoo_dataset("quickstart")

## launch one session
session1 = fo.launch_app(dataset)

## create a view
view = dataset.take(20)

## launch another session
session2 = fo.launch_app(view)
You can also view all registered sessions with fo.core.session.session._subscribed_sessions:
defaultdict(set, {5151: {
Dataset:          quickstart
Media type:       image
Num samples:      20
Selected samples: 0
Selected labels:  0
Session URL:      http://localhost:5151/
View stages:
    1. Take(size=20, seed=None),
Dataset:          quickstart
Media type:       image
Num samples:      20
Selected samples: 0
Selected labels:  0
Session URL:      http://localhost:5151/
View stages:
    1. Take(size=20, seed=None)}})
When you terminate the Python process in which FiftyOne is running, all sessions are shut down, so typically you do not need to shut sessions down explicitly.
However, if you would like to terminate a session at any point, you can do so using the private _unregister_session() method. Because the method is private, it is not part of the public API and may change between releases:
from fiftyone.core.session.session import _unregister_session

_unregister_session(session1)
Learn more about sessions, including how to launch multiple App instances on a remote machine, in the FiftyOne Docs.
Join the FiftyOne community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 1,350+ FiftyOne Slack members
- 2,550+ stars on GitHub
- 3,200+ Meetup members
- Used by 246+ repositories
- 56+ contributors