Welcome to our weekly FiftyOne tips and tricks blog, where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
Wait, what’s FiftyOne?
FiftyOne is an open-source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high-quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
Ok, let’s dive into this week’s tips and tricks!
Omitting classes with few detection instances
Community Slack member Sylvia Schmitt asked,
“When grouping samples by values in a specific field, I would like to omit samples which have values that rarely occur in the dataset. How can this be done?”
One way to accomplish this is to use count_values() to count the occurrences of each unique value in the given field across the entire Dataset or DatasetView, keep the values that occur more often than your desired cutoff, and then use the match() method to select the samples that contain those values.
For instance, if you want the samples from the test split of the Families in the Wild dataset whose name values occur more than ten times, you can get them as follows:
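The count-then-match steps described above can be sketched roughly as below. This is a minimal sketch, not a definitive recipe: `frequent_values()` and `frequent_value_view()` are hypothetical helper names, and the code assumes the field being counted (here `name`) is a top-level field on each sample.

```python
def frequent_values(counts, min_count):
    """Return the values whose count is strictly greater than min_count."""
    return [value for value, count in counts.items() if count > min_count]


def frequent_value_view(dataset, field, min_count):
    """Build a view of the samples whose `field` value occurs often enough."""
    # Imported lazily so the pure helper above can be used without FiftyOne
    from fiftyone import ViewField as F

    # count_values() maps each distinct value of `field` to its count
    counts = dataset.count_values(field)
    keep = frequent_values(counts, min_count)

    # Keep only the samples whose value is in the retained list
    return dataset.match(F(field).is_in(keep))
```

With FiftyOne installed, a call along the lines of `frequent_value_view(foz.load_zoo_dataset("fiw", split="test"), "name", 10)` would be the natural way to apply it, assuming `fiw` is the zoo name of the Families in the Wild dataset and `foz` is `fiftyone.zoo`.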
You could then pass the resulting view into group_by() to group by the values in the field, or apply any other Aggregation you’d like.
Saving changes to sample fields
Community Slack member Sylvia Schmitt asked,
“When adding sample fields and later on changing these values within a view, do the changes have to be made persistent by calling save() on the `Dataset` object, or will these changes be saved if the dataset is already persistent?”
Great question, Sylvia! In general, when changes are made to an individual sample in a Dataset or DatasetView, the changes need to be saved by calling save() on the sample, not the dataset. This is the case even if the dataset is persistent.
As an example, you could change the class label for the first detection of the first sample in the Quickstart dataset as follows:
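A minimal sketch of that edit is shown here. The helper name `relabel_first_detection()` is hypothetical, and the code assumes the field holds detections exposed through a `detections` list, as the `ground_truth` field of the Quickstart dataset does.

```python
def relabel_first_detection(sample, field, new_label):
    """Change the label of the first detection in `field` and persist it."""
    detections = sample[field].detections
    detections[0].label = new_label

    # Without this call the edit only exists in memory
    sample.save()
```

For the Quickstart dataset this might be called as `relabel_first_detection(dataset.first(), "ground_truth", "new_label")`, where `"new_label"` stands in for whatever class name you want to assign.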
Calling the save() method on a dataset is only necessary when editing dataset-level metadata such as dataset.info.
There are a few cases, however, in which it is not necessary to explicitly run sample.save() to propagate changes back to the dataset. These include the view.set_values(field_name, field_vals) method, which takes in a list of values, field_vals, and writes these to the field field_name for the samples in the view, as well as the view.tag_samples(tags) method, which adds the tags tags to all samples in the view.
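As a rough illustration of these two bulk methods, the sketch below writes one value per sample and then tags the whole view. The wrapper name `write_values_and_tag()` and the length check are assumptions added for the example, not part of FiftyOne.

```python
def write_values_and_tag(view, field_name, field_vals, tags):
    """Write one value per sample into `field_name`, then tag every sample."""
    if len(field_vals) != len(view):
        raise ValueError("field_vals needs exactly one entry per sample")

    # Both calls write to the dataset directly; no sample.save() is involved
    view.set_values(field_name, field_vals)
    view.tag_samples(tags)
```

Here `tags` can be a single tag or a list of tags, and the order of `field_vals` is expected to follow the order of the samples in the view.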
If you know that you need to iterate through a Dataset or DatasetView and make changes to each sample, it is more efficient to pass autosave=True into iter_samples(), which batches the save operations, than to call save() on each sample. For example, to populate a field named random with a random number for each sample in our dataset, we can run:
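One way this loop might look is sketched below. The helper name `set_random_field()` and its `seed` parameter are hypothetical conveniences; the part taken from the text is the `iter_samples(autosave=True)` call.

```python
import random


def set_random_field(dataset, field="random", seed=None):
    """Store a random float in [0, 1) in `field` for every sample."""
    rng = random.Random(seed)

    # autosave=True saves the edits in batches as the loop advances,
    # so there is no explicit sample.save() call
    for sample in dataset.iter_samples(autosave=True):
        sample[field] = rng.random()
```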
Predicting class labels in homogeneous images
Community Slack member George Pearse asked,
“What would be the best way to handle an application where the label for an object is deeply intertwined with the labels of the other objects in the sample? For instance, I might have images that are typically either crowds of all cats, or crowds of all dogs, but not crowds containing both cats and dogs.”
Great question, George! There are many ways to deal with data like this. One approach would be to accumulate a lot of examples like this and train a model on this data. Given enough high quality examples, the model should (theoretically) be able to learn these relationships.
As an alternative approach using just your existing data, you could post-process the labels in your samples based on your model’s predictions. For instance, if your model’s predictions are stored in a model_raw field on your samples, you can create a new label field model_processed and populate it based on the contents of model_raw for that sample.
For each sample, check whether there are, say, three or more objects with the same class label. For the sake of simplicity, we’ll assume that dog is this class. If there are, then for every object in model_raw that is not labeled dog and whose class confidence score is below some threshold, set its class label to dog in model_processed.
Here’s what this might look like:
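The sketch below is one possible implementation of the rule just described, under several assumptions: the predictions are detections with `label` and `confidence` attributes, the confidence threshold is 0.5, and samples that change are tagged `processed`. The function names are hypothetical.

```python
def relabel_minority(labels, confidences, target="dog", min_count=3, max_conf=0.5):
    """Return a copy of `labels` in which low-confidence objects that are not
    `target` become `target`, provided `target` occurs at least `min_count` times."""
    if labels.count(target) < min_count:
        return list(labels)
    return [
        target
        if label != target and conf is not None and conf < max_conf
        else label
        for label, conf in zip(labels, confidences)
    ]


def postprocess_predictions(dataset, raw_field="model_raw",
                            processed_field="model_processed",
                            tag="processed", **rule_kwargs):
    """Fill `processed_field` with the relabeled copy of `raw_field`."""
    # Start from an exact copy of the raw predictions
    dataset.clone_sample_field(raw_field, processed_field)

    for sample in dataset.iter_samples(autosave=True):
        if sample[processed_field] is None:
            continue  # no predictions on this sample
        detections = sample[processed_field].detections
        old = [det.label for det in detections]
        confs = [det.confidence for det in detections]
        new = relabel_minority(old, confs, **rule_kwargs)
        if new != old:
            for det, label in zip(detections, new):
                det.label = label
            # Tag the sample so the changed ones are easy to find later
            sample.tags.append(tag)
```

Keeping the rule in a plain function makes it easy to try different values of `min_count` and `max_conf` on a few hand-written label lists before touching the dataset.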
You can then filter for the tagged samples, i.e. those whose processed predictions differ from the raw predictions, and inspect them in the FiftyOne App.
Matching classification results
Community Slack member Nadav asked,
“I have a dataset with two kinds of classification. What is the best way to create a view, in code or in the app, that only contains samples on which the two classifications agree?”
One way to do this in code is to use FiftyOne’s built-in filtering and matching capabilities. The dataset.match(my_condition) method will return a view consisting of all samples on which the condition my_condition is true.
In your case, you can use the ViewField to create the agreement condition between the two classifications. Here is what it could look like:
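A minimal sketch of such a condition follows. It assumes both fields are classification fields whose class name is stored under `label`; the helper name `agreement_view()` and its `agree` flag are hypothetical.

```python
def agreement_view(dataset, field_a, field_b, agree=True):
    """Return the samples on which two classification fields (dis)agree."""
    # Imported lazily so that defining the helper does not require FiftyOne
    from fiftyone import ViewField as F

    label_a = F(f"{field_a}.label")
    label_b = F(f"{field_b}.label")

    # == builds the agreement condition, != builds its complement
    condition = (label_a == label_b) if agree else (label_a != label_b)
    return dataset.match(condition)
```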
If you instead wanted a view containing all samples where the two classifications did not line up, you could replace the equality operator == with the inequality operator !=.
Learn more about filtering in the FiftyOne Docs.
Shutting down a session
Community Slack member Scott asked,
“How can I disconnect the launched session?”
In FiftyOne, a Session is an instance of the FiftyOne App connected to a specific Dataset or DatasetView. You can launch a session for a particular dataset or view with the launch_app() method:
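For example, a session on a random 20-sample view of the Quickstart dataset could be launched roughly like this. The wrapper name `launch_quickstart_session()` is hypothetical; the dataset and view size are chosen to mirror the session output shown further down.

```python
def launch_quickstart_session(num_samples=20):
    """Launch the App on a random view of the Quickstart dataset."""
    import fiftyone as fo
    import fiftyone.zoo as foz

    dataset = foz.load_zoo_dataset("quickstart")
    view = dataset.take(num_samples)

    # launch_app() returns the Session that connects the App to this view
    return fo.launch_app(view)
```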
You can also view all registered sessions with fo.core.session.session._subscribed_sessions:
defaultdict(set,
{5151: {Dataset: quickstart
Media type: image
Num samples: 20
Selected samples: 0
Selected labels: 0
Session URL: http://localhost:5151/
View stages:
1. Take(size=20, seed=None),
Dataset: quickstart
Media type: image
Num samples: 20
Selected samples: 0
Selected labels: 0
Session URL: http://localhost:5151/
View stages:
1. Take(size=20, seed=None)}})
When you terminate the Python process on which FiftyOne is running, all sessions are shut down, so typically you do not need to shut sessions down explicitly.
However, if you would like to terminate a session at any point, you can do so using the private _unregister_session() method:
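A sketch of that call is below. It assumes `_unregister_session()` lives in the same module as `_subscribed_sessions` and takes the session as its only argument; because it is a private function, its location and signature may differ between FiftyOne releases. The wrapper name `shut_down_session()` is hypothetical.

```python
def shut_down_session(session):
    """Unregister a session returned by launch_app()."""
    import fiftyone as fo

    # Private API: assumed to accept the session as its single argument
    fo.core.session.session._unregister_session(session)
```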
Join the FiftyOne community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!