Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

If you like what you see on GitHub, give the project a star.
Get started! We’ve made it easy to get up and running in a few minutes.
Join the FiftyOne Slack community. We’re always happy to help.

Ok, let’s dive into this week’s tips and tricks!

Omitting classes with few detection instances

Community Slack member Sylvia Schmitt asked,

“When grouping samples by values in a specific field, I would like to omit samples which have values that rarely occur in the dataset. How can this be done?”

One way to accomplish this is to call count_values(), which counts the occurrences of each unique value of a given field across an entire Dataset or DatasetView object. Keep the values that occur more often than your desired cutoff, then use the match() method to select the samples that contain them.

For instance, if you want the samples from the test split of the Families in the Wild dataset whose name values occur more than ten times in the dataset, you can get them as follows:
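A minimal sketch of this pattern might look like the following. The helper names are hypothetical, and the field name is passed in as a parameter because the exact schema of your dataset is an assumption here.

```python
def frequent_values(counts, min_count):
    """Return the values whose count exceeds min_count.

    counts is a dict mapping value -> number of occurrences, which is the
    shape that count_values() returns.
    """
    return [value for value, count in counts.items() if count > min_count]


def match_frequent(samples, field, min_count):
    """Return a view of the samples in a Dataset or DatasetView whose value
    for `field` occurs more than min_count times."""
    # Imported here so frequent_values() stays usable without FiftyOne installed
    from fiftyone import ViewField as F

    counts = samples.count_values(field)
    keep = frequent_values(counts, min_count)
    return samples.match(F(field).is_in(keep))
```

Assuming the test split is loaded as dataset and the values live in a field called name, match_frequent(dataset, "name", 10) would return the view described above.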

You could then pass the resulting view into group_by() to group by the values in the field, or apply any Aggregation you’d like.

Learn more about count_values(), is_in(), and using aggregations in the FiftyOne Docs.

Saving changes to sample fields

Community Slack member Sylvia Schmitt asked,

“When adding sample fields and later on changing these values within a view, do the changes have to be made persistent by calling save() on the `Dataset` object, or will these changes be saved if the dataset is already persistent?”

Great question, Sylvia! In general, when changes are made to an individual sample in a Dataset or DatasetView, the changes need to be saved by calling save() on the sample, not the dataset. This is the case even if the dataset is persistent.

As an example, you could change the class label for the first detection of the first sample in the Quickstart dataset as follows:
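Here is one way this could be sketched. The helper name and the replacement label are hypothetical; the ground_truth field used in the usage comment is the ground truth detections field of the Quickstart dataset.

```python
def relabel_first_detection(sample, field, new_label):
    """Change the label of the first detection in `field`, then persist it."""
    detections = sample[field].detections
    detections[0].label = new_label

    # The edit only reaches the database once save() is called on the sample
    sample.save()


# Usage (requires FiftyOne and downloads the Quickstart dataset):
#
#   import fiftyone.zoo as foz
#
#   dataset = foz.load_zoo_dataset("quickstart")
#   relabel_first_detection(dataset.first(), "ground_truth", "new_class")
```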

Using the save() method on a dataset is only necessary when editing dataset-level metadata such as dataset.info.

There are a few cases, however, in which it is not necessary to explicitly run sample.save() to propagate changes back to the dataset. These include the view.set_values(field_name, field_vals) method, which takes in a list of values, field_vals, and writes these to the field field_name for the samples in the view, as well as the view.tag_samples(tags) method, which adds the tags tags to all samples in the view.

If you know that you need to iterate through a Dataset or DatasetView and make changes to each sample, rather than call save() on each sample, it is more efficient to pass autosave=True into iter_samples(), which batches the operations. For example, to set a random field with a random number for each sample in our dataset, we can run:
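A short sketch of that loop, assuming the new field is called random; the helper name and the optional seed parameter are added here for illustration:

```python
import random


def add_random_field(samples, field="random", seed=None):
    """Write a random float in [0, 1) to `field` on every sample."""
    rng = random.Random(seed)

    # autosave=True batches the per-sample saves, so no sample.save() is needed
    for sample in samples.iter_samples(autosave=True):
        sample[field] = rng.random()
```

Calling add_random_field(dataset) on a Dataset or DatasetView populates the field without any explicit save() calls.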

Learn more about set_values() and tagging samples in the FiftyOne Docs.

Predicting class labels in homogeneous images

Community Slack member George Pearse asked,

“What would be the best way to handle an application where the label for an object is deeply intertwined with the labels of the other objects in the sample? For instance, I might have images that are typically either crowds of all cats, or crowds of all dogs, but not crowds containing both cats and dogs.”

Great question, George! There are many ways to deal with data like this. One approach would be to accumulate a lot of examples like this and train a model on this data. Given enough high quality examples, the model should (theoretically) be able to learn these relationships.

As an alternative approach that uses just your existing data, you could post-process the labels in your samples based on your model’s predictions. For instance, if your model’s predictions are stored in a model_raw field on your samples, you can create a new label field model_processed and populate it based on the contents of model_raw for that sample.

For each sample, check if there are, say, three or more objects with the same class label. For the sake of simplicity, we’ll assume that dog is this class. If there are, then for all objects that are not labeled as dogs in model_raw, if their class confidence score is below some threshold, set their class label to dog in model_processed.

Here’s what this might look like:
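The sketch below is one possible reading of that rule, not a definitive implementation. It assumes model_raw holds Detections labels with confidence scores; the 0.5 threshold, the relabeled tag, and both function names are hypothetical choices.

```python
def relabel_minority(detections, majority="dog", min_count=3, max_conf=0.5):
    """Apply the rule to one sample.

    detections is a list of (label, confidence) pairs. If at least min_count
    of them carry the majority label, every other detection whose confidence
    is below max_conf is relabeled to the majority class.
    """
    labels = [label for label, _ in detections]
    if labels.count(majority) < min_count:
        return labels

    return [
        majority
        if label != majority and conf is not None and conf < max_conf
        else label
        for label, conf in detections
    ]


def process_predictions(samples, raw_field="model_raw",
                        out_field="model_processed", tag="relabeled", **rule):
    """Populate out_field from raw_field and tag the samples that changed."""
    for sample in samples.iter_samples(autosave=True):
        raw = sample[raw_field]
        if raw is None:
            continue

        processed = raw.copy()  # work on a copy so model_raw stays untouched
        pairs = [(det.label, det.confidence) for det in processed.detections]
        new_labels = relabel_minority(pairs, **rule)

        for det, new_label in zip(processed.detections, new_labels):
            det.label = new_label
        sample[out_field] = processed

        # Tag samples whose processed labels differ from the raw ones
        changed = new_labels != [label for label, _ in pairs]
        if changed and tag not in sample.tags:
            sample.tags.append(tag)
```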

You can then filter for the tagged samples, whose processed model predictions differ from the raw predictions, and inspect them in the FiftyOne App.

Learn more about saving, keeping, and cloning sample fields in the FiftyOne Docs.

Matching classification results

Community Slack member Nadav asked,

“I have a dataset with two kinds of classification. What is the best way to create a view, in code or in the app, that only contains samples on which the two classifications agree?”

One way to do this in code is to use FiftyOne’s built-in filtering and matching capabilities. The dataset.match(my_condition) method will return a view consisting of all samples on which the condition my_condition is true.

In your case, you can use the ViewField to create the agreement condition between the two classifications. Here is what it could look like:
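As a sketch, assuming both fields hold Classification labels, the condition compares their label attributes. The helper names are hypothetical, and the field names are passed in because they depend on your dataset.

```python
def label_path(field):
    """Path to the class label stored inside a Classification field."""
    return field + ".label"


def agreement_view(samples, field_a, field_b):
    """Return a view of the samples on which the two classifications agree."""
    # Imported here so label_path() stays usable without FiftyOne installed
    from fiftyone import ViewField as F

    return samples.match(F(label_path(field_a)) == F(label_path(field_b)))
```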

If you instead wanted a view containing all samples where the two classifications did not line up, you could replace the equality operator == with the inequality operator !=.

Learn more about filtering in the FiftyOne Docs.

Shutting down a session

Community Slack member Scott asked,

“How can I disconnect the launched session?”

In FiftyOne, a Session is an instance of the FiftyOne App connected to a specific Dataset or DatasetView. You can launch a session for a particular dataset or view with the launch_app() method:
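For illustration, the sketch below launches a session on a 20-sample view of the Quickstart dataset. The wrapper function and its name are hypothetical; launch_app(), load_zoo_dataset(), and take() are the FiftyOne calls doing the work.

```python
def launch_quickstart_session(num_samples=20):
    """Launch the App on a random view of the Quickstart dataset."""
    # Imported inside the function so that defining it has no side effects
    import fiftyone as fo
    import fiftyone.zoo as foz

    dataset = foz.load_zoo_dataset("quickstart")
    view = dataset.take(num_samples)

    # launch_app() returns the Session object connected to this view
    return fo.launch_app(view)
```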

You can also view all registered sessions with fo.core.session.session._subscribed_sessions:

defaultdict(set,
{5151: {Dataset: quickstart
Media type: image
Num samples: 20
Selected samples: 0
Selected labels: 0
Session URL: http://localhost:5151/
View stages:
1. Take(size=20, seed=None),
Dataset: quickstart
Media type: image
Num samples: 20
Selected samples: 0
Selected labels: 0
Session URL: http://localhost:5151/
View stages:
1. Take(size=20, seed=None)}})

When you terminate the Python process in which FiftyOne is running, all sessions are shut down, so you typically do not need to shut sessions down explicitly.

However, if you would like to terminate a session at any point, you can do so using the private _unregister_session() method:
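Private helpers can change between releases, so as a hedged alternative, the public close() method on a session object also closes the session and terminates the App if necessary. The helper name below is hypothetical.

```python
def close_sessions(sessions):
    """Close each session in an iterable and return how many were closed."""
    closed = 0
    for session in sessions:
        session.close()  # public Session method
        closed += 1
    return closed
```

With a session returned by launch_app(), close_sessions([session]) would shut it down.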

Learn more about sessions, including how to launch multiple App instances on a remote machine, in the FiftyOne Docs.

Join the FiftyOne community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!

1,350+ FiftyOne Slack members
2,550+ stars on GitHub
3,200+ Meetup members
Used by 246+ repositories
56+ contributors