Welcome to our weekly FiftyOne tips and tricks blog where we give practical pointers for using FiftyOne on topics inspired by discussions in the open source community. This week we’ll cover filtering.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
A filtering primer
Datasets are the core data structure in FiftyOne, allowing you to represent your raw data, labels, and associated metadata. Querying a Dataset will return a DatasetView that contains only the samples (and possibly filtered contents) that match your query’s criteria.
FiftyOne provides powerful ViewField and ViewExpression classes that allow you to use native Python operators to define the match/filter expressions that retrieve only the content of interest from your dataset. By filtering your data using ViewField and ViewExpression, you can create custom views of your dataset that employ comparison, logic, arithmetic or array operations.
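To build some intuition for how native Python operators can define a filter that is evaluated later, here is a toy sketch in plain Python. It is not FiftyOne's implementation: the `Field` and `Expr` classes and the sample records are hypothetical names made up for illustration. It also shows why expression-style APIs combine conditions with `&` and `|`: those operators can be overloaded, while Python's `and` and `or` cannot.

```python
# Toy model only: `Field` and `Expr` are hypothetical, not FiftyOne classes.
class Expr:
    """A deferred test that is evaluated later against a record (a dict)."""

    def __init__(self, fn, text):
        self.fn = fn      # callable: record -> bool
        self.text = text  # human-readable form of the expression

    def __call__(self, record):
        return self.fn(record)

    # `&` and `|` can be overloaded; `and` / `or` cannot
    def __and__(self, other):
        return Expr(lambda r: self(r) and other(r), f"({self.text} & {other.text})")

    def __or__(self, other):
        return Expr(lambda r: self(r) or other(r), f"({self.text} | {other.text})")


class Field:
    """Refers to a key of a record by name."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Expr(lambda r: r[self.name] > value, f"{self.name} > {value}")

    def __lt__(self, value):
        return Expr(lambda r: r[self.name] < value, f"{self.name} < {value}")


# Building the expression does not evaluate anything yet
confident_and_small = (Field("confidence") > 0.9) & (Field("area") < 100)

# Made-up records standing in for labels
records = [
    {"confidence": 0.95, "area": 50},
    {"confidence": 0.95, "area": 500},
    {"confidence": 0.40, "area": 50},
]

# The expression is only evaluated here, once per record
matches = [r for r in records if confident_and_small(r)]
```

Note that each comparison is wrapped in parentheses before being combined, since `&` binds more tightly than `>` and `<` in Python. The same habit applies when writing ViewField expressions.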
Continue reading for some tips and tricks to help you master filtering in FiftyOne!
Filtering on tags, labels, frames, and keypoints
In many cases, FiftyOne’s generic match() method is what we want. Likewise, we do sometimes want to use the generic filter_field(). However, there are many cases where we might want to filter directly on special fields like tags, labels, or keypoints.
FiftyOne provides built-in support for these operations with the following special filter and match operations: filter_labels(), filter_keypoints(), match_labels(), match_tags(), and match_frames().
These methods offer even easier ways to select the views we are interested in. Here are two equivalent ways of performing the same operation — creating a view of all samples which have the “test” tag:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
dataset = foz.load_zoo_dataset("quickstart")
# Method 1: filter_field - verbose
test_view = dataset.filter_field("tags", F().contains("test"))
# Method 2: match_tags - concise
test_view = dataset.match_tags("test")
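The difference between the "match" and "filter" families is easy to miss. As we understand the documented behavior, match-style methods select whole samples and leave their contents untouched, while filter-style methods trim the contents of each sample and, by default, drop samples with nothing left. The following plain-Python sketch models that distinction. The sample dicts and the two helper functions are hypothetical and do not call FiftyOne.

```python
# Plain-Python model of the semantics; the sample dicts are made up.
samples = [
    {"id": "a", "tags": ["test"], "labels": ["cat", "dog"]},
    {"id": "b", "tags": ["train"], "labels": ["dog"]},
    {"id": "c", "tags": ["test"], "labels": ["bird"]},
]


def match_samples(samples, label):
    """Match-style: keep whole samples that contain the label, unmodified."""
    return [s for s in samples if label in s["labels"]]


def filter_sample_labels(samples, label):
    """Filter-style: keep only the matching labels, then drop empty samples."""
    result = []
    for s in samples:
        kept = [lab for lab in s["labels"] if lab == label]
        if kept:
            # Copy the sample so the original data is not modified
            result.append({**s, "labels": kept})
    return result


matched = match_samples(samples, "dog")
filtered = filter_sample_labels(samples, "dog")

print([s["labels"] for s in matched])   # [['cat', 'dog'], ['dog']]
print([s["labels"] for s in filtered])  # [['dog'], ['dog']]
```

Both results contain samples "a" and "b", but only the filter-style result has removed the "cat" label from sample "a".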
Learn more about filter_labels(), match_labels(), and match_tags() in the FiftyOne Docs.
Defining complicated filters
Sometimes the filtering operation we want to perform can be stated quite simply; other times it is more involved. In the latter case, even if all of the filtering is performed on a single field, it can be helpful to define the filter before applying it, using FiftyOne’s ViewField.
Suppose, for instance, that we want to create a view containing only those images in which we’ve detected more than two and fewer than six objects, unless the sample contains 10 detections, in which case we also want it to be part of our view. If we wanted to do this “inline”, this would be quite cumbersome:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
dataset = foz.load_zoo_dataset("quickstart")
view = dataset.match(
    (
        (F("predictions.detections").length() < 6)
        & (F("predictions.detections").length() > 2)
    )
    | (F("predictions.detections").length() == 10)
)
Instead, we could save ourselves the potential headaches arising from mismatched parentheses by defining the filter in pieces and then employing it.
gt_cond = F("predictions.detections").length() > 2
lt_cond = F("predictions.detections").length() < 6
except_cond = F("predictions.detections").length() == 10
view = dataset.match((gt_cond & lt_cond) | except_cond)
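When a compound condition like this gets hard to reason about, it can help to sanity-check the logic on plain integers before applying it to a dataset. The sketch below mirrors the three conditions with ordinary Python booleans. The `keep` helper is a hypothetical name, and the code does not use FiftyOne.

```python
def keep(num_detections):
    """Plain-Python mirror of (gt_cond & lt_cond) | except_cond."""
    gt_cond = num_detections > 2
    lt_cond = num_detections < 6
    except_cond = num_detections == 10
    return (gt_cond and lt_cond) or except_cond


# Which detection counts from 0 to 11 would pass the filter?
kept_counts = [n for n in range(12) if keep(n)]
print(kept_counts)  # [3, 4, 5, 10]
```

Only samples with 3, 4, 5, or exactly 10 detections should end up in the view, which matches the description above.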
Learn more about the FiftyOne ViewField in the FiftyOne Docs.
Composing filters
In FiftyOne, you can also create composite filters which include filters on multiple different fields. This can be done by creating one view by applying one of the filters, and then creating a new view by applying a second filter to this first view.
Alternatively, you can accomplish the same composite filtering operation inline without explicitly defining an intermediate view. For example, the two following approaches are equivalent:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
dataset = foz.load_zoo_dataset("quickstart")
# Approach 1: intermediate view
simple_view = dataset.filter_labels(
    "predictions", F("confidence") > 0.9
)
complex_view = simple_view.match(
    F("predictions.detections").length() > 2
)
# Approach 2: composing filters
complex_view = dataset.filter_labels(
    "predictions", F("confidence") > 0.9
).match(F("predictions.detections").length() > 2)
These two approaches are equivalent because in FiftyOne, a DatasetView is defined symbolically until operations on the view are performed. This is in contrast to similar filtering operations in pandas, where chained assignment can cause issues.
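To illustrate what "defined symbolically" means, here is a toy model in plain Python. It is not how FiftyOne is implemented: `LazyView`, `add_stage`, and the two stage functions are hypothetical names, and each "sample" is just a list of confidence scores. The idea is that a view stores an ordered list of stages and only runs them when you iterate over it, so building the pipeline in one step or in two gives the same result.

```python
# Toy model only: `LazyView` is hypothetical, not FiftyOne's DatasetView.
class LazyView:
    """Records stages in order and applies them only when iterated."""

    def __init__(self, source, stages=()):
        self._source = source
        self._stages = tuple(stages)

    def add_stage(self, stage):
        # Returns a new view with one more stage; nothing is evaluated here
        return LazyView(self._source, self._stages + (stage,))

    def __iter__(self):
        items = iter(self._source)
        for stage in self._stages:
            items = stage(items)
        return iter(items)


def keep_confident(samples):
    """Filter-style stage: keep only scores above 0.9 within each sample."""
    for scores in samples:
        yield [s for s in scores if s > 0.9]


def more_than_two(samples):
    """Match-style stage: keep samples with more than two scores left."""
    for scores in samples:
        if len(scores) > 2:
            yield scores


# Made-up data: each inner list holds one sample's confidence scores
source = [
    [0.95, 0.97, 0.99, 0.2],
    [0.95, 0.5, 0.6, 0.7],
    [0.91, 0.92],
]

# Approach 1: intermediate view
simple_view = LazyView(source).add_stage(keep_confident)
complex_view_1 = simple_view.add_stage(more_than_two)

# Approach 2: chained in one expression
complex_view_2 = LazyView(source).add_stage(keep_confident).add_stage(more_than_two)

print(list(complex_view_1))  # [[0.95, 0.97, 0.99]]
print(list(complex_view_2))  # [[0.95, 0.97, 0.99]]
```

In this sketch the intermediate `simple_view` is left unchanged when a stage is added to it, because `add_stage` returns a new object rather than modifying the existing one.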
Learn more about views in the FiftyOne Docs.
Accessing parent-level data
The symbolic nature of filtering queries in FiftyOne also makes it possible for queries on embedded fields to take information from their parent fields into account. The $ symbol prepended to a field name signifies that it refers to parent-level data.
This functionality can be used to leverage sample-level metadata when performing filtering operations on individual detections. For example, FiftyOne stores bounding box coordinates as relative values in [0, 1]. Thus, if we want to select all predicted detections with medium-sized bounding boxes in terms of absolute number of pixels, we need to use the sample width and height, which are stored in the sample’s metadata.
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
dataset = foz.load_zoo_dataset("quickstart")
dataset.compute_metadata()
# Computes the area of each bounding box in pixels
bbox_area = (
    F("$metadata.width")
    * F("bounding_box")[2]
    * F("$metadata.height")
    * F("bounding_box")[3]
)
# Only contains boxes whose area is between 32^2 and 96^2 pixels
medium_boxes_view = dataset.filter_labels(
    "predictions", (32**2 < bbox_area) & (bbox_area < 96**2)
)
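To make the arithmetic concrete, here is the same calculation worked through in plain Python for a single box. It assumes the relative [top-left-x, top-left-y, width, height] bounding box layout, which is why the expression above uses indices 2 and 3. The helper names and the 640x480 image are hypothetical values chosen for illustration.

```python
def bbox_area_pixels(bounding_box, image_width, image_height):
    """Pixel area of a relative [x, y, width, height] bounding box."""
    _, _, rel_w, rel_h = bounding_box
    return image_width * rel_w * image_height * rel_h


def is_medium(area, low=32**2, high=96**2):
    """True if the area lies strictly between 32^2 and 96^2 pixels."""
    return low < area < high


# Hypothetical 640x480 image
width, height = 640, 480

# 12.5% of the width (80 px) by 12.5% of the height (60 px)
medium_area = bbox_area_pixels([0.25, 0.25, 0.125, 0.125], width, height)
print(medium_area, is_medium(medium_area))  # 4800.0 True

# Half of each dimension (320 px by 240 px) is too large
large_area = bbox_area_pixels([0.0, 0.0, 0.5, 0.5], width, height)
print(large_area, is_medium(large_area))  # 76800.0 False
```

The same relative box covers a different number of pixels in a differently sized image, which is why the filter needs the sample-level width and height.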
Learn more about embedded fields and the EmbeddedDocumentField in the FiftyOne Docs.
Filtering in the FiftyOne App
While everything we’ve discussed so far pertains to the Python SDK, it is also possible to filter in the FiftyOne App! When you load a session, you can use the sidebar on the left-hand side to filter by labels, sample ID, evaluation results, or other fields in your dataset.
For categorical data like labels, you can add any number of selections to the view, or select the toggle “exclude” to view all of the unselected options. For numerical fields, such as image “uniqueness” or prediction “confidence”, you can drag the left and right ends of the corresponding range bar to set the range of allowed values.
You can also filter using the view bar at the top of the FiftyOne App. If you click in the “+ add stage” box on the left side of the view bar, you can select one of the filtering methods and enter the desired information.
Using both the side bar and the view bar, you can compose multiple filtering operations to create complex views that will be displayed in the view grid of the FiftyOne App.
Learn more about the FiftyOne App in the FiftyOne Docs.
Join the FiftyOne community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 1,200+ FiftyOne Slack members
- 2,300+ stars on GitHub
- 2,000+ Meetup members
- Used by 208+ repositories
- 51+ contributors