Welcome to our weekly FiftyOne tips and tricks blog where we give practical pointers for using FiftyOne on topics inspired by discussions in the open source community. In this Thanksgiving week installment, in the spirit of togetherness, we’ll cover aggregations.
Wait, What’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community; we’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
An Aggregations Primer
Datasets are the core data structure in FiftyOne, allowing you to represent your raw data, labels, and associated metadata. When you query and manipulate a Dataset object using dataset views, a DatasetView object is returned, which represents a filtered view into a subset of the underlying dataset’s contents.
Complementary to this data model, one is often interested in computing aggregate statistics about datasets, such as label counts, distributions, and ranges, where each Sample is reduced to a single quantity in the aggregate results.
The fiftyone.core.aggregations module offers a declarative and highly efficient approach to computing summary statistics about your datasets and views. Continue reading for some tips and tricks to help you do just that.
One Aggregation; Multiple Datasets
If you want to compute a single aggregation on a single dataset, you can call the aggregation method directly on the dataset. For instance, to compute the Bounds aggregation on the uniqueness field, you could write:
dataset.bounds("uniqueness")
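Conceptually, Bounds reduces a field to a (min, max) pair. The plain-Python sketch below mimics that on a hypothetical list of per-sample uniqueness scores. It is not FiftyOne's implementation. Skipping None (a sample missing the field) is an assumption of the sketch.

```python
from typing import Optional, Sequence, Tuple


def bounds(values: Sequence[Optional[float]]) -> Tuple[Optional[float], Optional[float]]:
    """Return (min, max) of the non-None values, or (None, None) if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return (None, None)
    return (min(present), max(present))


# Hypothetical per-sample uniqueness scores; one sample lacks the field
uniqueness = [0.42, 0.91, None, 0.17, 0.63]
print(bounds(uniqueness))  # (0.17, 0.91)
```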
However, if you want to use the same aggregation method (on the same field) on multiple datasets or views, you can define the aggregation on its own and then compute that aggregation using the aggregate() method:
import fiftyone as fo
# Define the aggregation once, then reuse it on each dataset or view
mean_agg = fo.Mean("predictions.detections.confidence")
mean_value1 = dataset1.aggregate(mean_agg)
mean_value2 = dataset2.aggregate(mean_agg)
Learn more about the aggregate() method in the FiftyOne Docs.
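To show why defining the aggregation once pays off, here is a rough plain-Python analogue of the pattern. MeanOf and the toy datasets are hypothetical stand-ins invented for this sketch, and fo.Mean is not implemented this way. Flattening the per-sample lists before averaging is also an assumption of the sketch.

```python
from statistics import fmean
from typing import Any, Dict, List, Optional


class MeanOf:
    """Hypothetical stand-in for a reusable aggregation such as fo.Mean(field).

    Computes the mean over a list-valued field, flattening across samples.
    """

    def __init__(self, field: str) -> None:
        self.field = field

    def compute(self, samples: List[Dict[str, Any]]) -> Optional[float]:
        # Flatten each sample's list of values; samples lacking the field contribute nothing
        flat = [v for s in samples for v in (s.get(self.field) or [])]
        return fmean(flat) if flat else None


# Two toy "datasets": each sample maps a field name to its detection confidences
dataset1 = [{"confidence": [0.9, 0.8]}, {"confidence": [0.7]}, {}]
dataset2 = [{"confidence": [0.5]}, {"confidence": [0.25, 0.75, 1.0]}]

mean_agg = MeanOf("confidence")           # define the aggregation once...
mean_value1 = mean_agg.compute(dataset1)  # ...then reuse it on each collection
mean_value2 = mean_agg.compute(dataset2)
print(mean_value1, mean_value2)
```

The aggregation object only stores *what* to compute. Each dataset supplies the data, which is what makes reuse cheap.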
Multiple Aggregations
Conversely, if you want to compute multiple aggregations on the same dataset, not necessarily on the same field, you can do so efficiently by batching them in the aggregate() method:
import fiftyone as fo
# will count the number of samples in a dataset
sample_count = fo.Count()
# will retrieve the distinct labels in the `ground_truth` field
distinct_labels = fo.Distinct("ground_truth.detections.label")
# will compute a histogram of the `uniqueness` field
histogram_values = fo.HistogramValues("uniqueness")
# efficiently compute all three results
aggs = [sample_count, distinct_labels, histogram_values]
count, labels, hist = dataset.aggregate(aggs)
Learn more about batching aggregations in the FiftyOne Docs.
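A rough intuition for why batching helps: several statistics can be gathered together rather than each making its own trip over the data. The plain-Python sketch below imitates that on hypothetical toy samples, computing a count, the distinct labels, and a uniqueness histogram in a single loop. It is only an analogy and makes no claim about how FiftyOne executes batched aggregations internally. The four equal-width bins over [0, 1] are also an assumption of the sketch.

```python
from typing import List, Set, Tuple


def batch_stats(samples: List[dict], bins: int = 4) -> Tuple[int, Set[str], List[int]]:
    """Compute (sample count, distinct labels, uniqueness histogram) in one pass.

    Assumes uniqueness values lie in [0, 1] and uses equal-width bins.
    """
    count = 0
    labels: Set[str] = set()
    hist = [0] * bins
    for sample in samples:
        count += 1
        labels.update(sample.get("labels", []))
        u = sample.get("uniqueness")
        if u is not None:
            # Map u to a bin index; clamp u == 1.0 into the last bin
            idx = min(int(u * bins), bins - 1)
            hist[idx] += 1
    return count, labels, hist


# Hypothetical samples with detection labels and a uniqueness score
samples = [
    {"labels": ["cat", "dog"], "uniqueness": 0.1},
    {"labels": ["dog"], "uniqueness": 0.55},
    {"labels": ["bird"], "uniqueness": 1.0},
    {"labels": [], "uniqueness": 0.2},
]
count, labels, hist = batch_stats(samples)
print(count, sorted(labels), hist)  # 4 ['bird', 'cat', 'dog'] [2, 0, 1, 1]
```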
Unwinding Lists of Lists
If we want to compute aggregations that are not built into FiftyOne, such as the median, we can do so by first extracting the values for the relevant field and then applying our aggregation to this result. Due to the unstructured nature of computer vision data, where different samples may contain different numbers of detected objects, this result may be a list of lists of differing sizes.
For instance, if we get the prediction confidence values,
pred_conf_field = "predictions.detections.confidence"
pred_confs_jagged = dataset.values(pred_conf_field)
we can see that the first ten sublists all have different lengths by running:
print([len(p) for p in pred_confs_jagged[:10]])
If we wanted to compute the median from this jagged list of values, we would first need to flatten the list. However, FiftyOne does this for us when we pass the argument unwind=True into the values() method:
pred_conf_field = "predictions.detections.confidence"
pred_confs = dataset.values(pred_conf_field, unwind=True)
From there, we can pass the resulting flat list straight to NumPy:
import numpy as np
median_conf = np.median(pred_confs)
Learn more about unwinding and the values() aggregation in the FiftyOne Docs.
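Conceptually, unwinding flattens the per-sample lists into one flat list. The sketch below reproduces that step in plain Python on a hypothetical jagged list of confidences. It then takes the median with the standard library's statistics.median, which, like np.median, averages the two middle values when the count is even. Skipping None entries (samples with no predictions) is a precaution of the sketch, not a statement about FiftyOne's behavior.

```python
from itertools import chain
from statistics import median
from typing import List, Optional


def unwind(jagged: List[Optional[List[float]]]) -> List[float]:
    """Flatten a list of per-sample lists into one list, skipping None entries."""
    return list(chain.from_iterable(sub for sub in jagged if sub is not None))


# Hypothetical per-sample confidences: each sample has a different number of detections
pred_confs_jagged = [[0.9, 0.4], [0.7], [], None, [0.2, 0.6, 0.8]]
print([len(p) for p in pred_confs_jagged if p is not None])  # [2, 1, 0, 3]

pred_confs = unwind(pred_confs_jagged)
print(pred_confs)  # [0.9, 0.4, 0.7, 0.2, 0.6, 0.8]
print(median(pred_confs))
```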
Aggregations on Transformed Field Values
When we use FiftyOne’s aggregation classes in conjunction with ViewField, we can easily perform aggregations over transformed field values. For instance, to compute the mean of the squared prediction confidence values, we can write either of the following:
import fiftyone as fo
from fiftyone import ViewField as F
## Option 1
aggregation = fo.Mean(F("predictions.detections.confidence") ** 2)
squared_conf_mean = dataset.aggregate(aggregation)
## Option 2
squared_conf_mean = dataset.mean(F("predictions.detections.confidence") ** 2)
Learn more about expressions and ViewField in the FiftyOne Docs.
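For intuition about what that expression computes, here is the same quantity in plain Python on hypothetical per-sample confidences. Each detection's confidence is squared, then the squares are averaged over all detections in all samples. Averaging over detections rather than per sample is our assumption, by analogy with the plain field path. The square of the mean is printed for contrast, since the two generally differ.

```python
from statistics import fmean
from typing import List, Optional


def mean_of_squares(jagged: List[Optional[List[float]]]) -> float:
    """Mean of x**2 over every value in a jagged list (None entries skipped).

    Raises statistics.StatisticsError if there are no values at all.
    """
    flat = [x for sub in jagged if sub is not None for x in sub]
    return fmean(x ** 2 for x in flat)


# Hypothetical per-sample confidences; one sample has no predictions
pred_confs_jagged = [[0.5, 1.0], [0.5], None, [0.0]]
squared_conf_mean = mean_of_squares(pred_confs_jagged)
plain_mean = fmean(x for sub in pred_confs_jagged if sub is not None for x in sub)
print(squared_conf_mean, plain_mean ** 2)  # 0.375 0.25
```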
Going Beyond Aggregations
While aggregations over an entire Dataset or DatasetView are powerful, sometimes they are not sufficient to fully understand your model or your data. In fact, it is precisely this need for better transparency during data-model co-design that led to the creation of the FiftyOne App and FiftyOne Teams. Beyond dataset- and view-level aggregations, FiftyOne provides class-specific reports for classification and multi-class object detection tasks via the print_report() method.
For multi-class detection tasks, this looks like:
results = dataset.evaluate_detections(
"predictions",
gt_field="ground_truth",
eval_key="eval",
compute_mAP=True,
)
# Get the 10 most common classes in the dataset
counts = dataset.count_values("ground_truth.detections.label")
classes_top10 = sorted(counts, key=counts.get, reverse=True)[:10]
# Print a classification report for the top-10 classes
results.print_report(classes=classes_top10)
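count_values() returns a dict mapping each label to its count, so the top-10 selection above is ordinary Python. The sketch below shows that idiom next to an equivalent using collections.Counter.most_common(). The counts are made up for illustration and deliberately contain no ties at the cutoff.

```python
from collections import Counter

# Hypothetical label counts, shaped like the dict that count_values() returns
counts = {"person": 120, "car": 75, "dog": 30, "cat": 25, "bicycle": 12}

k = 3
# Idiom from the snippet above: sort the keys by their counts, descending
classes_topk = sorted(counts, key=counts.get, reverse=True)[:k]
# Equivalent with Counter.most_common(), which yields (label, count) pairs
classes_topk_alt = [label for label, _ in Counter(counts).most_common(k)]
print(classes_topk, classes_topk_alt)  # ['person', 'car', 'dog'] ['person', 'car', 'dog']
```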
For a binary classification task, this looks like:
results = dataset.evaluate_classifications(
"predictions",
gt_field="ground_truth",
eval_key="eval",
method="binary",
classes=["classA", "classB"],
)
results.print_report()
Learn more about print_report() and evaluating detection and classification tasks in the FiftyOne Docs.
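The reports printed by print_report() include per-class metrics such as precision, recall, and F1-score. To make those numbers concrete, the sketch below computes them by hand for a hypothetical binary task with the classes classA and classB. It is a plain-Python illustration of the standard definitions, not FiftyOne's code, and the labels are invented.

```python
from typing import List, NamedTuple


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float
    support: int  # number of ground-truth instances of the class


def per_class_metrics(gt: List[str], pred: List[str], cls: str) -> Metrics:
    """Precision, recall, F1, and support for one class, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gt, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gt, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gt, pred) if g == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(precision, recall, f1, tp + fn)


# Hypothetical ground truth and predictions for a binary task
gt = ["classA", "classB", "classB", "classA", "classB", "classB"]
pred = ["classA", "classB", "classA", "classA", "classB", "classA"]

for cls in ["classA", "classB"]:
    print(cls, per_class_metrics(gt, pred, cls))
```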
What’s next?
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community; we’re always happy to help.