
Webinar Recap: Pandas-Style Queries for Computer Vision Data

FiftyOne and pandas are both open source Python libraries that make dealing with your data easy. While they serve different purposes — pandas is built for tabular data, and FiftyOne is built for computer vision data — their syntax and functionality are closely aligned. Members of the FiftyOne community asked for a comparison of the two tools, and so we made a collection of materials available.

Jacob Marks, Machine Learning Engineer and Developer Evangelist at Voxel51, recently presented a live webinar on how to perform pandas-style queries for computer vision data with FiftyOne. You can watch the playback on YouTube, take a look at the slides, see the transcript, and read the recap below for the summary. We also include a section with links to other resources comparing pandas and FiftyOne. Enjoy!

Presentation Highlights

Overview & intro to the unstructured nature of computer vision data

Whether you’re dealing with images, videos, satellite imagery, LiDAR, or other 3D data, the metadata that accompanies computer vision data is unstructured; detections, tags, keypoints, segmentation masks, etc. are all more flexible than what typically fits in a tabular format. For example, consider detections: in a dataset of images, not every image has the exact same number of detections. Some images may have zero or one detection, while others have 30 or 40. This means there is no fixed set of columns that can structure them. We need a way to work with these possibilities. Enter FiftyOne, the open source toolset that brings pandas-style queries to computer vision data.
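To make this concrete, here is a minimal sketch (the file path and field name are hypothetical) of how FiftyOne represents a variable number of detections on a single sample:

import fiftyone as fo

# One image might have two detections, another forty, another none;
# a Detections field holds however many there are
sample = fo.Sample(filepath="/path/to/image.jpg")
sample["predictions"] = fo.Detections(
    detections=[
        fo.Detection(label="panda", bounding_box=[0.1, 0.1, 0.5, 0.5]),
        fo.Detection(label="bear", bounding_box=[0.2, 0.3, 0.3, 0.4]),
    ]
)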

What is FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

[Gif: Get started with open source FiftyOne]

A note on setup

In the webinar, Jacob starts from the point where he already has a dataset (in fact, it’s an image dataset of pandas, you know, “big, fuzzy bamboo-eating bears”), and has already completed some setup so he can get straight into the demos. Therefore, the prerequisites to performing pandas-like functionality in FiftyOne as Jacob demonstrates are to (a code sketch of these steps follows the list):

  • Download FiftyOne
  • Download the dataset
  • Import FiftyOne
  • Load & format the data into FiftyOne
  • Generate predictions (classification & detection)
    — In Jacob’s example, he’s using ResNet-50 for classification predictions and Faster R-CNN for detection predictions
  • Compute metadata, uniqueness, and mistakenness
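A minimal sketch of what this setup might look like; the dataset path, directory format, and zoo model names here are assumptions, not the webinar’s exact code:

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

# Load a labeled image dataset (hypothetical path and format)
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/pandas",
    dataset_type=fo.types.ImageClassificationDirectoryTree,
    name="pandas",
)

# Classification predictions (ResNet-50) and detection predictions (Faster R-CNN)
classifier = foz.load_zoo_model("resnet50-imagenet-torch")
dataset.apply_model(classifier, label_field="resnet50")

detector = foz.load_zoo_model("faster-rcnn-resnet50-fpn-coco-torch")
dataset.apply_model(detector, label_field="faster_rcnn")

# Compute metadata, uniqueness, and mistakenness
dataset.compute_metadata()
fob.compute_uniqueness(dataset)
fob.compute_mistakenness(dataset, "resnet50", label_field="ground_truth")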

Live demo time!

From here through the Q&A, Jacob demonstrates operations we would perform on tabular data in pandas, along with their analogues for working with unstructured computer vision data in FiftyOne. He does this in three broad categories: the basics, aggregate statistics, and filtering and matching. Bonus: everything Jacob demos can also be found in this Colab notebook so you can follow along.

1 — The basics: understanding your computer vision dataset and working with samples

The pandas DataFrame and FiftyOne Dataset classes share many similar functionalities. Jacob covers a number of comparisons of common operations in the two libraries, starting with printing information about the dataset: df.info() in pandas, and print(dataset) in FiftyOne. He then covers these basic operations and their accompanying outputs:
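The webinar’s side-by-side outputs aren’t reproduced here, but a sketch of a few of these equivalents, assuming the dataset loaded above, looks like:

# pandas: df.info()     ->  FiftyOne: print(dataset)
print(dataset)

# pandas: len(df)       ->  FiftyOne: len(dataset)
print(len(dataset))

# pandas: df.head()     ->  FiftyOne: dataset.head()
for sample in dataset.head():
    print(sample.filepath)

# pandas: df.columns    ->  FiftyOne: dataset.get_field_schema()
print(list(dataset.get_field_schema().keys()))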

Jacob also shows how it’s possible to (see the sketch after this list):

  • Identify what all of the possible values are for classifications in this dataset
    — ds.distinct(*) in FiftyOne is the equivalent of df[*].unique() in pandas
  • Identify all of the distinct detection values, too
  • Find scenarios where classification values include “giant panda” but detection values do not
    — Wait, what’s going on here? Find out in just a few paragraphs, and in the final example in section 3.
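A sketch of these distinct-values queries; the field names follow the setup sketch above:

# pandas: df["label"].unique()  ->  FiftyOne: dataset.distinct()
print(dataset.distinct("resnet50.label"))                # classification labels
print(dataset.distinct("faster_rcnn.detections.label"))  # detection labels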

Up until now, we’ve been getting a high-level understanding of what’s in our dataset from our notebook, but we haven’t yet taken the next step and explored the dataset by actually looking through the images. FiftyOne makes it very easy to visualize your computer vision data via the FiftyOne App.

Jacob fires up the App and first explores some of the fields that this image dataset contains (a field in FiftyOne is the equivalent of a column in pandas). Because these are computer vision fields, in the App we can see ground truth labels, classifications, detections, bounding boxes, file paths, uniqueness, mistakenness, and many other goodies.
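Launching the App from a notebook takes one line:

import fiftyone as fo

# Opens the App and returns a Session for driving it programmatically
session = fo.launch_app(dataset)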

Going back to the issue Jacob identified, where classification values include “giant panda” but detection values do not … it’s easy to go from a notebook to the App with session.show() to see what’s going on. Jacob looks at a few samples in the App and sees that the bounding boxes are labeled “bear” (not “panda”). In this scenario, the detection model likely wasn’t trained on “panda”; it was trained with “bear” as one of its categories. More on this in the last example in section 3.

2 — Calculating aggregate statistics

FiftyOne makes it easy to get summary statistics of your dataset, in much the same way as pandas. Jacob covers how to compute these aggregations:
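A sketch of the kinds of aggregations covered, using this dataset’s field names:

# Bounds (min, max) of detection confidences
print(dataset.bounds("faster_rcnn.detections.confidence"))

# Mean confidence for detections vs. classifications
print(dataset.mean("faster_rcnn.detections.confidence"))
print(dataset.mean("resnet50.confidence"))

# Histogram of predicted detection labels
print(dataset.count_values("faster_rcnn.detections.label"))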

Talking through some characteristics of the example dataset, Jacob computes the bounds of his detection model’s confidences and finds that they span from very low to very high confidence detections.

He also computes the average confidence for detections, which is very low, but the average confidence for classifications is very high.

And because this is computer vision data, we can compute aggregate statistics over fields, and do so across an entire dataset using ViewField. Jacob imports this using:

from fiftyone import ViewField as F

He uses this to compute the mean number of detections per sample across the dataset, and finds that the average sample has nearly 14 detections, which is a lot. And many of those are very low confidence detections, which is something we might want to work on.

(In fact, FiftyOne exists to help you build high-quality datasets and computer vision models! Insights like this into your computer vision datasets can help you do just that.)

Jacob computes the bounds and finds that the image with the fewest detections has two, and the image with the most detections has 40.
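Both computations can be expressed with ViewField; a sketch:

from fiftyone import ViewField as F

# Number of predicted detections per sample
num_dets = F("faster_rcnn.detections").length()

print(dataset.mean(num_dets))    # ~14 detections per sample in the webinar
print(dataset.bounds(num_dets))  # (2, 40) in the webinar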

3 — Filtering and matching

Next, Jacob moves into demonstrating filtering and matching, which he notes are “some of the more interesting, more complex” activities.

Jacob explains a few basics, sketched in code after this list, including how to:

  • Look at the most unique images in our dataset using a match() expression applied to the uniqueness field
  • Sort our images by uniqueness using the ds.sort_by(*) method (the equivalent of df.sort_values() in pandas)
  • View images that are very crowded, with more than 10 predicted detections in them
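A sketch of these basics; the uniqueness threshold is an assumption:

from fiftyone import ViewField as F

# pandas: df.sort_values("uniqueness", ascending=False)
unique_view = dataset.sort_by("uniqueness", reverse=True)

# Match only very unique images (threshold is an assumption)
very_unique_view = dataset.match(F("uniqueness") > 0.9)

# Crowded images: more than 10 predicted detections
crowded_view = dataset.match(F("faster_rcnn.detections").length() > 10)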

Jacob then goes into a deeper scenario: identifying images that contain at least one very high confidence detection. He notes that there are only a few, and when he looks at them, he can see some issues emerging and ways to improve the dataset and models moving forward. In the FiftyOne App, Jacob shows that in the samples with a very high confidence detection, the detection was on a person, not a panda.
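One way to express this query; the 0.99 threshold is an assumption in place of the webinar’s “very high confidence”:

from fiftyone import ViewField as F

# Samples with at least one detection whose confidence exceeds 0.99
high_conf_view = dataset.match(
    F("faster_rcnn.detections").filter(F("confidence") > 0.99).length() > 0
)
session.view = high_conf_view  # show the matches in the App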

This is a question to all the ML/CV engineers out there: how would you handle these images in your real-world scenarios? You could send them for reevaluation. You could gather more images like these and then fine-tune the model so it can still detect with high confidence in scenarios like this. Or you could decide this is an edge case that you don’t want to include in your dataset to begin with.

Another scenario is to sort by “how wrong our predictions are” (the predictions where our model was most confident, but predicted something different from the ground truth classification). Jacob describes how to do this using mistakenness to view the most mistaken images first:

mistaken_view = dataset.sort_by(F("mistakenness"), reverse=True)

Doing this results in a view in the App where the most mistaken samples appear first.

Looking at the very first sample in the grid, it’s understandable why it was classified as a prison because the panda is in a cage. The panda is there, but the cage is a very dominant feature. We might want to send that back for reevaluation.

The final scenario Jacob walks us through is how to compare our classifications and detections. When you have a very large bounding box where the box takes up almost the entire canvas of the image, then the information that goes into the bounding box (the detection model’s classification) is very similar to the information that goes into the classification of the entire image.

# Get images with very large bounding boxes, where we expect detections
# and classifications to align; bounding boxes are stored in relative
# coordinates, so width * height gives the fraction of the image covered
bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

# Only include predictions whose bounding boxes cover at least 80% of the
# image, and only keep samples with at least one prediction after filtering
large_boxes_view = dataset.filter_labels("faster_rcnn", bbox_area >= 0.8)

Then we can further refine our large-box samples by matching for samples that contain at least one “bear” detection label:

## Get images with large bounding boxes that contain bears
large_boxes_bear_view = large_boxes_view.match(
    F("faster_rcnn.detections.label").contains(["bear"])
)
 
print(large_boxes_bear_view.count())

Finally, we can be quite confident that an image shows a giant panda when our detection model produces a large bounding box labeled “bear” and our classification model predicts “giant panda” for the whole image. So we can start from the large boxes that are bears and match on the classification being “giant panda”, as follows:

definite_panda_view = large_boxes_bear_view.match(F("resnet50.label") == "giant panda")
print(definite_panda_view.count())

These results are images that we can use as a starting point for whatever algorithm or procedure we want to apply to iterate on and improve our models.

Summary

FiftyOne allows you to extend the pandas-type querying functionality available for tabular data to the much more flexible, unstructured data that you might expect and encounter in computer vision workflows. In particular, all of the filtering, matching, and querying operations in this presentation have helped us find mistakes in the ground truth and failure modes for our prediction models. We can then use these insights to decide how to handle edge cases. These are just a few ways that some very simple querying and visualization can help you build high-quality datasets and computer vision models.

Q&A from the Webinar

I’m new to the filter expressions like the ones you demoed. Do you have recommendations on ways I can expand my knowledge in this area?

Yes, this is a popular request and there is a cheat sheet on filtering coming out in a week or two with examples of how to perform these types of filters and matching operations — stay tuned! In the meantime, there are examples in the FiftyOne view expressions documentation with some filtering and matching examples. We also published a Tips & Tricks blog post focused on filtering with FiftyOne.

How many different statistical properties are available there?

There are many different ways to compute aggregate statistics about datasets in FiftyOne! You can find an overview of the builtin aggregations, covering 15+ examples, in the docs. All builtin aggregations are subclasses of the Aggregation class, each encapsulating the computation of a different statistic about your data. Visit the fiftyone.core.aggregations module, which offers a declarative and highly efficient approach to computing summary statistics about your datasets and views.

Is there a limit to the number of fields you can put into the data?

There are no limits — you can add as many fields as you want (unless and until your database runs out of memory). You can add detections, classifications, relationships, keypoints — those are all examples of well-defined fields that have structure to them. You can also add your own custom fields, whether those are strings, numbers, or something else. We also recently added the capability to add dynamic fields, which gives you even more flexibility. A quick sketch of adding custom fields follows.
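The field names here are hypothetical, for illustration only:

import fiftyone as fo

# Declare a custom boolean field and set it on every sample
dataset.add_sample_field("reviewed", fo.BooleanField)
dataset.set_values("reviewed", [False] * len(dataset))

# Or set a value on a sample directly; the field is added to the schema
sample = dataset.first()
sample["notes"] = "needs a second look"
sample.save()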

Where is the FiftyOne API documented?

The full API, in all its glory, is documented in the FiftyOne docs.

Is there any resource on ground truth/prediction match analysis in object detection, such as recall, true positives, false negatives, etc.?

Yes, FiftyOne provides a variety of builtin methods for evaluating your model predictions, including regressions, classifications, detections, polygons, and instance and semantic segmentations, on both image and video datasets. This makes it very easy to evaluate predictions with respect to ground truth. You can learn more about evaluating models in the docs, which also include tutorials on model evaluation.
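For example, a sketch of detection evaluation, assuming the dataset has ground-truth detections in a “ground_truth” field (the eval key is arbitrary):

# COCO-style evaluation of predicted vs. ground truth detections
results = dataset.evaluate_detections(
    "faster_rcnn",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)
results.print_report()  # per-class precision, recall, F1
print(results.mAP())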

Is FiftyOne’s primary purpose to improve dataset quality, and what other things can we do with images/videos?

At a high level, yes! More specifically, FiftyOne’s primary purpose is to help engineers, data scientists, and others who work with computer vision data (including images, videos, geolocation, and 3D) to increase the transparency and clarity of their data, and have a data-centric approach to machine learning. It exists to help you improve the performance of your computer vision models by helping you curate high quality computer vision datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

Additional Resources
