FiftyOne and pandas are both open source Python libraries that make dealing with your data easy. While they serve different purposes — pandas is built for tabular data, and FiftyOne is built for unstructured data in computer vision tasks — their syntax and functionality are closely aligned. Specifically, the pandas DataFrame and FiftyOne Dataset share many similar functionalities.

Because both pandas and FiftyOne are important components of many data science and machine learning workflows, the FiftyOne community often asks for a comparison of the two. The community requested it; we delivered it. Performing pandas-style queries on your computer vision data using FiftyOne has never been easier.

In this blog post, its companion tutorial (Perform pandas-style queries in FiftyOne), and an accompanying pandas vs FiftyOne Cheat Sheet, we’ll show you how. Huge shout out to community Slack member Kishan Savant for creating the cheat sheet!

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

If you like what you see on GitHub, give the project a star.
Get started! We’ve made it easy to get up and running in a few minutes.
Join the FiftyOne Slack community. We’re always happy to help.

How to perform pandas-style queries in FiftyOne

Are you a seasoned user of pandas, the Python library for tabular data analysis? Tired of scouring FiftyOne’s API Reference pages in search of analogous operations? Interested in seeing a new way to think about the FiftyOne Dataset and DatasetView?

In this blog post, we’ll show you how to perform some popular pandas-style queries and operations in FiftyOne. To see the complete collection, check out our new pandas and FiftyOne Queries Comparison Guide!

Starting with the basics, this tutorial covers everything you need to know to perform pandas-style queries and operations in FiftyOne. If you have more specific questions, you can go straight to the sections on View Stages, Aggregations, Structural Changes, or Expressions.

Example: getting the column or field schema

In pandas, where all rows in a DataFrame share the same columns, we can get the names of the columns with the columns property. For instance, for a DataFrame df, we can get the columns via:

df.columns

In FiftyOne, the core field schema is shared among samples, but the structure within these first-level fields can vary. We can get the field schema by calling the get_field_schema() method:

ds.get_field_schema()

Example: minimum and maximum values

In pandas, you compute the minimum and maximum values of a Series separately. For instance, to get the minimum and maximum values in the “my_col” column of a DataFrame df, we can call:

col_min = df["my_col"].min()
col_max = df["my_col"].max()

When working with a FiftyOne Dataset or DatasetView, the min and max are returned together in a tuple when the bounds() method is called on a field. To get the minimum and maximum in the field “my_field” of a Dataset ds, we use:

field_min, field_max = ds.bounds("my_field")
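
To make the return shape concrete, here is a plain-Python stand-in for the (min, max) tuple described above. The helper name bounds_sketch and the sample values are hypothetical, and skipping None entries is an assumption about how missing values are treated. This is a sketch of the idea, not FiftyOne's implementation.

```python
def bounds_sketch(values):
    """Return (min, max) over the non-None values, or (None, None) if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return None, None
    return min(present), max(present)


# Hypothetical values of "my_field", one per sample; None marks a missing value
my_field_values = [3, None, 7, 1, 5]

# Both bounds come back together and can be unpacked in one statement
field_min, field_max = bounds_sketch(my_field_values)
```

The unpacking on the last line mirrors the field_min, field_max assignment used with bounds() above.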

Example: median and other aggregations

Some aggregations that are native to pandas, such as computing the median, are not native to FiftyOne. In these cases, the canonical way to compute the aggregation is to first extract the values from the Dataset field and then use native numpy or scipy functionality.

To compute the median value in a column of a pandas DataFrame, we can write:

col_median = df["my_col"].median()

In FiftyOne, fields can contain embedded fields or lists of values. To avoid dealing with these potentially jagged lists of values, we can pass the argument unwind=True into the Dataset values() method. We can then use this flattened list of values as input to numpy’s median function:

import numpy as np
pred_confs_flat = ds.values("predictions.detections.confidence", unwind=True)
pred_confs_median = np.median(pred_confs_flat)
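
As a rough illustration of why flattening matters, the sketch below mimics the unwinding step in plain Python, using only the standard library. The nested list, the flatten helper, and the decision to drop None entries are all assumptions made for illustration. They are not a description of how values() is implemented, so check what values() returns on your own dataset.

```python
from statistics import median

# Hypothetical per-sample confidences: one inner list per sample, with
# varying lengths (a jagged list), an empty sample, a sample with no
# predictions at all (None), and one detection missing its confidence.
per_sample_confs = [[0.9, 0.4], [], [0.7], None, [0.2, None, 0.6]]


def flatten(nested):
    """Collapse one level of nesting, skipping samples that are None."""
    flat = []
    for sample_values in nested:
        if sample_values is None:
            continue
        flat.extend(sample_values)
    return flat


confs_flat = flatten(per_sample_confs)

# Drop missing confidences before aggregating; a median over a list
# containing None would otherwise fail.
confs_clean = [c for c in confs_flat if c is not None]

confs_median = median(confs_clean)
```

If the flattened list you get back from values() does contain None entries, filtering them out in the same way before calling np.median should avoid a type error.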

Community contributions

Shout out to Kishan Savant, who created the pandas vs FiftyOne Cheat Sheet.

FiftyOne community updates

The FiftyOne community continues to grow!

1,100+ FiftyOne Slack members
2,100+ stars on GitHub
1,700+ Meetup members
Used by 190+ repositories
46+ contributors

What’s next?

Check out the Perform pandas-style queries in FiftyOne tutorial.
Download the pandas vs FiftyOne Cheat Sheet.
If you like what you see on GitHub, give the project a star.
Get started! We’ve made it easy to get up and running in a few minutes.
Join the FiftyOne Slack community. We’re always happy to help.
