Understanding Grouped Datasets with FiftyOne

Sep 1, 2023
Grouped datasets in FiftyOne’s open source platform provide a powerful way to organize and analyze related samples across multiple modalities or perspectives. Whether you are working with multi-camera image captures, synchronized sensor streams, stereo vision systems, or multimodal machine learning workflows, grouped datasets allow you to maintain relationships between samples while still leveraging the full power of FiftyOne’s dataset view and querying system.
In this walkthrough, we will cover the fundamentals of working with grouped datasets in FiftyOne, including how to create your first grouped dataset, define and manage group slices, iterate through grouped samples, and build expressive dataset views across multiple slices. We will also explore filtering, aggregations, and advanced querying patterns that make grouped datasets especially useful for large-scale computer vision workflows.
By the end of this guide, you will have a solid foundation for building and interacting with multiview computer vision datasets in FiftyOne using practical Python examples.

What is a grouped dataset?

Before diving in, let’s first understand what a grouped dataset is and why we would want to use one. A grouped dataset is a collection of samples, possibly of different modalities (image, video, or point cloud), that are organized into groups, with each sample in a group belonging to a named slice. Multiview datasets are a common example of grouped computer vision datasets.
Samples in the same group are related. For example, grouped datasets can be used to represent multiview scenes, where data for multiple perspectives of the same scene can be stored, visualized, and queried in ways that respect the relationships between the slices of data.

Kickstarting your first grouped dataset

To get started, create a dataset and add a group field. All grouped datasets must contain a group field, which records the group and slice that each sample belongs to. The optional `default` parameter sets the default slice of the group: the slice that is returned when interacting with the dataset via Python, and the slice that is shown when you first launch a session of the FiftyOne App. This can all be changed with ease and will be detailed later. For now let’s add some images to our dataset:
Preparing the data for your grouped dataset is easy. Define your group slices, then create a dictionary for each group that maps each slice name to the filepath of the corresponding sample.
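One way to lay out this data in plain Python is a list with one dictionary per group. The slice names and the `/path/to/images/...` filepaths below are hypothetical placeholders; substitute the paths of your own media.

```python
# Slice names for each group (illustrative)
groups = ["left", "center", "right"]

# Number of groups to describe (illustrative)
num_groups = 3

# One dict per group, mapping slice name -> filepath.
# These paths are hypothetical placeholders, not real files.
filepaths = [
    {name: f"/path/to/images/{name}-{i:03d}.jpg" for name in groups}
    for i in range(num_groups)
]

print(filepaths[0])
```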
With our data ready, it is time to throw it into our CV grouped dataset.
All it takes to add your data to your dataset is to iterate through your groups and create a new sample for each image or other piece of media. Once all the samples have been created, we can add them all at once with dataset.add_samples(samples). Congrats! That’s all it takes to create a grouped dataset. We can visualize our first dataset with:

Working with CV grouped datasets

Now that you have your grouped dataset, what can you do with it? If this is your first time using FiftyOne’s open source platform, or if you need a refresher on creating views and working with FiftyOne CV datasets, I recommend brushing up with the Views Guide.
For the most part, the Python syntax for interacting with grouped datasets is identical to that of non-grouped datasets. We can start by getting some basic information about our CV dataset and then use it to access or filter samples as needed.

What are the groups in my dataset?

Here we can see what our group slices are and what kind of media is inside each of them. Remember that only one slice is active at a time. By default we set it to `center`, so all functions we run to grab samples or stats will return results from the center slice.
We can see it is the center slice under the `group` field on the bottom. If we wanted to change the active slice, all we need to do is:
Changing the active group slice also changes it in your App!
The next natural question in your head is, “What if I want the entire group and not just one sample?” No problem. Just grab the group id and pull like this:
Now we can access each piece of media in the group for the sample we are looking for.

Iterating through your grouped dataset

There are two suggested ways to iterate through your grouped dataset: iterating through your active slice, or iterating through each group. Choose whichever best fits your use case. To iterate through just your active slice you can use:
Remember, you can always change the active slice with dataset.group_slice = slice.
To iterate over your groups, you can use the iter_groups() function:

Creating views in your grouped dataset

One of the best parts about creating a grouped dataset is that you have the entire dataset view language at your disposal to sort, slice, and search through your dataset. Iterating through, grabbing samples, or any other basic property of grouped datasets carries over when you make a new view. There are tons of possible views or subsets of your computer vision dataset that you can make, so I will highlight just a few of them:
Filter based on class:
We can even filter on multiple group slices at once, using the computed metadata of the samples.
Create views of joined group slices:
If you want to create a view of just two of your group slices, you can easily select multiple slices to create a new view, which can become a new dataset or just the next step of your filtering or sorting process.
Likewise, we can exclude individual groups from our view as well.
Aggregations:
We can still compute the statistics we want from our dataset, and aggregations can be applied to views as well:
Putting it all together, we can create complex views that suit exactly what you need.

Next steps for your grouped datasets

We hope this quick walkthrough has helped you understand grouped datasets better. Grouped datasets unlock a flexible and scalable way to manage related samples within complex computer vision and multimodal ML pipelines. By combining group slices with FiftyOne’s view stages, filtering capabilities, and aggregation framework, you can efficiently query and analyze relationships across multiple perspectives of the same scene or sample group.
As your datasets grow in complexity, grouped datasets provide a clean abstraction for organizing and exploring related data while retaining the full flexibility of FiftyOne’s open source dataset API. In future tutorials, we will build on these concepts by exploring dynamic grouped datasets and more advanced customization techniques for large-scale computer vision data management. Explore what’s possible with grouped datasets in FiftyOne by applying these workflows to your own computer vision pipelines and multiview data.