Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.

The FiftyOne community is open source and open to all. Everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

If you like what you see on GitHub, give the project a star.
Get started! We’ve made it easy to get up and running in a few minutes.
Join the FiftyOne Slack community. We’re always happy to help.

Ok, let’s dive into this week’s tips and tricks!

Performing semantic video search

Community Slack member Yashovardhan asked:

How can I do an embedding search on videos that are groups of image sequences?

Check out the Semantic Video Search plugin for FiftyOne that streamlines this process for you. With a single prompt, you can find exactly what you are looking for across every frame in your dataset!

Check out dozens of other plugins to help streamline your workflows.

Working with grouped datasets

Community Slack member Gantugs asked:

Is there a way to show images with five channels? For example microscopic images?

Yes! One way to accomplish this is to create a grouped dataset in which each channel is stored as its own image slice. In FiftyOne, grouped datasets contain multiple slices of samples, possibly of different modalities (image, video, or point cloud), that are organized into groups. Grouped datasets are typically used to represent multiview scenes, where data for multiple perspectives of the same scene can be stored, visualized, and queried in ways that respect the relationships between the slices of data.
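As a rough sketch of this approach for multi-channel microscopy, the snippet below creates one group per image and one slice per channel. The helper name `build_channel_dataset`, the input layout, and any channel names or paths you pass in are hypothetical, and it assumes each channel has already been exported as a separate image file.

```python
def build_channel_dataset(name, fields_of_view):
    """Create a grouped dataset with one group per image and one slice per channel.

    `fields_of_view` is a list of dicts, each mapping a channel name to the
    path of that channel's image file (a hypothetical layout for illustration).
    """
    # Imported here so this module can be loaded without FiftyOne installed
    import fiftyone as fo

    dataset = fo.Dataset(name)
    samples = []
    for channel_paths in fields_of_view:
        group = fo.Group()  # one group per multi-channel image
        for channel, path in channel_paths.items():
            # Each channel becomes a slice named after the channel
            samples.append(
                fo.Sample(filepath=path, group=group.element(channel))
            )

    dataset.add_samples(samples)
    return dataset
```

Opening the returned dataset with `fo.launch_app(dataset)` should then let you switch between the channel slices of each group in the App.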

Uniqueness of sample ids

Community Slack member Edward asked:

Is a sample_id unique for each sample regardless of which dataset or FiftyOne version the sample is from? For example, if I have multiple datasets with millions of samples, will there be a chance that a sample_id will be the same for two samples from different datasets, or will they always be unique?

FiftyOne’s sample_id is not a UUID in the strictest sense. It is a MongoDB ObjectId, and FiftyOne ensures that these IDs are unique within a given dataset. However, be aware that samples in different datasets can share the same ID. For example, a clone of a dataset can contain samples with the same IDs as the originals.
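One way to check the cloning case on your own data is sketched below. The helper name `ids_shared_with_clone` is hypothetical; it relies only on the `clone()`, `values("id")`, and `delete()` methods of a FiftyOne dataset, and makes no assumption about what the overlap will be in your version.

```python
def ids_shared_with_clone(dataset):
    """Return the sample IDs a dataset has in common with a fresh clone of itself."""
    clone = dataset.clone()  # temporary copy of the dataset
    try:
        original_ids = set(dataset.values("id"))
        cloned_ids = set(clone.values("id"))
        return original_ids & cloned_ids
    finally:
        clone.delete()  # clean up the temporary clone
```

A non-empty result means the clone reused IDs from the original. If you need an identifier that is unique across datasets, pairing the dataset name with the sample ID is one option.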

Specifying a color scheme for your dataset

Community Slack member Nadu asked:

I'm trying to visualize images and their segmentation masks. How can I specify that each class corresponds to a specific color?

You can configure the color scheme used by the FiftyOne App to render content by clicking on the color palette icon above the sample grid. From there, you can:

Configure a custom color pool from which to draw colors for otherwise unspecified fields/values
Configure the colors assigned to specific fields in color by field mode
Configure the colors used to render specific annotations based on their attributes in color by value mode
Save the customized color scheme as the default for the dataset
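A similar class-to-color mapping can also be set from Python. The sketch below assumes a FiftyOne version whose `fo.ColorScheme` accepts per-field `maskTargetsColors` entries and that the dataset exposes `app_config.color_scheme`; the helper names, the field name, and the colors are hypothetical, so check the color scheme documentation for your version before relying on it.

```python
def mask_target_colors(class_colors):
    """Convert {pixel_value: color} into the list-of-dicts form used for mask targets."""
    return [
        {"intTarget": int(target), "color": color}
        for target, color in sorted(class_colors.items())
    ]


def save_mask_colors(dataset, field, class_colors):
    """Store per-class mask colors as the dataset's default App color scheme."""
    # Imported here so the helper above works without FiftyOne installed
    import fiftyone as fo

    # Assumption: this FiftyOne version supports `maskTargetsColors`
    dataset.app_config.color_scheme = fo.ColorScheme(
        fields=[
            {
                "path": field,
                "maskTargetsColors": mask_target_colors(class_colors),
            }
        ]
    )
    dataset.save()  # persist the App config change
```

For example, `save_mask_colors(dataset, "ground_truth", {1: "#ff0000", 2: "#00ff00"})` would ask the App to draw pixel value 1 in red and pixel value 2 in green for a segmentation field named `ground_truth`.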

Configuring persistent dataset options

Community Slack member Gantugs asked:

When using persistent datasets, where are they saved by default? Is there a way to configure that?

Recall that FiftyOne stores all of its data in a MongoDB backend. By default, the database lives in ~/.fiftyone/var/lib/mongo.

Check out the “Configuring FiftyOne” section of the Docs, which lists the available configuration options. You can point database_dir at a different drive or location, or set database_uri to connect to a MongoDB instance that you manage yourself.
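As an illustration, the sketch below writes a `database_dir` entry into a FiftyOne config file using only the standard library. It assumes the default config location `~/.fiftyone/config.json`; the helper name `set_database_dir` and whatever directory you pass in are hypothetical. Setting the `FIFTYONE_DATABASE_DIR` environment variable before importing FiftyOne is an alternative.

```python
import json
from pathlib import Path


def set_database_dir(database_dir, config_path="~/.fiftyone/config.json"):
    """Set `database_dir` in a FiftyOne JSON config file, keeping other settings."""
    path = Path(config_path).expanduser()

    # Load any existing settings so they are preserved
    config = json.loads(path.read_text()) if path.exists() else {}
    config["database_dir"] = str(database_dir)

    # Create the config directory if needed, then write the updated file
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config
```

The new location should be picked up the next time FiftyOne is imported, since the config is read at import time.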
