Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.

FiftyOne is open source, and its community is open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

If you like what you see on GitHub, give the project a star.
Get started! We’ve made it easy to get up and running in a few minutes.
Join the FiftyOne Slack community. We’re always happy to help.

Ok, let’s dive into this week’s tips and tricks!

Fast filepath replacements

Community Slack member Joy asked:

I want to change a single folder on a find and replace basis as quickly as possible. How would one perform a batch replace of filepaths for samples in a dataset?

Try the following, tweaking it to taste:
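A minimal sketch of one way to do this, assuming a FiftyOne version whose view expressions support replace() on string fields. The folder names are placeholders, and preview_replace is a hypothetical helper that only shows what the substitution does to a single path.

```python
def preview_replace(path, old, new):
    """Show what the substitution does to one path (plain Python, no database)."""
    return path.replace(old, new)


def replace_in_filepaths(sample_collection, old, new):
    """Rewrite filepaths on a dataset or view without loading them into Python."""
    # Imported lazily so preview_replace() also works without FiftyOne installed
    from fiftyone import ViewField as F

    # Replaces every occurrence of `old` in each filepath; the expression is
    # evaluated by the database rather than in Python
    view = sample_collection.set_field("filepath", F("filepath").replace(old, new))
    view.save()  # persist the edited filepaths


# Hypothetical folders; substitute your own
OLD_DIR = "/old/data/dir"
NEW_DIR = "/new/data/dir"
print(preview_replace("/old/data/dir/img001.jpg", OLD_DIR, NEW_DIR))
```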

This code avoids pulling filepaths into memory, which is much faster for large datasets. You can also apply it to a view instead of the full dataset if you want.

Changing the datatype of a field

Community Slack member Tyler asked:

How can I change the datatype of a field? I am attempting to merge three datasets. However, when I do so I get this error:

id is a reserved StringField for datasets, so its type cannot be converted. In general, though, you can convert the type of a field by going through an intermediary field. For example:
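Here is a rough sketch of the intermediary-field idea, not necessarily the exact code from the original thread. It assumes the field holds a primitive type and that the dataset is small enough to read the values into Python. The names cast_values, convert_field_type, and tmp_converted are made up for illustration.

```python
def cast_values(values, cast):
    """Convert each value with `cast`, leaving missing (None) values untouched."""
    return [None if v is None else cast(v) for v in values]


def convert_field_type(dataset, field, cast, tmp_field="tmp_converted"):
    """Change the type of `field` by going through an intermediary field."""
    # 1. Read the existing values and convert them in Python
    converted = cast_values(dataset.values(field), cast)
    # 2. Write them to a temporary field, which takes on the new type
    dataset.set_values(tmp_field, converted)
    # 3. Drop the original field and give the temporary field its name
    dataset.delete_sample_field(field)
    dataset.rename_sample_field(tmp_field, field)


print(cast_values([1, None, 42], str))  # ['1', None, '42']
```

For example, convert_field_type(dataset, "my_int_field", str) would turn a hypothetical integer field into a string field.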

Grouping by multiple sample fields

Community Slack member HT asked:

Is it possible to group_by on multiple sample fields? For example, I have the following sample fields: objectId, frame and band. I want to group by both objectId and frame so that the group contains multiple images of different band.

You can achieve it like this:
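One possible approach, sketched under the assumption that objectId and frame are plain sample fields: combine them into a single key field, then group on that field. The helper names and the object_frame field are hypothetical, and the underscore separator assumes the two values cannot be confused once joined.

```python
def composite_keys(object_ids, frames):
    """Build one string key per sample from its objectId and frame values."""
    return [f"{obj}_{frame}" for obj, frame in zip(object_ids, frames)]


def group_by_object_and_frame(dataset, key_field="object_frame"):
    """Group samples that share both objectId and frame."""
    keys = composite_keys(dataset.values("objectId"), dataset.values("frame"))
    # Store the combined key on every sample, then group on that single field
    dataset.set_values(key_field, keys)
    return dataset.group_by(key_field)


print(composite_keys(["a", "a", "b"], [1, 2, 1]))  # ['a_1', 'a_2', 'b_1']
```

Each resulting group then contains the images that differ only in band.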

Sorting a dataset’s string fields in numeric order

Community Slack member Ritesh asked:

I need to sort a dataset with string fields by numbers, but when using sort_by, the sorting works in lexicographical order (0,1,10,11,12,13,2,3,4 etc). How can I enforce normal sorting like (0,1,2,3 etc)?

Here's how you can sort a string field in numeric order:
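A minimal sketch, assuming every value in the field can be parsed as an integer: pass sort_by an expression that casts the string to an integer instead of the raw field name. The sort_numerically wrapper is a hypothetical name, and the plain-Python lines at the end only illustrate the difference between the two orderings.

```python
def sort_numerically(sample_collection, field):
    """Return a view sorted by the integer value of a string field."""
    # Imported lazily so the illustration below runs without FiftyOne installed
    from fiftyone import ViewField as F

    # Cast the string to an integer so the sort compares numbers, not characters
    return sample_collection.sort_by(F(field).to_int())


# The same effect in plain Python, for illustration
values = ["0", "1", "10", "11", "2", "3"]
lexicographic = sorted(values)      # ['0', '1', '10', '11', '2', '3']
numeric = sorted(values, key=int)   # ['0', '1', '2', '3', '10', '11']
print(lexicographic)
print(numeric)
```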

How to efficiently merge samples

Community Slack member Abhiroop asked:

I am trying to merge samples from multiple datasets into a single dataset and was wondering if there was a faster way to do this than using add_samples() to add samples from each dataset to a merged dataset. I have 10 datasets with each one containing 2000 samples. I'm currently using add_samples() to iterate over the 10 datasets to add the samples to a single dataset. It takes around 3 minutes to add 2000 samples to the merged dataset, so this merging process ends up taking around 30 minutes. Is there a faster way of accomplishing this?

Your best bet is to try merge_samples(). In FiftyOne, merging datasets is an easy way to:

Combine multiple datasets with information about the same underlying raw media (images and videos)
Add model predictions to a FiftyOne dataset, to compare with ground truth annotations and/or other models

You can learn more about merging datasets in FiftyOne Docs.
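As a rough sketch of the merge_samples() suggestion, the loop below merges each source dataset into the target with one call per dataset, rather than adding samples individually. The merge_all helper is hypothetical, and matching on filepath is an assumption about how samples should be paired up.

```python
def merge_all(merged, datasets, key_field="filepath"):
    """Merge every dataset in `datasets` into `merged`, one call per dataset."""
    for dataset in datasets:
        # Samples whose key_field matches an existing sample are merged into
        # it; all other samples are added as new samples
        merged.merge_samples(dataset, key_field=key_field)
    return merged
```

With ten source datasets this amounts to ten merge_samples() calls on the target dataset.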

Join the FiftyOne Community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!

2,000+ FiftyOne Slack members
4,000+ stars on GitHub
5,000+ Meetup members
Used by 370+ repositories
60+ contributors
