FiftyOne Computer Vision Tips and Tricks – Nov 3, 2023

Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.

The FiftyOne community is open to all: everyone is welcome to ask questions, and everyone is welcome to answer them. Read on to see the latest questions asked and answers provided!

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

Ok, let’s dive into this week’s tips and tricks!

Fast filepath replacements

Community Slack member Joy asked:

I want to change a single folder on a find and replace basis as quickly as possible. How would one perform a batch replace of filepaths for samples in a dataset?

Try the following, adjusting the paths to suit your data:

from fiftyone import ViewField as F
dataset.set_field("filepath", F("filepath").replace("/my/path/to/remove", "/new/path/to/add")).save()

This code avoids pulling filepaths into memory, which makes it much faster for large datasets. You can also apply it to a view instead of the full dataset if you want.
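Because a substring replacement changes every path that contains the old string, it can be worth checking the effect on a handful of paths before running it on the whole dataset. The following is a minimal plain-Python sketch of that idea, not part of the FiftyOne API: preview_replace is a hypothetical helper and the paths are made up for illustration.

```python
def preview_replace(paths, old, new):
    """Return (before, after) pairs for the paths a substring replace would change."""
    changes = []
    for path in paths:
        updated = path.replace(old, new)  # str.replace swaps every occurrence of `old`
        if updated != path:
            changes.append((path, updated))
    return changes


# Made-up example paths, for illustration only
paths = [
    "/my/path/to/remove/img1.jpg",
    "/other/location/img2.jpg",
    "/my/path/to/remove/sub/img3.jpg",
]

changes = preview_replace(paths, "/my/path/to/remove", "/new/path/to/add")
for before, after in changes:
    print(before, "->", after)
```

Paths that do not contain the old string are left out of the result, which makes it easy to spot a typo in the search string before any data is modified.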

Changing the datatype of a field

Community Slack member Tyler asked:

How can I change the datatype of a field? I am attempting to merge three datasets. However, when I do so I get this error:

ValueError: Field 'id' type fiftyone.core.fields.IntField does not match existing field type fiftyone.core.fields.StringField

id is a reserved StringField for datasets, so its type cannot be converted. For other fields, though, you can change the type by going through an intermediary field. For example:

import fiftyone as fo
import fiftyone.zoo as foz

F = fo.ViewField

dataset = foz.load_zoo_dataset("quickstart", max_samples=5).clone("convert-example")
for i, sample in enumerate(dataset):
    sample["i"] = i
    sample.save()

print("BEFORE", dataset.values("i"))

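# Declare a new string field to hold the converted values
dataset.add_sample_field("str_i", fo.StringField)
# Populate it from the integer field, converting each value to a string
dataset.set_field("str_i", F("i").to_string()).save()
# Drop the original field, then give the new field its name
dataset.delete_sample_field("i")
dataset.rename_sample_field("str_i", "i")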

print("AFTER", dataset.values("i"))

Grouping by multiple sample fields

Community Slack member HT asked:

Is it possible to group_by on multiple sample fields? For example, I have the following sample fields: objectId, frame and band. I want to group by both objectId and frame so that the group contains multiple images of different band.

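You can do this by grouping on an expression that combines both fields. In the example below, the fields a and b stand in for objectId and frame: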

import fiftyone as fo
from fiftyone import ViewExpression as E, ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="image1.jpg", a=1, b=1),
        fo.Sample(filepath="image2.jpg", a=2, b=1),
        fo.Sample(filepath="image3.jpg", a=1, b=2),
        fo.Sample(filepath="image4.jpg", a=2, b=2),
        fo.Sample(filepath="image5.jpg", a=1, b=1),
        fo.Sample(filepath="image6.jpg", a=2, b=2),
    ]
)

view = dataset.group_by(E([F("a"), F("b")]))

for group in view.iter_dynamic_groups():
    sample = group.first()
    print("a=%d, b=%d: %d" % (sample.a, sample.b, len(group)))
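To see which groups a compound key of this kind should produce, here is a plain-Python analogy that groups the same six (a, b) combinations by a tuple key. It is only a sketch of the grouping logic using ordinary dicts, not FiftyOne code, and the records list is an illustrative stand-in for the samples above.

```python
from collections import defaultdict

# Same filepath/a/b values as the samples in the example above
records = [
    {"filepath": "image1.jpg", "a": 1, "b": 1},
    {"filepath": "image2.jpg", "a": 2, "b": 1},
    {"filepath": "image3.jpg", "a": 1, "b": 2},
    {"filepath": "image4.jpg", "a": 2, "b": 2},
    {"filepath": "image5.jpg", "a": 1, "b": 1},
    {"filepath": "image6.jpg", "a": 2, "b": 2},
]

# Group on the compound key (a, b)
groups = defaultdict(list)
for record in records:
    groups[(record["a"], record["b"])].append(record["filepath"])

group_sizes = {key: len(files) for key, files in groups.items()}

for (a, b), files in sorted(groups.items()):
    print("a=%d, b=%d: %d" % (a, b, len(files)))
```

Each distinct pair of values forms its own group, so two samples land in the same group only when both fields match.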

Sorting a dataset’s string fields in numeric order

Community Slack member Ritesh asked:

I need to sort a dataset with string fields by numbers, but when using sort_by, the sorting works in lexicographical order (0,1,10,11,12,13,2,3,4 etc). How can I enforce normal sorting like (0,1,2,3 etc)?

Here’s how you can sort a string field in numeric order:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="image1.jpg", int=1, str="1"),
        fo.Sample(filepath="image2.jpg", int=2, str="2"),
        fo.Sample(filepath="image3.jpg", int=10, str="10"),
    ]
)

view = dataset.sort_by("int")
print(view.values("int"))  # [1, 2, 10]

view = dataset.sort_by("str")
print(view.values("str"))  # ['1', '10', '2']

view = dataset.sort_by(F("str").to_int())
print(view.values("str"))  # ['1', '2', '10']
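The difference between the two orderings is not specific to FiftyOne. As a plain-Python illustration (the values list is made up), strings compare character by character, so "10" sorts before "2" unless each value is converted to a number first:

```python
values = ["0", "1", "10", "11", "2", "3"]

# Default string comparison goes character by character
lexicographic = sorted(values)

# Converting each value to an int for comparison gives numeric order
numeric = sorted(values, key=int)

print(lexicographic)
print(numeric)
```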

How to efficiently merge samples

Community Slack member Abhiroop asked:

I am trying to merge samples from multiple datasets into a single dataset and was wondering if there was a faster way to do this than using add_samples() to add samples from each dataset to a merged dataset. I have 10 datasets with each one containing 2000 samples. I’m currently using add_samples() to iterate over the 10 datasets to add the samples to a single dataset. It takes around 3 minutes to add 2000 samples to the merged dataset, so this merging process ends up taking around 30 minutes. Is there a faster way of accomplishing this?

Your best bet is to try merge_samples(). In FiftyOne, merging datasets is an easy way to:

  • Combine multiple datasets with information about the same underlying raw media (images and videos)
  • Add model predictions to a FiftyOne dataset, to compare with ground truth annotations and/or other models

You can learn more about merging datasets in the FiftyOne Docs.
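As a rough mental model, merge_samples() matches samples on a key field (filepath by default) and combines their fields. The sketch below is a plain-Python analogy of that key-based merge using dicts; it is not FiftyOne's implementation, and merge_by_key along with the example records are hypothetical names and data chosen for illustration.

```python
def merge_by_key(existing, incoming, key="filepath"):
    """Merge two lists of dict records, matching records on `key`."""
    # Index the existing records by their key value (copying each record)
    merged = {record[key]: dict(record) for record in existing}
    for record in incoming:
        # Update the matching record, or start a new one if the key is unseen
        target = merged.setdefault(record[key], {})
        target.update(record)
    return list(merged.values())


# Made-up records: one list with ground truth, one with model predictions
ground_truth = [
    {"filepath": "image1.jpg", "ground_truth": "cat"},
    {"filepath": "image2.jpg", "ground_truth": "dog"},
]
predictions = [
    {"filepath": "image1.jpg", "predictions": "cat"},
    {"filepath": "image3.jpg", "predictions": "bird"},
]

merged = merge_by_key(ground_truth, predictions)
for record in merged:
    print(record)
```

Records that share a filepath end up as a single entry carrying the fields from both sources, while records seen on only one side are kept as they are.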

Join the FiftyOne Community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!