Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
The FiftyOne community is open to all: everyone is welcome to ask questions, and everyone is welcome to answer them. Read on to see the latest questions and the answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Fast filepath replacements
Community Slack member Joy asked:
I want to change a single folder on a find and replace basis as quickly as possible. How would one perform a batch replace of filepaths for samples in a dataset?
Try the following, adjusting the paths to suit your setup:
from fiftyone import ViewField as F
dataset.set_field("filepath", F("filepath").replace("/my/path/to/remove", "/new/path/to/add")).save()
This approach avoids pulling the filepaths into memory, which makes it much faster for large datasets. You can also apply it to a view instead of the full dataset.
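Before committing a batch update like this, it can be worth previewing the effect on a handful of paths. The sketch below is plain Python rather than FiftyOne code. It assumes the replacement behaves like Python's str.replace (every occurrence of the old string is swapped, wherever it appears in the path), and both the sample paths and the preview_replacement helper are hypothetical names used only for illustration.

```python
# Plain-Python dry run of the path swap; no FiftyOne or database required.
OLD_PREFIX = "/my/path/to/remove"
NEW_PREFIX = "/new/path/to/add"

# Hypothetical sample paths, used only for illustration.
filepaths = [
    "/my/path/to/remove/images/001.jpg",
    "/my/path/to/remove/images/002.jpg",
    "/some/other/location/003.jpg",
]


def preview_replacement(paths, old, new):
    """Return (before, after) pairs for every path the replacement would change."""
    changes = []
    for path in paths:
        updated = path.replace(old, new)
        if updated != path:
            changes.append((path, updated))
    return changes


changes = preview_replacement(filepaths, OLD_PREFIX, NEW_PREFIX)
for before, after in changes:
    print(before, "->", after)
```

Paths that do not contain the old string are left out of the preview, which makes it easy to spot samples that the replacement would not touch.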
Changing the datatype of a field
Community Slack member Tyler asked:
How can I change the datatype of a field? I am attempting to merge three datasets. However, when I do so I get this error:
ValueError: Field 'id' type fiftyone.core.fields.IntField does not match existing field type fiftyone.core.fields.StringField
id is a reserved StringField for datasets, so its type cannot be converted. But as a general example, converting a field type can be done with an intermediary field. For example:
import fiftyone as fo
import fiftyone.zoo as foz

F = fo.ViewField

dataset = foz.load_zoo_dataset("quickstart", max_samples=5).clone("convert-example")

# Populate an integer field "i" to convert
for i, sample in enumerate(dataset):
    sample["i"] = i
    sample.save()

print("BEFORE", dataset.values("i"))

# Write the string version into an intermediary field
dataset.add_sample_field("str_i", fo.StringField)
dataset.set_field("str_i", F("i").to_string()).save()

# Drop the original field and give the new field its name
dataset.delete_sample_field("i")
dataset.rename_sample_field("str_i", "i")

print("AFTER", dataset.values("i"))
Grouping by multiple sample fields
Community Slack member HT asked:
Is it possible to group_by on multiple sample fields? For example, I have the following sample fields: objectId, frame and band. I want to group by both objectId and frame so that the group contains multiple images of different band.
You can achieve it like this:
import fiftyone as fo
from fiftyone import ViewExpression as E, ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="image1.jpg", a=1, b=1),
        fo.Sample(filepath="image2.jpg", a=2, b=1),
        fo.Sample(filepath="image3.jpg", a=1, b=2),
        fo.Sample(filepath="image4.jpg", a=2, b=2),
        fo.Sample(filepath="image5.jpg", a=1, b=1),
        fo.Sample(filepath="image6.jpg", a=2, b=2),
    ]
)

# Group by the combination of fields "a" and "b"
view = dataset.group_by(E([F("a"), F("b")]))

# Print the size of each group
for group in view.iter_dynamic_groups():
    sample = group.first()
    print("a=%d, b=%d: %d" % (sample.a, sample.b, len(group)))
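As a sanity check on what such a grouping should contain, the same idea can be sketched in plain Python by keying each record on the tuple (a, b). This is only an illustration of grouping by a compound key, not how FiftyOne implements group_by; the records list simply mirrors the field values of the samples above.

```python
from collections import defaultdict

# (filepath, a, b) values mirroring the samples in the example above.
records = [
    ("image1.jpg", 1, 1),
    ("image2.jpg", 2, 1),
    ("image3.jpg", 1, 2),
    ("image4.jpg", 2, 2),
    ("image5.jpg", 1, 1),
    ("image6.jpg", 2, 2),
]

# Group filepaths by the compound key (a, b).
groups = defaultdict(list)
for filepath, a, b in records:
    groups[(a, b)].append(filepath)

# Print the size of each group, ordered by key.
for (a, b), members in sorted(groups.items()):
    print("a=%d, b=%d: %d" % (a, b, len(members)))
```

Six records fall into four groups here, with two members each for (1, 1) and (2, 2) and one each for the other two combinations.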
Sorting a dataset’s string fields in numeric order
Community Slack member Ritesh asked:
I need to sort a dataset with string fields by numbers, but when using sort_by, the sorting works in lexicographical order (0, 1, 10, 11, 12, 13, 2, 3, 4, etc.). How can I enforce normal sorting like (0, 1, 2, 3, etc.)?
Here’s how you can sort a string field in numeric order:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="image1.jpg", int=1, str="1"),
        fo.Sample(filepath="image2.jpg", int=2, str="2"),
        fo.Sample(filepath="image3.jpg", int=10, str="10"),
    ]
)

# Integer fields sort numerically
view = dataset.sort_by("int")
print(view.values("int"))  # [1, 2, 10]

# String fields sort lexicographically
view = dataset.sort_by("str")
print(view.values("str"))  # ['1', '10', '2']

# Cast the string field to an integer to sort it numerically
view = dataset.sort_by(F("str").to_int())
print(view.values("str"))  # ['1', '2', '10']
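The lexicographic ordering described in the question is easy to reproduce outside FiftyOne, which may help explain why the cast is needed. The plain-Python sketch below uses made-up values: strings compare character by character, so "10" sorts before "2" until the values are interpreted as integers.

```python
# Made-up string values, used only for illustration.
values = ["3", "10", "1", "0", "11", "2"]

# Default string sort: compares character by character.
lexicographic = sorted(values)

# Sorting on the integer value of each string restores numeric order.
numeric = sorted(values, key=int)

print(lexicographic)
print(numeric)
```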
How to efficiently merge samples
Community Slack member Abhiroop asked:
I am trying to merge samples from multiple datasets into a single dataset and was wondering if there was a faster way to do this than using add_samples() to add samples from each dataset to a merged dataset. I have 10 datasets with each one containing 2000 samples. I’m currently using add_samples() to iterate over the 10 datasets to add the samples to a single dataset. It takes around 3 minutes to add 2000 samples to the merged dataset, so this merging process ends up taking around 30 minutes. Is there a faster way of accomplishing this?
Your best bet is to try merge_samples(). In FiftyOne, merging datasets is an easy way to:
- Combine multiple datasets with information about the same underlying raw media (images and videos)
- Add model predictions to a FiftyOne dataset, to compare with ground truth annotations and/or other models
You can learn more about merging datasets in the FiftyOne Docs.
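To make the idea of a keyed merge concrete, here is a plain-Python sketch. It is not FiftyOne's implementation: it assumes records are matched on filepath (which, to our understanding, is the default key field for merge_samples()), and the merge_by_key helper and the example records are hypothetical. Records whose key already exists are updated in place, and the rest are appended.

```python
def merge_by_key(target, incoming, key="filepath"):
    """Merge incoming records into target, matching records on the given key.

    Records whose key already exists in target are updated in place;
    records with a new key are appended.
    """
    index = {record[key]: record for record in target}
    for record in incoming:
        existing = index.get(record[key])
        if existing is None:
            new_record = dict(record)  # copy so the input is not shared
            target.append(new_record)
            index[record[key]] = new_record
        else:
            existing.update(record)
    return target


# Hypothetical records: ground truth labels and model predictions
# that partly describe the same underlying images.
ground_truth = [
    {"filepath": "image1.jpg", "ground_truth": "cat"},
    {"filepath": "image2.jpg", "ground_truth": "dog"},
]
predictions = [
    {"filepath": "image1.jpg", "predictions": "cat"},
    {"filepath": "image3.jpg", "predictions": "bird"},
]

merged = merge_by_key(ground_truth, predictions)
for record in merged:
    print(record)
```

The record for image1.jpg ends up with both fields, which corresponds to the "add model predictions to a dataset" use case in the list above, while image3.jpg is added as a new record.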
Join the FiftyOne Community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 2,000+ FiftyOne Slack members
- 4,000+ stars on GitHub
- 5,000+ Meetup members
- Used by 370+ repositories
- 60+ contributors