Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
As an open source community, the FiftyOne community is open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
Ok, let’s dive into this week’s tips and tricks!
Efficient batch edits
Community Slack member Andrew asked:
Is there a way to apply a function to all samples in a dataset? Something equivalent to [some_func(x.gt_classifications.classifications) for x in some_dataset.samples], but hopefully a lot faster?
Using values() together with set_values() should speed things up considerably: values() extracts the slice of data you wish to modify, and set_values() saves the updated data in a single batch operation.
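A minimal sketch of this pattern, assuming the dataset exposes values() and set_values() as FiftyOne datasets and views do. The helper name apply_in_batch and its parameters are hypothetical, and func stands in for whatever per-sample function you need:

```python
def apply_in_batch(dataset, in_path, out_field, func):
    """Apply `func` to one field of every sample using two bulk operations.

    `dataset` is expected to be a FiftyOne dataset or view; only its
    `values()` and `set_values()` methods are used.
    """
    # One query pulls the field for all samples, in dataset order
    values = dataset.values(in_path)

    # Plain Python from here on; samples without the field yield None
    results = [func(v) if v is not None else None for v in values]

    # One bulk write stores the results, one value per sample
    dataset.set_values(out_field, results)
    return results
```

For the question above, a call such as apply_in_batch(some_dataset, "gt_classifications.classifications", "num_classifications", len) would store the number of classifications per sample in a new field (the output field name is only an example).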
Parallelizing image loads
Community Slack member Kevin asked:
I am running inference over my image dataset. I am using a simple approach:
My images are stored in a remote server so fetching the images takes a considerable amount of time. The iteration speed is around 5 samples per second which is very slow for my purposes. Is there any way to parallelise this?
Without modifying how you are accessing your images over the network, one approach would be to use a torch Dataset and DataLoader, where the latter can be parallelized and yield single images or batches. The FiftyOne GitHub repo has some utilities for out-of-the-box use, but you can also easily customize them for your specific needs.
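The torch Dataset and DataLoader route depends on torch and on how your images are served, so it is not sketched here. As a dependency-free illustration of the same idea (overlapping slow network reads instead of fetching one image at a time), the following swaps in a thread pool from the Python standard library. The names iter_loaded and load_image are hypothetical; load_image stands for whatever function reads one image from your remote server:

```python
from concurrent.futures import ThreadPoolExecutor


def iter_loaded(filepaths, load_image, num_workers=8):
    """Yield (filepath, image) pairs in input order, fetching concurrently.

    `load_image` is a hypothetical callable that reads one image from the
    remote server, e.g. a function wrapping an HTTP or network-drive read.
    """
    filepaths = list(filepaths)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() submits every fetch up front and yields results in input
        # order, so slow reads overlap instead of running back to back
        for filepath, image in zip(filepaths, pool.map(load_image, filepaths)):
            yield filepath, image
```

The file paths could come from dataset.values("filepath"). Because every fetch is submitted up front, loaded images can accumulate in memory if inference is slower than fetching, so for very large datasets you may want to feed the paths in chunks.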
A final optimization is to avoid calling sample.save() on every sample, which can create a bottleneck. You can try iter_samples(autosave=True) when making sample modifications; this efficiently batches the sample saves.
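A minimal sketch of the autosave pattern, assuming predict is your own function that takes a file path and returns the value or label to store. The helper name add_predictions and the default field name are hypothetical:

```python
def add_predictions(dataset, predict, label_field="predictions"):
    """Store `predict(filepath)` on every sample without calling save()."""
    # autosave=True lets FiftyOne save the modified samples in batches,
    # so no explicit sample.save() is needed inside the loop
    for sample in dataset.iter_samples(autosave=True):
        sample[label_field] = predict(sample.filepath)
```

The loop body looks the same as a per-sample version; only the explicit save call is gone.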
Excluding samples with certain labels
Community Slack member Dan asked:
Is there a way to filter labels, but keep samples that don’t match a specific annotation? For example, I have a bunch of street images with cars and people. I want to see a view of the dataset that contains all the images, except for cars and people.
Yes. For example:
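One possible sketch of such a view, assuming the labels live in a field named ground_truth (an assumption, as the question does not name the field). The helper name without_classes is hypothetical; filter_labels(), ViewField, and the only_matches parameter are part of FiftyOne:

```python
def without_classes(dataset, field="ground_truth", classes=("car", "person")):
    """Return a view that keeps every sample but drops labels in `classes`."""
    # Imported here so the helper can be defined before FiftyOne is loaded
    from fiftyone import ViewField as F

    # Keep only the labels whose class is NOT in `classes`
    keep = ~F("label").is_in(list(classes))

    # only_matches=False retains samples that end up with no labels
    return dataset.filter_labels(field, keep, only_matches=False)
```

With the default only_matches=True, samples left with no labels after filtering would be dropped from the view, which is the opposite of what the question asks for.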
Listing view stages and combining them
Community Slack member Guillaume asked:
Let’s assume I have a dataset, with two saved views a and b, with one or more view stages each. Is there a way to get back the history of view stages from each view? If yes, is it possible to combine them, e.g. get the union or intersection?
Yes! You can call view._stages to see the list of stages that comprise each view and the kwargs used to define them. You can also learn more about how to concatenate views in the FiftyOne documentation.
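As a rough sketch of both parts of the question: the first helper lists the stages of a view, and the second combines two views at the sample level by comparing sample IDs. Both helper names are hypothetical, and the ID-based combination is only one possible approach, not necessarily the one the linked documentation describes:

```python
def describe_stages(view):
    """Return one readable string per stage that defines `view`."""
    # `_stages` is a private attribute, so it may change between releases
    return [str(stage) for stage in view._stages]


def combine_views(dataset, view_a, view_b, how="union"):
    """Combine two views of `dataset` at the sample level.

    Only sample membership is combined; label-level filtering applied by
    either view is not carried over to the result.
    """
    ids_a = set(view_a.values("id"))
    ids_b = set(view_b.values("id"))
    if how == "union":
        ids = ids_a | ids_b
    elif how == "intersection":
        ids = ids_a & ids_b
    else:
        raise ValueError(f"Unsupported how={how!r}")
    # select() returns a view containing only the samples with these IDs
    return dataset.select(sorted(ids))
```

Saved views can be loaded by name with dataset.load_saved_view() before being passed to these helpers.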
Disabling outputs in a Jupyter notebook
Community Slack member Nisha asked:
How can I disable the outputs displayed when loading datasets in a Jupyter notebook? For example:
You can set fo.config.show_progress_bars = False to turn off the progress bars.
Join the FiftyOne Community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!