Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
FiftyOne is open source, and its community is open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Efficient batch edits
Community Slack member Andrew asked:
Is there a way to apply a function to all samples in a dataset? Something equivalent to [some_func(x.gt_classifications.classifications) for x in some_dataset.samples], but hopefully a lot faster?
Using the values() and set_values() functions should speed things up considerably. values() extracts the slice of data you wish to modify, and set_values() saves the updated data in a single batch operation. For example:
# Load all predicted detections
# This is a list of lists of `Detection` instances for each sample
detections = dataset.values("predictions.detections")

# Add a tag to all low confidence detections
for sample_detections in detections:
    for detection in sample_detections:
        if detection.confidence < 0.06:
            detection.tags.append("low_confidence")

# Save the updated predictions
dataset.set_values("predictions.detections", detections)
Check out the FiftyOne Docs to learn more about batch updates.
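The same pattern can be applied to Andrew's original question. Below is a minimal sketch, not an official FiftyOne utility: apply_to_field, in_path, and out_field are hypothetical names, and the only FiftyOne calls assumed are values() and set_values() on the dataset or view that is passed in.

```python
def apply_to_field(dataset, in_path, out_field, func):
    """Apply `func` to each sample's value at `in_path` and store the results.

    `dataset` is assumed to be a FiftyOne dataset or view; only its
    `values()` and `set_values()` methods are used. This helper is a
    hypothetical illustration, not part of FiftyOne.
    """
    # One aggregation pulls the values for every sample at once
    values = dataset.values(in_path)

    # Samples whose field is unset come back as None, so skip those
    results = [None if v is None else func(v) for v in values]

    # One batch write stores the results, in the same order as `values`
    dataset.set_values(out_field, results)
    return results
```

For example, apply_to_field(some_dataset, "gt_classifications.classifications", "num_gt_labels", len) would store the number of ground truth classifications on each sample, assuming a field with that name exists on your dataset.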
Parallelizing image loads
Community Slack member Kevin asked:
I am running inference over my image dataset. I am using a simple approach:
for sample in tqdm(dataset):
    img = cv2.imread(sample.filepath)
    proposals = yolo.detect(img)
    fo_detections = proposals_to_fo_detections(proposals)
    sample[yolo.name] = fo_detections
    sample.save()
My images are stored in a remote server so fetching the images takes a considerable amount of time. The iteration speed is around 5 samples per second which is very slow for my purposes. Is there any way to parallelise this?
Without modifying how you are accessing your images over the network, one approach would be to use a torch Dataset and DataLoader, where the latter can load data in parallel worker processes (via its num_workers argument) and yield single images or batches. The FiftyOne GitHub repo has some utilities for out of the box use, but you can also easily customize them for your specific needs.
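If pulling in torch is not an option, the same idea of overlapping slow remote reads with inference can be sketched with a thread pool from the Python standard library. To be clear, this is a swapped-in technique rather than the torch approach described above, and prefetch, load_fn, and window are hypothetical names chosen for illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetch(paths, load_fn, num_workers=8, window=32):
    """Yield load_fn(path) for each path, in order, loading ahead in threads.

    At most `window` loads are in flight or buffered at any time, so memory
    stays bounded even when loading is faster than the consumer.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        for path in paths:
            # Schedule the next load in a background thread
            pending.append(pool.submit(load_fn, path))
            if len(pending) >= window:
                # Hand the oldest result to the caller (blocks until ready)
                yield pending.popleft().result()
        # Drain whatever is still in flight
        while pending:
            yield pending.popleft().result()
```

In Kevin's loop, this could look like zip(dataset, prefetch(dataset.values("filepath"), cv2.imread)), assuming values("filepath") returns paths in the same order in which the dataset is iterated. Threads help here because the time is spent waiting on the network rather than on computation.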
If you are using an out of the box YOLO model, you may want to consider registering your model with the zoo and using FiftyOne’s apply_model, or leveraging FiftyOne’s native Ultralytics integration.
A final optimization is to avoid per-sample sample.save() calls, which can create a bottleneck. You can try iter_samples(autosave=True) when making sample modifications. This will efficiently batch sample saves.
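As a rough sketch of the autosave pattern, the loop below drops the explicit sample.save() call and lets iter_samples(autosave=True) batch the writes. The helper name add_predictions and the predict_fn callable are hypothetical; predict_fn stands in for whatever loads an image and returns labels in your pipeline.

```python
def add_predictions(dataset, predict_fn, field="predictions"):
    """Store predict_fn(filepath) on every sample without per-sample saves.

    `dataset` is assumed to be a FiftyOne dataset or view. `predict_fn` is a
    hypothetical callable that takes a filepath and returns the value to
    store in `field`.
    """
    count = 0
    # autosave=True saves sample edits in batches as the loop advances,
    # so no explicit sample.save() is needed
    for sample in dataset.iter_samples(autosave=True):
        sample[field] = predict_fn(sample.filepath)
        count += 1
    return count
```

The function returns the number of samples it processed, which is handy as a quick sanity check after a long run.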
Excluding samples with certain labels
Community Slack member Dan asked:
Is there a way to filter labels, but keep samples that don’t match a specific annotation? For example, I have a bunch of street images with cars and people. I want to see a view of the dataset that contains all the images, except for cars and people.
Yes. For example, where F is FiftyOne’s ViewField (from fiftyone import ViewField as F):
ds.match(~F("ground_truth.detections.label").contains(["car", "person"], all=False))
Check out this filtering cheat sheet for additional tips and tricks.
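To make the logic of that expression easier to follow, here is a plain Python model of what it computes for each sample. This is only an illustration of the semantics and not how FiftyOne evaluates the expression; the contains and keep_sample helpers and the toy samples dictionary are made up for this sketch.

```python
def contains(labels, values, require_all=False):
    """Plain Python stand-in for F(...).contains(values, all=require_all)."""
    hits = [v in labels for v in values]
    # all=False: true if ANY of the values is present
    # all=True: true only if EVERY value is present
    return all(hits) if require_all else any(hits)


def keep_sample(labels, excluded=("car", "person")):
    """Model of ~F(...).contains([...], all=False): keep if none are present."""
    return not contains(labels, excluded, require_all=False)


# Toy data: detection labels per image (made up for illustration)
samples = {
    "street1.jpg": ["car", "tree"],
    "street2.jpg": ["person"],
    "park.jpg": ["dog", "tree"],
}

kept = [name for name, labels in samples.items() if keep_sample(labels)]
print(kept)  # ['park.jpg']
```

With all=False, a sample is dropped as soon as it contains either a car or a person. Switching to all=True would only drop samples that contain both.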
Listing view stages and combining them
Community Slack member Guillaume asked:
Let’s assume I have a dataset, with two saved views a and b, with one or more view stages each. Is there a way to get back the history of view stages from each view? If yes, is it possible to combine them, e.g. get the union or intersection?
Yes! You can call view._stages to see the list of stages that comprise each view and the kwargs used to define them. Also, you can learn more about how to concatenate views here.
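For the union and intersection part of the question, one possible approach is to combine the views at the sample ID level. The sketch below is an assumption-laden illustration rather than a built-in FiftyOne feature: stage_names, intersect_views, and union_views are hypothetical helpers that rely only on view._stages, values("id"), and select().

```python
def stage_names(view):
    """List the class names of the stages that make up a view."""
    # _stages is a private attribute, so it may change between releases
    return [type(stage).__name__ for stage in view._stages]


def intersect_views(dataset, view_a, view_b):
    """Select the samples that appear in both views."""
    ids_b = set(view_b.values("id"))
    common = [sid for sid in view_a.values("id") if sid in ids_b]
    return dataset.select(common)


def union_views(dataset, view_a, view_b):
    """Select the samples that appear in either view, without duplicates."""
    # dict.fromkeys removes duplicates while keeping first-seen order
    merged = list(dict.fromkeys(view_a.values("id") + view_b.values("id")))
    return dataset.select(merged)
```

Note that these helpers work on whole samples. Any label-level filtering that a or b applied is not carried over into the combined view, since select() starts again from the underlying dataset.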
Disabling outputs in a Jupyter notebook
Community Slack member Nisha asked:
How can I disable the outputs displayed when loading datasets in a Jupyter notebook? For example:
100% |█████████████████████| 6/6 [26.6ms elapsed, 0s remaining, 225.3 samples/s]
Connected to FiftyOne on port 5051 at localhost.
You can set fo.config.show_progress_bars = False to turn off the progress bars.
Check out the FiftyOne Docs to learn more about FiftyOne default and custom configurations.
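If you only want to silence progress bars for part of a notebook, one option is to flip the setting temporarily and restore it afterwards. The quiet_progress helper below is a hypothetical convenience, not part of FiftyOne; it assumes only that the config object passed in has a writable show_progress_bars attribute, as fo.config does in the tip above.

```python
from contextlib import contextmanager


@contextmanager
def quiet_progress(config):
    """Temporarily set config.show_progress_bars to False.

    `config` is assumed to be fo.config (or any object with a writable
    `show_progress_bars` attribute). The previous value is restored on
    exit, even if the wrapped code raises.
    """
    previous = config.show_progress_bars
    config.show_progress_bars = False
    try:
        yield config
    finally:
        config.show_progress_bars = previous
```

In a notebook cell this could be used as: with quiet_progress(fo.config): followed by the indented dataset loading code. The previous setting comes back once the block exits.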
Join the FiftyOne Community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 2,000+ FiftyOne Slack members
- 4,000+ stars on GitHub
- 5,000+ Meetup members
- Used by 370+ repositories
- 60+ contributors