Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Counting classes using the FiftyOne API
Community Slack member Nahid asked,
“Is there a way to show the count of different classes in a YOLOv5 dataset with the FiftyOne API programmatically?”
Although this can easily be done in the UI, you can get a similar result with the API using something like:
import fiftyone as fo

name = "my_dataset"
dataset_dir = "/home/ec2-user/my-dataset-top-dir"

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.YOLOv5Dataset,
    name=fo.core.dataset.make_unique_dataset_name(name),
    split="test",
)

print(dataset)

# Count how many times each label occurs across all detections
counts = dataset.count_values("ground_truth.detections.label")
print(counts)
For more information about the count_values function, check out the Docs.
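Because count_values returns a dictionary that maps each label to the number of times it occurs, plain Python is enough to rank the classes or compute each class's share of the total. Here is a minimal sketch, assuming a hypothetical counts dictionary that stands in for the real output (the class names and numbers are made up for illustration, and summarize_counts is our own helper, not part of FiftyOne):

```python
# Hypothetical stand-in for the dict returned by
# dataset.count_values("ground_truth.detections.label")
counts = {"car": 120, "person": 300, "bicycle": 30}


def summarize_counts(counts):
    """Return (label, count, fraction) tuples sorted by descending count."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(label, n, n / total if total else 0.0) for label, n in ranked]


summary = summarize_counts(counts)
for label, n, fraction in summary:
    print(f"{label:<10} {n:>6} {fraction:6.1%}")
```

A quick table like this makes class imbalance easy to spot before you start training.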
Adding predictions to videos
Community Slack member Stan asked,
“I want to use FiftyOne to add predictions to all the videos in my video dataset. Is there a way around extracting the frames first or having to rematch the frames to the video in the video dataset?”
There are two potential patterns you could build off of. The first one is applicable if you are using a custom model.
from collections import defaultdict

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video")

# Sample per-frame images on disk to feed to your model
frames = dataset.to_frames(sample_frames=True)

# Load your model here
model = ...

# Option 1: save one prediction at a time on `frames`
# Note that when you add fields to `frames`, they will appear on the frames of `dataset`
for frame in frames:
    frame["predictions"] = model.predict(frame.filepath)
    frame.save()

# Option 2: save batches of predictions directly on `dataset`
values = frames.values(["sample_id", "frame_number", "filepath"])
predictions = defaultdict(dict)
for sample_id, frame_number, filepath in zip(*values):
    predictions[sample_id][frame_number] = model.predict(filepath)

dataset.set_values("frames.predictions", predictions, key_field="id")
The next one will work if you are using a model from the FiftyOne Model Zoo.
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video")
model = foz.load_zoo_model("...")

# If `model` is an image model, this automatically infers that you
# want to run inference on the frames of the videos
dataset.apply_model(model, label_field="predictions")
Learn more about working with video datasets, model predictions, and the FiftyOne Model Zoo in the Docs.
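If Option 2 of the custom-model pattern looks dense, the key step is that requesting several fields from frames.values() gives back one parallel list per field, and zip(*values) turns those lists back into per-frame rows. The sketch below mimics that step in plain Python. The IDs, file paths, and the fake_predict function are hypothetical placeholders rather than real FiftyOne output:

```python
from collections import defaultdict

# Hypothetical stand-in for
# frames.values(["sample_id", "frame_number", "filepath"]):
# one parallel list per requested field
values = [
    ["vid_a", "vid_a", "vid_b"],  # sample_id
    [1, 2, 1],  # frame_number
    ["/tmp/a/000001.jpg", "/tmp/a/000002.jpg", "/tmp/b/000001.jpg"],  # filepath
]


def fake_predict(filepath):
    """Placeholder for model.predict(); returns a dummy result."""
    return {"label": "object", "source": filepath}


# zip(*values) transposes the parallel lists into per-frame rows
predictions = defaultdict(dict)
for sample_id, frame_number, filepath in zip(*values):
    predictions[sample_id][frame_number] = fake_predict(filepath)

print(dict(predictions))
```

The result is a dictionary keyed by sample ID and then by frame number, which is the shape handed to set_values in the snippet above.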
Deleting samples with uniqueness less than some value
Community Slack member ZKW asked,
“Is there a method that would allow me to delete samples with uniqueness less than 0.2 in a dataset?”
Here is one way to get the job done:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

# Option 1: keep only the samples with uniqueness of at least 0.2
good_view = dataset.match(F("uniqueness") >= 0.2)
good_view.keep()

# Option 2: delete the samples with uniqueness less than 0.2
bad_view = dataset.match(F("uniqueness") < 0.2)
dataset.delete_samples(bad_view)
For more information about FiftyOne’s capabilities for computing uniqueness, check out the Docs and the exploring image uniqueness tutorial.
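One detail worth checking before you delete anything is what happens to a sample whose uniqueness is exactly 0.2, since that depends on the comparison operators you pick. Here is a plain-Python sketch of the "less than 0.2" rule, using hypothetical sample IDs and scores rather than values from a real dataset:

```python
# Hypothetical (sample_id, uniqueness) pairs; real values come from
# the "uniqueness" field of your dataset
scores = [("a", 0.05), ("b", 0.2), ("c", 0.75)]
threshold = 0.2

# "Less than 0.2": a sample at exactly the threshold is kept
to_delete = [sample_id for sample_id, u in scores if u < threshold]
to_keep = [sample_id for sample_id, u in scores if u >= threshold]

print("delete:", to_delete)
print("keep:", to_keep)
```

The two conditions are exact complements, so every sample lands in one list or the other.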
Adding a VOC label to a sample
Community Slack member ht asked,
“I have an existing dataset in the following structure:
├── labels
├── left
└── right
With labels containing .xml file in VOC format (boundingbox) and left and right containing images from the left and right view. I would like to create a GroupDataset. How can I add the VOC label to each sample?”
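Check out the fiftyone.utils.voc.VOCAnnotation class, which represents a VOC annotations file.
from fiftyone.utils.voc import VOCAnnotation

# Parse the VOC XML file and convert its objects to FiftyOne detections
voc_annotation = VOCAnnotation.from_xml("path/to/label")
detections = voc_annotation.to_detections()

# `sample` is the fo.Sample that the annotation belongs to
sample.set_field("detection", detections)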
For more information about utilities for working with datasets in VOC format, check out the Docs.
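Before you can attach a label to each sample, you need to know which XML file goes with which left and right image. The sketch below pairs them up with the standard library. It assumes that a label file and its two images share a base filename (for example labels/0001.xml, left/0001.jpg, and right/0001.jpg), which the question does not state, and pair_by_stem is a hypothetical helper of our own, not a FiftyOne function:

```python
import tempfile
from pathlib import Path


def pair_by_stem(root):
    """Map each shared filename stem to its left image, right image, and label."""
    root = Path(root)
    left = {p.stem: p for p in (root / "left").iterdir() if p.is_file()}
    right = {p.stem: p for p in (root / "right").iterdir() if p.is_file()}
    labels = {p.stem: p for p in (root / "labels").glob("*.xml")}

    # Keep only stems present in all three folders
    shared = sorted(left.keys() & right.keys() & labels.keys())
    return {
        stem: {"left": left[stem], "right": right[stem], "label": labels[stem]}
        for stem in shared
    }


# Build a tiny throwaway directory tree to demonstrate the pairing
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {
        "labels": ["0001.xml", "0002.xml"],
        "left": ["0001.jpg", "0002.jpg"],
        "right": ["0001.jpg"],  # 0002 has no right view, so it is skipped
    }
    for folder, names in layout.items():
        (root / folder).mkdir()
        for name in names:
            (root / folder / name).touch()

    groups = pair_by_stem(root)
    group_names = {
        stem: {slice_name: path.name for slice_name, path in files.items()}
        for stem, files in groups.items()
    }

print(group_names)
```

Each entry gives you the two image paths for one group plus the XML file to load for it. Stems that are missing from any of the three folders are dropped, so it is worth comparing the number of groups against the number of label files.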
Getting started with model predictions in FiftyOne
Community Slack member NB asked,
“I am new to FiftyOne and exploring how to use it in my model training pipeline which uses TensorFlow Lite. How can I get started?”
Model predictions stored in a custom format can always be loaded iteratively through a simple Python loop. The example below shows how to add object detection predictions to the samples of an existing dataset, but plenty of other label types are also supported.
import fiftyone as fo

# Ex: your custom predictions format
predictions = {
    "/path/to/images/000001.jpg": [
        {"bbox": ..., "label": ..., "score": ...},
        ...
    ],
    ...
}

# Add predictions to your samples
for sample in dataset:
    filepath = sample.filepath

    # Convert predictions to FiftyOne format
    detections = []
    for obj in predictions[filepath]:
        label = obj["label"]
        confidence = obj["score"]

        # Bounding box coordinates should be relative values
        # in [0, 1] in the following format:
        # [top-left-x, top-left-y, width, height]
        bounding_box = obj["bbox"]

        detections.append(
            fo.Detection(
                label=label,
                bounding_box=bounding_box,
                confidence=confidence,
            )
        )

    # Store detections in a field name of your choice
    sample["predictions"] = fo.Detections(detections=detections)

    sample.save()
For more information about working with model predictions, check out the Adding classifier predictions to a dataset and Model predictions sections of the Docs.
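The bounding box format is the part that most often trips people up. If your model reports boxes as pixel corner coordinates, they need to be converted to relative [top-left-x, top-left-y, width, height] values first. Here is a minimal sketch, assuming your boxes arrive as (xmin, ymin, xmax, ymax) in pixels and that you know the image size. The to_relative_bbox helper and the example numbers are our own illustration, so adjust them if your model uses a different coordinate order:

```python
def to_relative_bbox(xmin, ymin, xmax, ymax, img_width, img_height):
    """Convert absolute pixel corners to relative [x, y, width, height]."""
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")

    return [
        xmin / img_width,
        ymin / img_height,
        (xmax - xmin) / img_width,
        (ymax - ymin) / img_height,
    ]


# Hypothetical box on a 640x480 image
bbox = to_relative_bbox(64, 48, 320, 240, img_width=640, img_height=480)
print(bbox)
```

The returned list can be used as the bbox value for each object in the loop above.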
Join the FiftyOne community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 1,600+ FiftyOne Slack members
- 2,950+ stars on GitHub
- 4,000+ Meetup members
- Used by 290+ repositories
- 58+ contributors