Welcome to the weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community; we’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Customizing the visualization of object embeddings
Community Slack member Joy asked,
“Could you please provide an example of how to restrict the visualization to only objects in a subset of classes using the compute_visualization() method’s default embeddings model and dimensionality method?”
Hey Joy! To visualize patch embeddings and color them by their label, you need to choose a brain_key that corresponds to patch embeddings. You can use the patches_field="ground_truth" argument to embed the patches defined by the ground_truth field instead of the entire images.
The following example shows how to restrict the visualization to only objects in a subset of the classes. Each point in the scatter plot in the Embeddings panel corresponds to an object, colored by its label class. When points are lassoed in the plot, the corresponding object patches are automatically selected in the Samples panel.
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

# Generate visualization for `ground_truth` objects
results = fob.compute_visualization(
    dataset, patches_field="ground_truth", brain_key="vj_viz"
)

# Restrict to the 10 most common classes
counts = dataset.count_values("ground_truth.detections.label")
classes = sorted(counts, key=counts.get, reverse=True)[:10]

view = dataset.filter_labels("ground_truth", F("label").is_in(classes))

session = fo.launch_app(view)
Once the session is launched, you can open the Embeddings panel, select the vj_viz brain key, and filter down the ground truth labels to explore the visualization.
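You can also build the plot programmatically instead of using the App’s Embeddings panel. Here is a minimal sketch, continuing from the example above and assuming the vj_viz brain key was computed as shown:

# Continuing from the example above: build the embeddings plot programmatically
results = dataset.load_brain_results("vj_viz")

# Color each object point by its ground truth label and display the interactive plot
plot = results.visualize(labels="ground_truth.detections.label")
plot.show()

# Optionally attach the plot to the App session so lassoed points select patches
session.plots.attach(plot)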
Learn more about object embeddings in the FiftyOne docs.
Using FiftyOne to copy predictions over to ground truth for CVAT labels
Community Slack member Kais asked,
“In my dataset, I have two fields, ground_truth and predictions. For a number of samples, I want to copy the content of ‘predictions’ over to ‘ground_truth’, so that the labels that I create in CVAT get saved under ‘ground_truth’. How can I approach it in FiftyOne?”
Hey Kais, thanks for your question! It sounds like you want to copy the content of predictions over to ground_truth for some samples in your dataset, so that the labels you create in CVAT get saved under ground_truth.
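If all you need is the field copy itself, here is a minimal sketch of that step (the same pattern appears at the end of the full example below); the dataset name and the take(10) selection are placeholders for your own dataset and samples:

import fiftyone as fo

dataset = fo.load_dataset("my_dataset")  # placeholder dataset name

# Copy predictions over to ground_truth for the samples you want to update
for sample in dataset.take(10):  # replace with your own view/selection
    sample["ground_truth"] = sample["predictions"]
    sample.save()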
There is a tight integration between FiftyOne and CVAT that is designed to manage the full annotation workflow, from task creation to annotation import. However, if you have created CVAT tasks outside of FiftyOne, you can use the import_annotations() utility to import individual task(s) or an entire project into a FiftyOne dataset.
First, you need an existing CVAT project that was created outside of FiftyOne. Once you have that, you can use annotate() to add the project to your dataset. Then, you can use import_annotations() to import the annotations, specifying the name of the field into which to load your labels:
fouc.import_annotations(
    dataset,
    project_name=project_name,
    label_types={"detections": "ground_truth"},
)
Make sure to specify the correct project_name and label_types. You can also download both the annotations and the media from CVAT by setting download_media=True.
Once you’ve imported the annotations, you can launch the FiftyOne App with fo.launch_app(dataset) to view and edit your labeled samples.
The code below demonstrates the full workflow discussed above.
import os

import fiftyone as fo
import fiftyone.utils.cvat as fouc
import fiftyone.zoo as foz

# Load a FiftyOne dataset with some samples
dataset = foz.load_zoo_dataset("quickstart", max_samples=3).clone()

# A pre-existing CVAT project with some annotations
project_name = "my_cvat_project"  # Replace with your project name
cvat_annotations_path = "/path/to/cvat_annotations.xml"  # Replace with the path to your CVAT annotations file

# Add the project to your dataset using annotate()
results = dataset.annotate(
    project_name,
    label_field="ground_truth",
    annotation_backend="cvat",
    annotation_filepath=cvat_annotations_path,
)

# Import the annotations into your FiftyOne dataset using import_annotations()
fouc.import_annotations(
    dataset,
    project_name=project_name,
    label_types={"detections": "ground_truth"},
    data_path="/tmp/cvat_import",
    download_media=True,
)

# Copy the content of 'predictions' over to 'ground_truth' for some samples
for sample in dataset.take(2):  # Replace with the samples you want to modify
    sample.ground_truth = sample.predictions
    sample.save()

# Launch the FiftyOne App to view and edit your labeled samples
session = fo.launch_app(dataset)
Learn more about CVAT integration in the FiftyOne Docs.
Adding detections to a video dataset
Community member Kevin asked,
“I have a video dataset and I am using a Python loop to perform the necessary edits. I would like to add detections to each sample (video). What is the right approach to add detections to video samples in FiftyOne?“
Hey Kevin! In FiftyOne, adding detections to video samples is done by using the frames attribute of each video sample. Video samples are recognized by their MIME type and have media type video in FiftyOne datasets.
The frames attribute is a dictionary whose keys are frame numbers and whose values are Frame instances. Frame instances can hold all Label instances and other primitive-type fields for the frame. To add, modify, or delete labels of any type as well as primitive fields such as integers, strings, and booleans, you can use the same dynamic attribute syntax that you use to interact with samples.
Here’s an example code snippet that demonstrates how to add a detection to a frame in a video sample:
import fiftyone as fo

# Load your dataset of video samples by name
dataset = fo.load_dataset("my-video-dataset")  # replace with your dataset's name

# Iterate over the video samples
for sample in dataset:
    # Add a detection to frame 10 of each video
    frame_number = 10
    detection = fo.Detection(
        label="person",
        bounding_box=[0.1, 0.1, 0.2, 0.2],  # [x, y, width, height] in relative [0, 1] coordinates
    )

    # Accessing a frame number creates an empty frame if one doesn't exist yet
    frame = sample.frames[frame_number]
    frame["detections"] = fo.Detections(detections=[detection])

    # Save the changes to the database
    sample.save()
In this example, we load the dataset containing video samples and iterate over each sample. For each sample, we add a detection to the frame with frame number 10. The detection consists of a label and a bounding box. Finally, we save the changes to the database by calling `sample.save()`.
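The same dynamic attribute syntax also works for primitive frame-level fields. Here is a minimal sketch, where is_keyframe and quality are hypothetical fields added purely for illustration:

# Continuing from the example above: add primitive fields to each frame
for sample in dataset:
    for frame_number, frame in sample.frames.items():
        frame["is_keyframe"] = frame_number % 30 == 1  # hypothetical boolean field
        frame["quality"] = 0.95  # hypothetical float field

    sample.save()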
Learn more about adding object detection to datasets in the FiftyOne docs.
How to split data into train, validation, and test sets in FiftyOne and export them separately
Community member Eli asked,
“What is the process for splitting a dataset into train, validation, and test sets in FiftyOne, and how can each split be exported separately to its own directory?”
Hey Eli, to split your dataset into train, validation, and test sets, you can use the random_split() utility in fiftyone.utils.random. You specify the name and fraction of samples for each split, which tags the samples accordingly, and then you can export each split separately as shown below:
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.random as four
import fiftyone.utils.yolo as fouy

dataset = foz.load_zoo_dataset("quickstart")

# Randomly tag 60% of the samples "val" and 40% "test"
four.random_split(dataset, {"val": 0.6, "test": 0.4})

val_view = dataset.match_tags("val")
test_view = dataset.match_tags("test")

# The label classes to include in the exports
classes = dataset.distinct("ground_truth.detections.label")

val_view.export(
    export_dir="val_export",
    dataset_type=fo.types.YOLOv5Dataset,
    classes=classes,
)

test_view.export(
    export_dir="test_export",
    dataset_type=fo.types.YOLOv5Dataset,
    classes=classes,
)
Running the code produces separate val_export and test_export directories on disk, each containing the exported data and metadata for its split.
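Since the question asks for train, validation, and test sets, here is a minimal sketch of a three-way split using the same utilities; the split names and fractions are illustrative:

import fiftyone as fo
import fiftyone.utils.random as four
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# Tag each sample with one of the three splits
four.random_split(dataset, {"train": 0.7, "val": 0.2, "test": 0.1})

classes = dataset.distinct("ground_truth.detections.label")

# Export each split to its own directory
for split in ("train", "val", "test"):
    dataset.match_tags(split).export(
        export_dir=f"{split}_export",
        dataset_type=fo.types.YOLOv5Dataset,
        classes=classes,
    )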
Learn more about various export methods in the FiftyOne docs.
Keeping a dataset persistent across multiple sessions
Community member Dan asked,
“I used dataset.clone() to create a clone of a dataset. But after closing my session debugger and re-running it, the dataset was no longer present. What could be the reason for this?”
Hey Dan! It looks like when you cloned the dataset in FiftyOne you did not make it persistent. This means it was not saved to the database and was deleted once the session was closed.
By default, datasets are non-persistent. Non-persistent datasets are deleted from the database each time the database is shut down. Note that FiftyOne does not store the raw data in datasets directly (only the labels), so your source files on disk are untouched. You can make a cloned dataset persistent by setting its persistent property to True before saving it to the database.
To do this, you can add the following code after cloning the dataset:
cloned_dataset.persistent = True
cloned_dataset.save()
Finally, you can check to see what datasets exist at any time via list_datasets().
print(fo.list_datasets())
To check whether a given dataset is persistent, load it with dataset = fo.load_dataset("my_dataset") and then print out dataset.persistent.
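For example, here is a minimal sketch, assuming a dataset named "my_dataset" exists in your database:

import fiftyone as fo

# Load the dataset by name and check whether it is persistent
dataset = fo.load_dataset("my_dataset")  # placeholder dataset name
print(dataset.persistent)  # True means it will survive database shutdown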
Learn more about data persistence and datasets in the FiftyOne Docs.
Join the FiftyOne community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!
- 1,500+ FiftyOne Slack members
- 2,750+ stars on GitHub
- 3,500+ Meetup members
- Used by 265+ repositories
- 55+ contributors