
FiftyOne Computer Vision Tips and Tricks – July 28, 2023

Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.

As an open source project, the FiftyOne community is open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

Ok, let’s dive into this week’s tips and tricks!

Listing Dataset Views and exporting them

Community Slack member Sabrina Pereira asked:

“How can I see a list of saved views in a dataset? I made some views using the UI and now want to export them as datasets in the COCO format. How can I do that?”

dataset.list_saved_views() will list out your saved views. For context, if you find yourself frequently using/recreating certain views, you can use save_view() to save them on your dataset under a name of your choice:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")
dataset.persistent = True

# Create a view
cats_view = (
    dataset
    .select_fields("ground_truth")
    .filter_labels("ground_truth", F("label") == "cat")
    .sort_by(F("ground_truth.detections").length(), reverse=True)
)

# Save the view
dataset.save_view("cats-view", cats_view)

Then you can conveniently use load_saved_view() to load the view in a future session:

import fiftyone as fo

dataset = fo.load_dataset("quickstart")

# Retrieve a saved view
cats_view = dataset.load_saved_view("cats-view")
print(cats_view)

Besides list_saved_views(), you can use has_saved_view() and delete_saved_view() to manage your saved views.
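
For example, here's a quick sketch of managing the "cats-view" saved above:

import fiftyone as fo

dataset = fo.load_dataset("quickstart")

# List the saved views on the dataset
print(dataset.list_saved_views())  # e.g., ['cats-view']

# Check whether a particular saved view exists before loading it
if dataset.has_saved_view("cats-view"):
    cats_view = dataset.load_saved_view("cats-view")

# Delete a saved view you no longer need
# dataset.delete_saved_view("cats-view")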

FiftyOne provides native support for exporting datasets to disk in a variety of common formats, and it can be easily extended to export datasets in custom formats.

You can easily export entire datasets, or any subset you have identified by constructing a DatasetView, in the format of your choice via the basic recipe below.

import fiftyone as fo

# The Dataset or DatasetView containing the samples you wish to export
dataset_or_view = fo.Dataset(...)

# The directory to which to write the exported dataset
export_dir = "/path/for/export"

# The name of the sample field containing the label that you wish to export
# Used when exporting labeled datasets (e.g., classification or detection)
label_field = "ground_truth"  # for example

# The type of dataset to export
# Any subclass of `fiftyone.types.Dataset` is supported
dataset_type = fo.types.COCODetectionDataset  # for example

# Export the dataset
dataset_or_view.export(
    export_dir=export_dir,
    dataset_type=dataset_type,
    label_field=label_field,
)
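
Putting the two together, here's a minimal sketch of exporting the saved "cats-view" from above in COCO format (the export path below is just an example):

import fiftyone as fo

dataset = fo.load_dataset("quickstart")
cats_view = dataset.load_saved_view("cats-view")

# Export only the samples/labels in the view, not the whole dataset
cats_view.export(
    export_dir="/tmp/cats-coco-export",  # example path
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
)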

Check out the FiftyOne Docs to learn more about working with saved views and exporting datasets.

Exporting subsets of images with tags

Community Slack member Samuel asked:

“What is the recommended way to export images that I tag in the FiftyOne App (without exporting the whole dataset)? In my script (below), I have some code after session.wait() that parses the dataset tags (by iterating through samples, and accessing sample.tags), then saves them as a JSON. However, after I tag samples in the App, they don’t show up in the dataset object after the session. Invariably, sample.tags and fo_dset.tags are empty for all samples.”

# Create dataset
fo_dset, _ = build_dataset(cfg, run_md_df, image_md_df, fo_vis_annos)

if cfg.dataset_save:
    fo_dset.save()
session = fo.launch_app(fo_dset)
session.wait()

tag_dict = {}
for sample in fo_dset:
    tags = sample.tags
    for tag in tags:
        print(tag)
        if tag in tag_dict.keys():
            tag_dict[tag].append(sample["frame_id"])
        else:
            tag_dict[tag] = [sample["frame_id"]]

tag_output_file = os.path.join(cfg.data_root, "{}_tags.json".format(
    remove_special_characters(cfg.project_name).lower().replace(" ", "_")))
with open(tag_output_file, "w") as f:
    json.dump(tag_dict, f)

When you apply tags in the FiftyOne App, you may need to call dataset.reload() to force any in-memory samples in your main Python session to pull in the changes you made in the App. This is only necessary if you are actually holding references to Sample objects in memory in Python. Dataset-level methods like count_sample_tags() always pull data from the database, so they are always up-to-date without needing to call reload() first. In your case, you could efficiently access the relevant data as follows:

tags, frame_ids = dataset.values(["tags", "frame_id"])
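
For example, here's a sketch of the end of your script, assuming the same fo_dset and frame_id field as in your snippet:

# Pull in any edits made in the App during the session
fo_dset.reload()

# Build the tag -> frame_id mapping without iterating samples one-by-one
tag_dict = {}
for tags, frame_id in zip(*fo_dset.values(["tags", "frame_id"])):
    for tag in tags:
        tag_dict.setdefault(tag, []).append(frame_id)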

Check out the FiftyOne Docs to learn more about working with tags and samples.

Importing datasets in custom formats

Community Slack member Hendrik asked:

“I have two million images with JSON metadata attached. The metadata doesn’t follow a standard like COCO, but instead is in a custom format. Is it possible to import my images into FiftyOne and make them searchable by specific fields in the metadata?”

Yes! The simplest and most flexible approach to loading your data into FiftyOne is to iterate over your data in a simple Python loop, create a Sample for each data + label(s) pair, and then add those samples to a Dataset.

FiftyOne provides label types for common tasks such as classification, detection, segmentation, and many more. Here’s an image classification example to give you a sense of the basic workflow.

import glob
import fiftyone as fo

images_patt = "/path/to/images/*"

# Ex: your custom label format
annotations = {
    "/path/to/images/000001.jpg": "dog",
    # ...
}

# Create samples for your data
samples = []
for filepath in glob.glob(images_patt):
    sample = fo.Sample(filepath=filepath)

    # Store classification in a field name of your choice
    label = annotations[filepath]
    sample["ground_truth"] = fo.Classification(label=label)

    samples.append(sample)

# Create dataset
dataset = fo.Dataset("my-classification-dataset")
dataset.add_samples(samples)
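
To make your metadata searchable, you can store each JSON field directly on the sample as a custom field and then query it. Here's a minimal sketch, assuming hypothetical "camera" and "weather" keys in your metadata:

from fiftyone import ViewField as F

# Inside the loop above, store any metadata fields you care about, e.g.:
#     sample["camera"] = metadata[filepath]["camera"]
#     sample["weather"] = metadata[filepath]["weather"]

# Any stored field can then be used to search/filter the dataset
front_camera_view = dataset.match(F("camera") == "front")
print(front_camera_view)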

Check out the FiftyOne Docs to learn more about custom formats, label types, and writing custom dataset importers.

Finding FiftyOne Brain keys and deleting runs

Community Slack member Samuel asked:

“I’m having trouble wrangling FiftyOne Brain keys. I’m often recomputing embeddings and similarity, but don’t want to have to specify a new name every time. So, I use a default brain key name like:

fob.compute_similarity(dset, embeddings=embeddings, brain_key="img_embed_sim")

However, this leads to frequent errors: 

"ValueError: Brain method run with key 'img_embed_sim' already exists" 

I was trying to delete the existing Brain method when this happens, but I can’t find the method called “img_embed_sim” anywhere in my dataset. Any ideas on how I can resolve this? Also if I don’t specify a Brain key, I don’t seem to be able to sort by similarity in the App.”

You must provide a brain_key in order for the run to be persisted permanently on your dataset and accessible in the FiftyOne App. So, that part of your question is by design. You can then use dataset.list_brain_runs() to see the existing Brain keys on your dataset to avoid duplicates. This key needs to be something descriptive of what the run contains so that you know what you’re choosing when you access the run from the App later.

The relevant “delete” methods to consider are delete_brain_run(), which deletes the run with a given brain key, and delete_brain_runs(), which deletes all Brain runs on the dataset.
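
For example, here's a quick sketch of recomputing similarity under a fixed key, reusing the dset and embeddings from your snippet:

import fiftyone.brain as fob

brain_key = "img_embed_sim"

# See which Brain runs already exist on the dataset
print(dset.list_brain_runs())

# Delete the previous run (if any) before recomputing under the same key
if brain_key in dset.list_brain_runs():
    dset.delete_brain_run(brain_key)

fob.compute_similarity(dset, embeddings=embeddings, brain_key=brain_key)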

Check out the FiftyOne Docs to learn more about the capabilities of the FiftyOne Brain.

Importing images with nested directories

Community Slack member Maor asked:

“How do I load 10k images when the dataset path contains nested directories of images?”

Assuming you want the directories to be imported as tags, you can try:

import os

import fiftyone as fo
import fiftyone.core.storage as fos

dataset_dir = "path/to/dataset"
dataset = fo.Dataset("my-nested-dataset")  # example dataset name

samples = []
for filepath in fos.list_files(dataset_dir, abs_paths=True, recursive=True):
    try:
        # Use the nested directory names (relative to the dataset root) as tags
        tags = os.path.relpath(os.path.dirname(filepath), dataset_dir).split("/")
        sample = fo.Sample(filepath=filepath, tags=tags)
        samples.append(sample)
    except Exception as e:
        print("Error with file: ", filepath, e)

dataset.add_samples(samples)
print("done")

Check out the FiftyOne Docs to learn more about loading datasets from disk.

Join the FiftyOne community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!