Loading Open Images V6 and Custom Datasets with FiftyOne

Datasets and their annotations are often stored in very different formats. FiftyOne makes it easy to load and visualize any image dataset and its labels.

DataFrames are a standard way of storing tabular data, and a variety of tools exist to visualize them in different ways. Image and video datasets, on the other hand, have no standard format for storing data and annotations. Nearly every new dataset defines its own schema for storing raw data, bounding boxes, sample-level labels, and so on.

I have been working on an open-source machine learning tool called FiftyOne that can ease the pain of writing custom loading, visualization, and conversion scripts every time you use a new dataset. FiftyOne supports multiple dataset formats out of the box, including MS-COCO, YOLO, Pascal VOC, and more. However, if your data is in a format that is not supported out of the box, you can still easily load it into FiftyOne manually.

Why would you want your data in FiftyOne? FiftyOne provides a highly functional App and API that let you quickly visualize your dataset, write interesting queries, find annotation mistakes, convert your data to other formats, run models from a model zoo on it, and more.

This blog post will walk you through how to load image-level classifications, object detections, segmentations, and visual relationships into FiftyOne, visualize them, and convert them to other formats. I’ll be using Open Images V6, released in February 2020, as the basis for this post since it contains all of these label types. If you are only interested in loading Open Images V6, you can check it out in the FiftyOne Dataset Zoo and load it in one line of code! If you have your own dataset that you want to load, adjust the code in this post to parse the format your data is stored in.

Open Images V6

Open Images is a dataset released by Google containing over 9M images with labels spanning various tasks:

  • Image-level labels*
  • Object bounding boxes*
  • Visual relationships*
  • Instance segmentation masks*
  • Localized narratives

*Loaded in this post

These annotations were generated by machine learning algorithms followed by human verification on the test and validation splits and on subsets of the training split. Versions of this dataset are also used in the Open Images Challenges on Kaggle.

Open Images V6 introduced localized narratives, which are a novel form of multimodal annotations consisting of a voiceover and mouse trace of an annotator describing an image. FiftyOne support for localized narratives is currently in the works.


A New Way to Download and Evaluate Open Images!

[Updated May 12, 2021] After releasing this post, we collaborated with Google to support Open Images V6 directly through the FiftyOne Dataset Zoo. Loading Open Images, annotations and all, is now as easy as this:

import fiftyone.zoo as foz

oi_dataset = foz.load_zoo_dataset("open-images-v6", split="validation")

With this implementation in FiftyOne, you can also specify any subset of Open Images with parameters like classes, split, max_samples, and more:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "open-images-v6",
    "validation",
    label_types=["detections", "classifications"],
    classes=["Dog", "Cat"],
    max_samples=1000,
    seed=51,
    shuffle=True,
    dataset_name="open-images-dog-cat",
)

session = fo.launch_app(dataset)

Additionally, if you are training a model on Open Images, FiftyOne now supports Open Images-style evaluation, allowing you to produce the same mAP metrics used in the Open Images challenges. The benefit of using FiftyOne for this is that it also stores instance-level true positive, false positive, and false negative results, so rather than relying only on aggregate dataset-wide metrics, you can get hands-on with your model's results and find out how to best improve performance.

results = dataset.evaluate_detections(
    "predictions",
    gt_field="detections",
    method="open-images",
    pos_label_field="positive_labels",
    neg_label_field="negative_labels",
    hierarchy=dataset.info["hierarchy"],
    expand_pred_hierarchy=True,
)
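
Once the evaluation has run, you can work with the returned results object directly. As a minimal sketch (assuming the evaluation above completed), you can print the Open Images-style mAP and a per-class report:

# Aggregate mAP, computed per the Open Images protocol
print(results.mAP())

# Per-class precision, recall, and F1 scores
results.print_report()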

For more information, check out this post or this tutorial!


Open Images Label Formats

The previous section shows the best way to load the Open Images dataset. However, FiftyOne also lets you easily load custom datasets. The next few sections show how to load a dataset into FiftyOne from scratch. We are using Open Images as the example dataset for this since it contains a rich variety of label types.

Note: The code in the following sections is meant to be adapted to your own datasets; it does not need to be used to load Open Images. Use the examples above if you are only interested in the Open Images dataset itself.

In this “Open Images Label Formats” section, we describe the format used by Google to store Open Images annotations on disk. We will use this information to write the parsers to load this dataset into FiftyOne in the next “Loading custom datasets into FiftyOne” section.

Downloading Data Locally

The AWS download links for the training split (513 GB), validation split (12 GB), and test split (36 GB) can be found in the Open Images GitHub repository. Annotations for the tasks you are interested in can be downloaded directly from the Open Images website.

We will be using samples from the test split for this example. You can download the entire test split (36 GB!) with the following commands:

pip install awscli
aws s3 --no-sign-request sync s3://open-images-dataset/test ./open-images/test/

Alternatively, further down in this post we will download just the handful of images from the test split that we actually need.

We will also need to download the relevant annotation files for each task, all of which can be found here: https://storage.googleapis.com/openimages/web/download.html

Image-Level Labels

Every image in Open Images can contain multiple image-level labels across hundreds of classes. These labels are split into two types: positive and negative. Positive labels are classes that have been verified to be in the image, while negative labels are classes that have been verified to not be in the image. Negative labels are useful because they are generally specified for classes that you might expect to appear in a scene but do not. For example, if there is a group of people in outfits on a field, you may expect there to be a ball. If there isn’t one, that makes a good negative label.

wget -P labels https://storage.googleapis.com/openimages/v5/test-annotations-human-imagelabels-boxable.csv

Below is a sample of the contents of this file:

ImageID,Source,LabelName,Confidence
000026e7ee790996,verification,/m/0cgh4,0
000026e7ee790996,verification,/m/04hgtk,0
...

We need the class list for both labels and detections:

wget -P labels https://storage.googleapis.com/openimages/v5/class-descriptions-boxable.csv

Below is a sample of the contents of this file:

/m/011k07,Tortoise
/m/011q46kg,Container
...
Image-level labels visualized in the FiftyOne App

Detections

Objects are localized and labeled with the same classes as the image-level labels. Additionally, each detection contains boolean attributes indicating whether the object is occluded, truncated, a group of objects, inside another object, or a depiction of the object (like a cartoon).

wget -P detections https://storage.googleapis.com/openimages/v5/test-annotations-bbox.csv

Below is a sample of the contents of this file:

ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
000026e7ee790996,xclick,/m/07j7r,1,0.071875,0.1453125,0.20625,0.39166668,0,1,1,0,0
000026e7ee790996,xclick,/m/07j7r,1,0.4390625,0.571875,0.26458332,0.43541667,0,1,1,0,0
...
Detections visualized in the FiftyOne App

Visual Relationships

Relationships are labeled between two object detections; for example, one object wearing another. The most common relationship is is, indicating that an object has some attribute (like a handbag being leather). The annotations for these relationships include the bounding boxes and labels of both objects as well as the label of the relationship.

wget -P relationships https://storage.googleapis.com/openimages/v6/oidv6-test-annotations-vrd.csv
wget -P relationships https://storage.googleapis.com/openimages/v6/oidv6-attributes-description.csv

Below is a sample of the contents of the relationships file:

ImageID,LabelName1,LabelName2,XMin1,XMax1,YMin1,YMax1,XMin2,XMax2,YMin2,YMax2,RelationshipLabel
9553b9608577b74b,/m/04yx4,/m/017ftj,0.023404,0.985106,0.038344,0.981595,0.238298,0.759574,0.349693,0.529141,wears
819903f6353b60b5,/m/03m3pdh,/m/0dnr7,0.096875,1.000000,0.095833,1.000000,0.096875,1.000000,0.095833,1.000000,is
...
Visual relationships visualized in the FiftyOne App

Instance Segmentation

Segmentation masks are distributed as 16 zip files, each containing the masks for images whose IDs start with one of 0-9 or a-f. In this example, we will only use images starting with 0. The commands below download just those masks; replace the 0 with 1-9 or a-f to download masks for other images.

Each segmentation is stored as a separate mask image per object; the annotations also include the bounding box coordinates around the mask and the label of the segmented object.

wget -P segmentations https://storage.googleapis.com/openimages/v5/test-masks/test-masks-0.zip
unzip -d segmentations/masks segmentations/test-masks-0.zip
wget -P segmentations https://storage.googleapis.com/openimages/v5/test-annotations-object-segmentation.csv

Below is a sample of the contents of the segmentations file:

MaskPath,ImageID,LabelName,BoxID,BoxXMin,BoxXMax,BoxYMin,BoxYMax,PredictedIoU,Clicks
d0ed76e0533a914d_m01xyhv_cffd8afa.png,d0ed76e0533a914d,/m/01xyhv,cffd8afa,0.122966,0.958409,0.389892,0.998195,0.00000,
fba940ee0203b368_m08pbxl_13094a08.png,fba940ee0203b368,/m/08pbxl,13094a08,0.460938,0.551562,0.387500,0.572917,0.00000,
...
Instance segmentations visualized in the FiftyOne App

Preprocessing

We are only going to use a small subset of the dataset in this example to make it easy to follow along. Additionally, since we want to load several different types of annotations, we need to find samples that have labels of every type.

Let’s load the annotations from the CSV files we downloaded and parse them to find the subset of images we want to use.

import csv

with open("detections/test-annotations-bbox.csv") as f:
    reader = csv.reader(f, delimiter=',')
    det_data = [row for row in reader]

with open("labels/test-annotations-human-imagelabels-boxable.csv") as f:
    reader = csv.reader(f, delimiter=',')
    lab_data = [row for row in reader]

with open("relationships/oidv6-test-annotations-vrd.csv") as f:
    reader = csv.reader(f, delimiter=',')
    rel_data = [row for row in reader]

with open("segmentations/test-annotations-object-segmentation.csv") as f:
    reader = csv.reader(f, delimiter=',')
    seg_data = [row for row in reader]

# Find intersection of ImageIDs with all annotations
det_ids = {l[0] for l in det_data[1:]}
lab_ids = {l[0] for l in lab_data[1:]}
rel_ids = {l[0] for l in rel_data[1:]}

# We only downloaded the zip for files starting with "0"
seg_ids = {l[1] for l in seg_data[1:] if l[1][0] == "0"}

valid_ids = det_ids & lab_ids & rel_ids & seg_ids
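
As a quick sanity check, you can verify that the intersection is non-empty before downloading anything:

print(f"Found {len(valid_ids)} valid image IDs")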

We now have a set of valid_ids whose images contain all of the annotation types we want to look at. Let’s choose a subset of 100 of those and download the corresponding images, following what is done in the official Open Images download script.

pip install boto3

import boto3
import botocore
import os

BUCKET_NAME = 'open-images-dataset'

bucket = boto3.resource(
    's3', config=botocore.config.Config(
        signature_version=botocore.UNSIGNED)).Bucket(BUCKET_NAME)

num_ids = 100

os.makedirs('images/test', exist_ok=True)

for image_id in list(valid_ids)[:num_ids]:
    bucket.download_file(f'test/{image_id}.jpg',
        os.path.join('images/test', f'{image_id}.jpg'))

The last thing we need is a mapping from the class and attribute IDs to their actual names.

import csv

with open("labels/class-descriptions-boxable.csv") as f:
    reader = csv.reader(f, delimiter=',')
    cls_data = [row for row in reader]

# Map of class IDs to class names
classes_map = {k: v for k, v in cls_data}


with open("relationships/oidv6-attributes-description.csv") as f:
    reader = csv.reader(f, delimiter=',')
    attrs_data = [row for row in reader]

# Map of attribute IDs to attribute names
attrs_map = {k: v for k, v in attrs_data}

Loading Custom Datasets into FiftyOne

You will first need to install FiftyOne with a simple pip command. It is recommended to work with FiftyOne in interactive Python sessions, so let’s install IPython too.

pip install fiftyone
pip install ipython

After launching ipython, the first step is to create a FiftyOne Dataset:

import fiftyone as fo

dataset = fo.Dataset("open_images_v6")

If you want this dataset to persist after you exit the Python session, set its persistent attribute to True. This lets us quickly reload the dataset in the future.

dataset.persistent = True
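
With persistence enabled, the dataset can be reloaded by name in any future session:

import fiftyone as fo

# Load the persistent dataset created earlier
dataset = fo.load_dataset("open_images_v6")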

We then need to create a FiftyOne Sample for each image, containing the image’s file path as well as all of the label information we want to import. For each label type, we will create a corresponding object in FiftyOne and add it as a field on our samples.

Adding image-level classification labels will utilize the fo.Classifications class. Detections, segmentations, and relationships can all use the fo.Detections class, since it supports bounding boxes, masks, and custom attributes assigned to each detection. These custom attributes can be used for things like IsOccluded on detections or the two object labels that a relationship connects.

The sections below outline how to create FiftyOne labels from the Open Images data we have loaded so far and then how to add them to your FiftyOne Dataset.

Classification Labels

Classification labels utilize the fo.Classification class. Since these are multi-label classifications, we will be using the fo.Classifications class to store multiple classification labels.

Additionally, we want to separate the positive and negative labels (confidence 1 and 0, respectively) into different Classifications fields so we can view them separately in the App.

def create_labels(lab_data, image_id):
    pos_cls = []
    neg_cls = []
    # Get relevant data for this image
    sample_labs = [i for i in lab_data if i[0]==image_id]
    for sample_lab in sample_labs:
        # sample_lab reference: [ImageID,Source,LabelName,Confidence]
        label = classes_map[sample_lab[2]]
        conf = float(sample_lab[3])
        cls = fo.Classification(label=label, confidence=conf)

        # Confidence is 1 for verified-present classes and 0 for
        # verified-absent classes
        if conf > 0.1:
            pos_cls.append(cls)
        else:
            neg_cls.append(cls)

    pos_labels = fo.Classifications(classifications=pos_cls)
    neg_labels = fo.Classifications(classifications=neg_cls)
    
    return pos_labels, neg_labels

Object Detections

Similar to classifications, the fo.Detections class lets you store multiple fo.Detection objects in a list. We create a detection by defining the bounding box coordinates and class label of the object. We can then add any additional attributes that we want, like IsOccluded and IsTruncated.

def create_detections(det_data, image_id):
    dets = []
    
    sample_dets = [i for i in det_data if i[0]==image_id]
    for sample_det in sample_dets:
        # sample_det reference: [ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside]
        label = classes_map[sample_det[2]]
        xmin = float(sample_det[4])
        xmax = float(sample_det[5])
        ymin = float(sample_det[6])
        ymax = float(sample_det[7])

        # Convert to FiftyOne format: [top-left-x, top-left-y, width, height],
        # in relative coordinates in [0, 1]
        bbox = [xmin, ymin, xmax-xmin, ymax-ymin]

        detection = fo.Detection(bounding_box=bbox, label=label)

        detection["IsOccluded"] = bool(int(sample_det[8]))
        detection["IsTruncated"] = bool(int(sample_det[9]))
        detection["IsGroupOf"] = bool(int(sample_det[10]))
        detection["IsDepiction"] = bool(int(sample_det[11]))
        detection["IsInside"] = bool(int(sample_det[12]))

        dets.append(detection)
    
    detections = fo.Detections(detections=dets)

    return detections

Visual Relationships

Relationships are best represented in FiftyOne through fo.Detections since a relationship contains a bounding box, relationship label, and object labels, all of which can be stored in a detection. We are going to have the bounding box of the relationship encompass the bounding boxes of both objects it pertains to. We add the labels of each object as additional custom fields to the detection.

It should be noted that you could also add the two objects that make up the relationship as individual detections; not doing so was simply a design choice for this post.

def create_relationships(rel_data, image_id):
    rels = []
    sample_rels = [i for i in rel_data if i[0]==image_id]
    for sample_rel in sample_rels:
        # sample_rel reference: [ImageID,LabelName1,LabelName2,XMin1,XMax1,YMin1,YMax1,XMin2,XMax2,YMin2,YMax2,RelationshipLabel]
        label1 = classes_map[sample_rel[1]]
    
        attribute = False
        if sample_rel[2] in classes_map:
            label2 = classes_map[sample_rel[2]]
        else:
            label2 = attrs_map[sample_rel[2]]
            attribute = True
    
        label_rel = sample_rel[-1]
    
        xmin1 = float(sample_rel[3])
        xmax1 = float(sample_rel[4])
        ymin1 = float(sample_rel[5])
        ymax1 = float(sample_rel[6])
    
        xmin2 = float(sample_rel[7])
        xmax2 = float(sample_rel[8])
        ymin2 = float(sample_rel[9])
        ymax2 = float(sample_rel[10])
    
        # Compute the smallest box that encloses both objects' boxes
        xmin_int = min(xmin1, xmin2)
        ymin_int = min(ymin1, ymin2)
        xmax_int = max(xmax1, xmax2)
        ymax_int = max(ymax1, ymax2)
    
        # Convert to [top-left-x, top-left-y, width, height]
        bbox_int = [xmin_int, ymin_int, xmax_int-xmin_int, ymax_int-ymin_int]
    
        detection_rel = fo.Detection(bounding_box=bbox_int, label=label_rel)
    
        detection_rel["Label1"] = label1
        detection_rel["Label2"] = label2

        rels.append(detection_rel)

    relationships = fo.Detections(detections=rels)

    return relationships 

Segmentations

We can once again use fo.Detections to store segmentations, since a detection contains an optional mask argument that accepts a NumPy array and scales it to the bounding box region. The segmentations in Open Images also include a bounding box around each mask as well as the instance label, all of which are added to the detection objects. Note that we use OpenCV (pip install opencv-python) to load the mask images.

import cv2

def create_segmentations(seg_data, image_id):
    segs = []
    sample_segs = [i for i in seg_data if i[1]==image_id]
    for sample_seg in sample_segs:
        # sample_seg reference: [MaskPath,ImageID,LabelName,BoxID,BoxXMin,BoxXMax,BoxYMin,BoxYMax,PredictedIoU,Clicks]
        label = classes_map[sample_seg[2]]
        xmin = float(sample_seg[4])
        xmax = float(sample_seg[5])
        ymin = float(sample_seg[6])
        ymax = float(sample_seg[7])

        # Convert to [top-left-x, top-left-y, width, height]
        bbox = [xmin, ymin, xmax-xmin, ymax-ymin]

        # Load boolean mask
        mask_path = os.path.join("segmentations/masks", sample_seg[0])
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) > 122
        h, w = mask.shape
        cropped_mask = mask[int(ymin*h):int(ymax*h), int(xmin*w):int(xmax*w)]

        segmentation = fo.Detection(bounding_box=bbox, label=label,
                mask=cropped_mask)

        segs.append(segmentation)

    segmentations = fo.Detections(detections=segs)

    return segmentations

Creating FiftyOne Samples

Now that we have defined functions that take in Open Images data and return FiftyOne labels, we can create samples and attach the labels to them.

Samples only need a filepath to be instantiated, and we can then add any FiftyOne labels to them as fields. Once a sample is created, we add it to the dataset and continue until all of our data is loaded.

# Add Samples to Dataset
for image_id in list(valid_ids)[:num_ids]:
    sample = fo.Sample(filepath=os.path.join("images/test", f"{image_id}.jpg"))

    # Add Labels
    pos_labels, neg_labels = create_labels(lab_data, image_id)
    sample["positive_labels"] = pos_labels 
    sample["negative_labels"] = neg_labels 

    # Add Detections
    detections = create_detections(det_data, image_id)
    sample["detections"] = detections 

    # Add Segmentations
    segmentations = create_segmentations(seg_data, image_id)
    sample["segmentations"] = segmentations

    # Add Relationships
    relationships = create_relationships(rel_data, image_id)
    sample["relationships"] = relationships

    dataset.add_sample(sample)
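
Before exploring, it is worth printing the dataset to confirm that the samples and all four label fields were added as expected:

# Print a summary of the dataset and inspect one sample
print(dataset)
print(dataset.first())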

Visualizing and Exploring

Once we have our data loaded into a FiftyOne dataset, we can launch the App and start exploring.

session = fo.launch_app(dataset)

In the App, we can select which label fields we want to view, look at individual samples in an expanded view, and view the distributions of labels.

Opening a sample in the expanded view lets you visualize the attributes we added, like the labels of a relationship. For example, we can see that there are two Ride relationships between Man and Horse in the image below.

Being able to easily visualize our dataset lets us quickly spot-check the data. For example, it appears that the Mammal label has an inconsistent meaning across samples. Below are two images containing humans; one has Mammal as a negative_label and the other has Mammal as a positive_label. (We’ll write a query to surface such samples shortly.)

Queries

One of the cutting-edge features that FiftyOne provides is the ability to interact closely with your dataset in code and in the App. This lets you write sophisticated queries that would otherwise require a large amount of scripting.

For example, say that we want to build a subset of Open Images containing close-up images of faces. We can create a view into the dataset containing all detections that have a Human face label with a bounding box area greater than 0.2.

from fiftyone import ViewField as F

bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

large_face_view = (
    dataset
    .filter_labels(
        "detections", F("label")=="Human face", only_matches=True
    ).filter_labels(
        "detections", bbox_area > 0.2, only_matches=True
    )
)

session.view = large_face_view
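
As another example, we can surface the Mammal inconsistency we spotted earlier. The sketch below uses the filter-and-count pattern on the Classifications list field to match all samples where Mammal appears as a negative label:

# Samples where "Mammal" was verified to NOT be in the image
neg_mammal_view = dataset.match(
    F("negative_labels.classifications").filter(F("label") == "Mammal").length() > 0
)

session.view = neg_mammal_view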

Converting Formats

Once your data is in FiftyOne, you can export it in any of the formats that FiftyOne supports with just a couple of lines of code.

For example, if we want to export the detections we added in MS-COCO format so that we can run the pycocotools evaluation on them, we can do so with a single call in Python.

dataset.export(
    export_dir="/path/to/dir",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="detections",
)
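
To sanity-check an export, you can load it right back into FiftyOne; the sketch below re-imports the (hypothetical) /path/to/dir from above as a new dataset:

coco_dataset = fo.Dataset.from_dir(
    "/path/to/dir",
    dataset_type=fo.types.COCODetectionDataset,
    name="open-images-as-coco",
)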

The General Formula for Loading Datasets

The easiest way to load your data is to follow a standard format for your annotations. For example, if you just finished annotating in the open-source tool CVAT, you can load the results into FiftyOne as easily as:

import fiftyone as fo

dataset = fo.Dataset.from_dir("path/to/data", dataset_type=fo.types.CVATImageDataset)

Even if your data is in a custom format, it’s easy to manually build a FiftyOne dataset.

import fiftyone as fo

# Initialize your dataset
dataset = fo.Dataset(name)

# Get a list of image paths
images = ...

# Parse your labels into samples
for path in images:
    sample = fo.Sample(filepath=path)
    sample["label"] = fo.Classification(label=...)

    detections = []
    for det in parsed_detection_labels:
        detection = fo.Detection(label=..., bounding_box=[...])
        detections.append(detection)

    sample["detections"] = fo.Detections(detections=detections)

    dataset.add_sample(sample)

# Visualize your dataset in the FiftyOne App
session = fo.launch_app(dataset)

Once you’ve loaded your data, you can utilize the FiftyOne App to visualize and explore your dataset, use the FiftyOne Model Zoo to generate predictions on your data, export it in various formats, and more!