Exploring Google’s Open Images V7

Open Images samples with object detection, instance segmentation, and classification labels loaded into the FiftyOne App. Image courtesy of Open Images.

Google’s Open Images dataset just got a major upgrade. The dataset that gave us more than one million images with detection, segmentation, classification, and visual relationship annotations has added 22.6 million point labels spanning 4,171 classes.

With Open Images V7, Google researchers take a step towards a new paradigm for semantic segmentation: rather than densely labeling every pixel in an image, which makes annotation expensive and time-consuming, the accompanying paper shows that sparse annotations of a variety they dub pointillism can achieve comparable model performance. This addition to Open Images reflects the ongoing shift in computer vision towards data-centric approaches.

In this article, we’ll show you how to get started working with Open Images V7 and point labels, and explore some features of the dataset. For this exploration, we’ll be using the open source computer vision library FiftyOne, which is one of the official download and visualization tools recommended by the Open Images team.

For a deep-dive into Open Images V6, check out this Medium article and tutorial. Keep reading for a look at point labels and how to navigate what’s new in Open Images V7!

Loading in the data

The easiest way to get started is to import FiftyOne and download Open Images V7 from the FiftyOne Dataset Zoo.

## install if you haven't already
!pip install fiftyone

import fiftyone as fo
import fiftyone.zoo as foz

## load dataset
dataset = foz.load_zoo_dataset("open-images-v7")

By default, this will download (if necessary) all splits of the data — train, test, and validation — including all available label types for each, and the associated metadata.

As with the Open Images V6 dataset in the FiftyOne Dataset Zoo, however, we can also specify what subsets of the data we would like to download and load! In this article, we’ll be working with the validation split, which consists of 41,620 images. We can download and load in this split in FiftyOne by passing in the split argument:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="validation",
)

If we only wanted to download a thousand images, for the purposes of getting a feel for the data, we could specify this with the max_samples argument:

dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="validation",
    max_samples=1000,
)

Let’s take a look at this data by launching the FiftyOne App, a powerful graphical user interface that enables you to visualize, browse, and interact directly with your datasets:

session = fo.launch_app(dataset)
1000 images from the Open Images V7 validation split visualized with labels in the FiftyOne App. Image by author.

As we can see, there’s a lot going on here. Every one of these images has object bounding boxes, segmentation masks, classification labels, relationship labels, and point labels. However, not every image in the Open Images dataset is annotated with every one of these label types; some images, for instance, do not have point labels. The reason the data that we’ve loaded has all of these is that, by default, FiftyOne prioritizes downloading images that have as many label types as possible!
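Before filtering anything, it’s worth checking how complete the label coverage actually is in what we downloaded. Here is a minimal sanity check using FiftyOne’s exists() view stage (nothing here is specific to Open Images):

# count how many of the loaded samples actually have point labels
num_with_points = len(dataset.exists("points"))
print(f"{num_with_points}/{len(dataset)} samples have point labels")

# and how many are missing them entirely
print(len(dataset.exists("points", False)), "samples lack point labels")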

If we only care about some of the types of annotations, we can isolate these in one of a few ways. We can use the sidebar on the left hand side to toggle label types on/off:

Images from the Open Images V7 validation split with only detection and point labels shown in the FiftyOne App. Image by author.

Or we can create a view of the data using select_labels() to choose the label types we want, and then view this in the FiftyOne App:

# create a view containing only point labels
points_view = dataset.select_labels(fields="points")
session.view = points_view

Another alternative is to explicitly specify which label types we want when loading the dataset:

# only load point labels and detection labels
dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="validation",
    max_samples=1000,
    label_types=["points", "detections"],
)

What’s the point?

Now that we’ve loaded in the data, let’s get a better sense for what these point labels are, and what they actually mean. In the FiftyOne App, click on one of the images in the sample grid and hover over one of the points in the image. You’ll see a box that lists properties of that point label. In a moment, we’ll go through these one by one, but first, a crucial point (no pun intended) must be made.

Close-up of a single image from Open Images V7, including the contents of one of the “point labels”. Image by author.

In generating this dataset, the creators posed yes/no questions about whether a given point corresponded to a given class label. This means that a point label is not an assertion that its class is actually present at that point. Rather, each point label records the class about which the yes/no question was asked, along with the votes that were cast.

If you are familiar with previous versions of the Open Images dataset, this notion may be familiar to you, as the classification labels follow a similar pattern, with negative labels and positive labels. Point labels, however, are slightly more nuanced than classification labels.
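As a quick aside, we can see the positive/negative pattern in the classification labels themselves by counting both flavors (a minimal sketch, assuming the zoo loader’s default field names positive_labels and negative_labels):

# classification labels load into positive_labels and negative_labels
num_pos = dataset.count("positive_labels.classifications")
num_neg = dataset.count("negative_labels.classifications")
print(f"positive classification labels: {num_pos}")
print(f"negative classification labels: {num_neg}")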

First off, different points received different total numbers of votes. Second, annotators could vote “yes”, “no”, or “unsure” on the same class label. As a result, some point labels are estimated as unsure!

One last difference is that these point labels were generated via two different methods. Some of the yes/no questions were answered by human annotators, while others were “answered” by a model which predicted a candidate class.

Properties

  • label: the things/stuff class about which the yes/no question was asked
  • yes_votes: the number of “yes” votes cast by annotators
  • no_votes: the number of “no” votes cast by annotators
  • unsure_votes: the number of “unsure” votes cast by annotators
  • estimated_yes_no: the best guess for whether the class label fits the point, given the votes cast
  • source: the method used to evaluate (cast votes for) the class label, with “ih” for human annotators and “cc” for model-predicted candidate class
  • id: unique identifier within FiftyOne

With the exception of the id, all of these properties are extracted directly from the Open Images raw data. For more details on these properties, see the Open Images V7 paper.
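To get a feel for how these properties are distributed across the split, we can count their values directly (a small sketch using count_values(), which we will meet again below; the exact numbers depend on what you downloaded):

# how many point labels were voted on by humans ("ih") vs models ("cc")
print(dataset.count_values("points.keypoints.source"))

# breakdown of the estimated answers, including "unsure"
print(dataset.count_values("points.keypoints.estimated_yes_no"))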

In FiftyOne, these point labels are represented as Keypoint labels, which makes it easy to access, filter, and perform operations on them.

For an individual sample, it is easy to read out this data:

sample = dataset.first()
print(sample.points.keypoints)
[<Keypoint: {
    'id': '63dd4da3c5d1bb17274b573e',
    'attributes': {},
    'tags': [],
    'label': 'Rope',
    'points': [[0.11230469, 0.7114094]],
    'confidence': None,
    'index': None,
    'estimated_yes_no': 'no',
    'source': 'ih',
    'yes_votes': 0,
    'no_votes': 3,
    'unsure_votes': 0,
}>, <Keypoint: {
    'id': '63dd4da3c5d1bb17274b573f',
    'attributes': {},
    'tags': [],
    'label': 'Rope',
    'points': [[0.49609375, 0.31767338]],
    'confidence': None,
    'index': None,
    'estimated_yes_no': 'no',
    'source': 'ih',
    'yes_votes': 0,
    'no_votes': 3,
    'unsure_votes': 0,
    ........
}>, <Keypoint: {
    'id': '63dd4da3c5d1bb17274b5751',
    'attributes': {},
    'tags': [],
    'label': 'Fixed-wing aircraft',
    'points': [[0.79840761, 0.73948359]],
    'confidence': None,
    'index': None,
    'estimated_yes_no': 'yes',
    'source': 'cc',
    'yes_votes': 1,
    'no_votes': 0,
    'unsure_votes': 0,
}>, <Keypoint: {
    'id': '63dd4da3c5d1bb17274b5752',
    'attributes': {},
    'tags': [],
    'label': 'Fixed-wing aircraft',
    'points': [[0.8213462, 0.51104984]],
    'confidence': None,
    'index': None,
    'estimated_yes_no': 'yes',
    'source': 'cc',
    'yes_votes': 1,
    'no_votes': 0,
    'unsure_votes': 0,
}>]

We can also use FiftyOne’s Aggregation class and filtering capabilities to quickly get some summary statistics about the dataset (the validation split that we downloaded in the first section).

We can compute the total number of “yes”, “no” and “unsure” votes across all points:

total_yes_votes = dataset.sum("points.keypoints.yes_votes")
total_no_votes = dataset.sum("points.keypoints.no_votes")
total_unsure_votes = dataset.sum("points.keypoints.unsure_votes")

print(f"total YES votes: {total_yes_votes}")
print(f"total NO votes: {total_no_votes}")
print(f"total UNSURE votes: {total_unsure_votes}")
total YES votes: 1055849
total NO votes: 1700581
total UNSURE votes: 114893

And the number of times a point had a certain number of votes for these three options:

yes_counts = dataset.count_values("points.keypoints.yes_votes")
no_counts = dataset.count_values("points.keypoints.no_votes")
unsure_counts = dataset.count_values("points.keypoints.unsure_votes")

print(f"YES votes counts: {yes_counts}")
print(f"NO votes counts: {no_counts}")
print(f"UNSURE votes counts: {unsure_counts}")
YES vote counts: {0: 804909, 1: 450783, 2: 78768, 4: 132, 3: 148409, 5: 151, 6: 170}
NO vote counts: {4: 113, 1: 376188, 3: 394390, 0: 642460, 6: 75, 5: 43, 2: 70053}
UNSURE vote counts: {1: 114855, 2: 19, 0: 1368448}

Using FiftyOne’s ViewField, which we import with from fiftyone import ViewField as F, we can also do things like count the total number of point labels:

dataset.sum(F("points.keypoints").length())
## 1483322
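We can break the point labels down by class in the same spirit (a small sketch; sorting the dict returned by count_values() surfaces the most common classes):

label_counts = dataset.count_values("points.keypoints.label")
top5 = sorted(label_counts.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top5)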

And efficiently compute the number of point labels that received at least one “yes” vote and at least one “no” vote:

dataset.filter_labels(
    "points",
    (F("yes_votes")!=0) & (F("no_votes")!=0)
).sum(F("point.keypoints").length())
## 147223

Point(s) of interest

Now that we understand what these point labels are and how to access their basic properties, let’s discuss how to massage the data into a form that is potentially more useful for downstream processing.

One thing we might want to do is extract a subset of the dataset with point labels for particular classes. As an example, suppose we are working on a wildlife conservation project and we want to train a model to identify turtles and tortoises. In FiftyOne there are multiple ways to accomplish this!

If we already have the entire dataset loaded, we can filter the point labels for instances of “Turtle” or “Tortoise”, and either use select_labels() to get only the point labels and perhaps the positive classification labels, or we can toggle the rest of the labels off in the FiftyOne App, as we’ve done here:

turtle_view = dataset.filter_labels(
    "points",
    F("label").is_in(
        ["Turtle", "Tortoise"]
    )
)
session.view = turtle_view
View of point labels and positive classification labels for images of turtles and tortoises in the Open Images V7 validation split. Image by author.

Alternatively, if you know from the start that you are only interested in certain label classes, you can pass this information directly into the load_zoo_dataset() method:

turtle_dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="validation",
    label_types=["points", "classifications"],
    classes=["Turtle", "Tortoise"],
    dataset_name="oi-turtle",
)

Another thing we might want to do is turn the raw “yes”, “no”, and “unsure” votes that were cast for these point labels into something more concrete that we can use to train models. For simplicity’s sake, let’s say that rather than use the estimated_yes_no values given by the dataset’s authors, we want to generate “positive” and “negative” point labels in direct analogy with the classification labels.

As an example workflow, we could imagine that we are only interested in point label votes cast by human annotators — not the candidate classes generated by models. And let’s further suppose that to ensure with high probability that our ground truth labels are accurate, we will only label points as “positive” or “negative” if there are at least two human votes cast, and all votes agree. Here’s one way to do this in FiftyOne:

First, we will filter the point labels for points with multiple positive votes, no negative or unsure votes, and source="ih", and clone this view into a new dataset positive_dataset, only keeping the points field:

positive_dataset = dataset.filter_labels(
    "points",
    (F("source") == "ih") &
    (F("yes_votes") > 1) & 
    (F("no_votes") == 0) & 
    (F("unsure_votes") == 0)
).select_fields(
    "points"
).clone()

Then we rename the points field to positive_points, so that we can merge it back into the original dataset shortly:

positive_dataset.rename_sample_field(
    "points", 
    "positive_points"
)

We can then get rid of several of the embedded fields within each Keypoint label, since by construction we already know the source (“ih”), the estimated answer, and the number of “no” and “unsure” votes (zero). For similar reasons, we can also rename the yes_votes field to votes:

embedded_fields = ["source", "estimated_yes_no", "no_votes", "unsure_votes"]
embedded_fields = ["positive_points.keypoints." + ef for ef in embedded_fields]

for field in embedded_fields:
    positive_dataset.delete_sample_field(field)

positive_dataset.rename_sample_field(
    "positive_points.keypoints.yes_votes", 
    "positive_points.keypoints.votes"
)

Finally, we can merge this positive_dataset into the original dataset:

dataset.merge_samples(positive_dataset)
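As a quick sanity check (merge_samples() matches samples on their filepath by default), we can count how many high-confidence positive points made it into the original dataset:

print(dataset.count("positive_points.keypoints"))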

After going through an analogous procedure for the negative point labels, we can use select_fields one more time to create a view containing only the positive and negative point labels, and the positive and negative classification labels:
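For reference, here is a sketch of that analogous negative pass and the final view (it mirrors the positive workflow above; the negative_points field name is our own choice):

negative_dataset = dataset.filter_labels(
    "points",
    (F("source") == "ih") &
    (F("no_votes") > 1) &
    (F("yes_votes") == 0) &
    (F("unsure_votes") == 0)
).select_fields(
    "points"
).clone()

negative_dataset.rename_sample_field("points", "negative_points")

embedded_fields = ["source", "estimated_yes_no", "yes_votes", "unsure_votes"]
embedded_fields = ["negative_points.keypoints." + ef for ef in embedded_fields]

for field in embedded_fields:
    negative_dataset.delete_sample_field(field)

negative_dataset.rename_sample_field(
    "negative_points.keypoints.no_votes",
    "negative_points.keypoints.votes"
)

dataset.merge_samples(negative_dataset)

# keep only the point labels and classification labels
session.view = dataset.select_fields(
    ["positive_points", "negative_points",
     "positive_labels", "negative_labels"]
)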

Visualization of high confidence positive and negative point labels for Open Images V7 validation images. Image by author.

Conclusion

In this article, we’ve only scratched the surface of what you can do with FiftyOne and Google’s latest version of the Open Images dataset. If you want to dive deeper into features of the Open Images dataset, check out this Medium article and tutorial, and if you want to know more about the pointillism-based approach to semantic segmentation, I encourage you to check out Google’s paper on the topic.

I hope this article showed you how easy it is to get started working with Open Images V7 using FiftyOne!