Exploring the Cityscapes Dataset for Semantic Urban Scene Understanding

Jimmy Guerrero

March 14, 2023

Welcome to the latest installment of our ongoing blog series where we highlight datasets from the FiftyOne Dataset Zoo! FiftyOne provides a Dataset Zoo that contains a collection of common datasets that you can download and load into FiftyOne via a few simple commands. In this post, we explore the Cityscapes dataset.

Wait, What’s FiftyOne?

FiftyOne is an open-source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high-quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

The FiftyOne Dataset Zoo comprises more than 30 datasets, with new datasets being added all the time! They cover a variety of computer vision use cases including:

Video
Images
Location
Point-cloud
Action-recognition
Classification
Detection
Segmentation
Relationships
And more!

*Sample of annotations in the Cityscapes dataset. Source: https://www.cityscapes-dataset.com/examples/*

About the Cityscapes Dataset

The Cityscapes Dataset is a large-scale dataset containing a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly (coarsely) annotated frames. At the time of its release, it was an order of magnitude larger than comparable earlier datasets.

Its primary purpose is to assess the performance of vision algorithms on the major tasks of semantic urban scene understanding (pixel-level, instance-level, and panoptic semantic labeling) and to support research that aims to exploit large volumes of (weakly) annotated data, for example to train deep neural networks.

What Is Visual Scene Understanding?

Scene understanding is the process of perceiving, analyzing, and interpreting a dynamic 3D scene observed through a network of sensors. This usually involves matching the signals coming from the sensors that observe the scene against the machine learning models used to understand it. As a result, scene understanding both adds semantic information to, and extracts it from, the sensor data that characterizes a scene. The sensors most commonly involved in visual scene understanding are cameras, but some scenarios also capture additional data with microphones, radar, or other sensors. A scene can contain a variety of physical objects (for example, cars and people) interacting with each other or with their environment. It can last just a few seconds or span a multi-day time lapse, and it can be confined to a microscopic view or cover an entire cityscape.

Design Choices

Here’s an overview of the design choices that shaped the dataset.

Features

Polygonal annotations	Dense semantic segmentation; instance segmentation for vehicles and people
Complexity	30 classes; see Class Definitions for the full list and Labeling Policy for how the classes are applied
Diversity	50 cities; several months (spring, summer, fall); daytime; good/medium weather conditions; manually selected frames; large number of dynamic objects; varying scene layout; varying background
Volume	5,000 images with fine annotations; 20,000 images with coarse annotations
Metadata	Preceding and trailing video frames (each annotated image is the 20th frame of a 30-frame, 1.8 s video snippet); corresponding right stereo views; GPS coordinates; ego-motion data from vehicle odometry; outside temperature from the vehicle sensor
Extensions by other researchers	Bounding box annotations of people; images augmented with fog and rain
Benchmark suite and evaluation server	Pixel-level semantic labeling; instance-level semantic labeling; panoptic semantic labeling
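The Metadata row implies some simple frame arithmetic. The sketch below assumes the 30 frames are evenly spaced across the roughly 1.8 s snippet (an approximation, since the exact capture rate is not stated here) and computes the approximate frame rate, how many context frames surround each annotated frame, and when the annotated frame occurs. The function name snippet_timing is hypothetical.

```python
# Minimal sketch: frame arithmetic for one Cityscapes video snippet.
# Assumption: the 30 frames are evenly spaced over the ~1.8 s snippet.

SNIPPET_FRAMES = 30      # frames per snippet (from the Metadata row)
SNIPPET_SECONDS = 1.8    # approximate snippet duration in seconds
ANNOTATED_FRAME = 20     # 1-based index of the annotated frame


def snippet_timing(n_frames=SNIPPET_FRAMES,
                   duration_s=SNIPPET_SECONDS,
                   annotated=ANNOTATED_FRAME):
    """Return approximate frame rate and context around the annotated frame."""
    fps = n_frames / duration_s          # ~16.7 frames per second
    preceding = annotated - 1            # frames before the annotated one
    trailing = n_frames - annotated      # frames after the annotated one
    offset_s = preceding / fps           # approx. time from snippet start
    return {
        "fps": fps,
        "preceding": preceding,
        "trailing": trailing,
        "offset_s": offset_s,
    }


if __name__ == "__main__":
    timing = snippet_timing()
    print(f"~{timing['fps']:.1f} fps, {timing['preceding']} frames before, "
          f"{timing['trailing']} after, annotated at ~{timing['offset_s']:.2f} s")
```

Under this assumption, each annotated image has 19 preceding and 10 trailing frames, and sits roughly 1.14 s into its snippet.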

Labeling Policy

Labeled foreground objects must never have holes. For example, if some background is visible ‘through’ a foreground object, it is considered part of the foreground. This also applies to regions where two or more classes are highly mixed: they are labeled with the foreground class. Some examples include:

tree leaves in front of a house or the sky (everything is labeled tree)
transparent car windows (everything is labeled car)
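The "no holes" rule can be illustrated on a toy binary mask. The sketch below is not part of the Cityscapes tooling; fill_holes is a hypothetical helper that takes a single object's mask (a list of lists of 0/1) and relabels any background region that cannot reach the image border as foreground, which is what "background visible through a foreground object counts as foreground" means for one object.

```python
from collections import deque


def fill_holes(mask):
    """Return a copy of a binary mask with enclosed background holes filled.

    A background pixel (0) stays background only if it is 4-connected to the
    image border through other background pixels; any other background pixel
    is enclosed by the object, so it is relabeled as foreground (1).
    """
    rows, cols = len(mask), len(mask[0])
    reachable = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the search with every background pixel on the image border
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                reachable[r][c] = True
                queue.append((r, c))

    # Breadth-first flood fill through connected background pixels
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not reachable[nr][nc] and mask[nr][nc] == 0):
                reachable[nr][nc] = True
                queue.append((nr, nc))

    # Pixels not reachable from the border are the object plus its holes
    return [[0 if reachable[r][c] else 1 for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    # A toy "car" whose window (center pixel) shows the background
    car = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in fill_holes(car):
        print(row)
```

For NumPy masks, scipy.ndimage.binary_fill_holes performs the same kind of operation without a hand-written flood fill.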

Class Definitions

Group	Classes
flat	road, sidewalk, parking+, rail track+
human	person*, rider*
vehicle	car*, truck*, bus*, on rails*, motorcycle*, bicycle*, caravan*+, trailer*+
construction	building, wall, fence, guard rail+, bridge+, tunnel+
object	pole, pole group+, traffic sign, traffic light
nature	vegetation, terrain
sky	sky
void	ground+, dynamic+, static+

* Single-instance annotations are available. However, if the boundary between such instances cannot be clearly seen, the whole crowd or group is labeled together and annotated as a group, e.g. car group.

+ This label is not included in any evaluation and is treated as void (or, in the case of license plate, as the vehicle it is mounted on).
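To make the two footnote markers concrete, the class table can be encoded as data. The sketch below is a hypothetical encoding (CLASSES and select_classes are illustrative names, not an official API): * marks classes with instance annotations, + marks classes excluded from evaluation, and caravan is also treated as excluded, following the official cityscapesScripts label definitions.

```python
# Hypothetical encoding of the Cityscapes class table.
# Each entry: (class name, has instance annotations "*", evaluated?)
# A "+" in the table means the class is excluded from evaluation.
CLASSES = {
    "flat": [("road", False, True), ("sidewalk", False, True),
             ("parking", False, False), ("rail track", False, False)],
    "human": [("person", True, True), ("rider", True, True)],
    "vehicle": [("car", True, True), ("truck", True, True),
                ("bus", True, True), ("on rails", True, True),
                ("motorcycle", True, True), ("bicycle", True, True),
                # caravan and trailer are excluded, as in cityscapesScripts
                ("caravan", True, False), ("trailer", True, False)],
    "construction": [("building", False, True), ("wall", False, True),
                     ("fence", False, True), ("guard rail", False, False),
                     ("bridge", False, False), ("tunnel", False, False)],
    "object": [("pole", False, True), ("pole group", False, False),
               ("traffic sign", False, True), ("traffic light", False, True)],
    "nature": [("vegetation", False, True), ("terrain", False, True)],
    "sky": [("sky", False, True)],
    "void": [("ground", False, False), ("dynamic", False, False),
             ("static", False, False)],
}


def select_classes(instances_only=False, evaluated_only=False):
    """Return class names matching the given flags, in table order."""
    names = []
    for entries in CLASSES.values():
        for name, has_instances, evaluated in entries:
            if instances_only and not has_instances:
                continue
            if evaluated_only and not evaluated:
                continue
            names.append(name)
    return names


if __name__ == "__main__":
    print(len(select_classes()))                       # all table classes
    print(len(select_classes(evaluated_only=True)))    # pixel-level eval classes
    print(select_classes(instances_only=True, evaluated_only=True))
```

With this encoding, the 30 classes reduce to 19 evaluated classes for pixel-level labeling and 8 for instance-level labeling, which matches the commonly used Cityscapes benchmark setup.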

Dataset Quick Facts

Research Paper: The Cityscapes Dataset for Semantic Urban Scene Understanding
Authors: M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele
Download Dataset: Register and download
License: Free, but registration is required
Dataset Size: 11.8 GB
Last Release: 2016
FiftyOne Dataset Name: cityscapes
Tags: image, multilabel, automotive, manual
Supported Splits: train, validation, test
Zoo Dataset class: CityscapesDataset

Step 1: Download the Dataset

In order to load the Cityscapes dataset into FiftyOne, you must first download the source data manually and organize your source_dir in the following manner:

Note that the gtFine_trainvaltest, gtCoarse, and gtBbox_cityPersons_trainval are optional directories.
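Before loading, it can help to check that source_dir is laid out as expected. The sketch below is a hypothetical helper; the required image archive name leftImg8bit_trainvaltest is an assumption based on FiftyOne's zoo documentation rather than this post, while the optional names are the ones listed above. Each entry is accepted either as a .zip archive or as a directory.

```python
from pathlib import Path

# Assumption: image archive name as given in FiftyOne's zoo documentation
REQUIRED = ["leftImg8bit_trainvaltest"]
# Optional annotation sources named in this post
OPTIONAL = ["gtFine_trainvaltest", "gtCoarse", "gtBbox_cityPersons_trainval"]


def _present(source_dir, name):
    """True if `name` exists as a .zip file or a directory in source_dir."""
    base = Path(source_dir)
    return (base / f"{name}.zip").is_file() or (base / name).is_dir()


def check_source_dir(source_dir):
    """Return (missing_required, available_optional) for a source_dir."""
    missing = [n for n in REQUIRED if not _present(source_dir, n)]
    available = [n for n in OPTIONAL if _present(source_dir, n)]
    return missing, available
```

For example, calling check_source_dir on your source_dir and getting a non-empty first list tells you the images are missing before the zoo loader runs.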

Step 2: Install FiftyOne

If you don’t already have FiftyOne installed on your laptop, installing it takes just a few minutes! For example, on macOS:

Verify your version of Python
Create and activate a virtual environment
Install IPython (optional)
Upgrade your Setuptools
Install FiftyOne
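In a shell, these steps typically look like python3 -m venv env, source env/bin/activate, pip install --upgrade pip setuptools wheel, and pip install fiftyone. The first and last steps can also be checked from Python. The sketch below uses the standard library's importlib.metadata to report whether the fiftyone package is installed; the minimum of Python 3.8 is an assumption chosen only because importlib.metadata requires it, so check the FiftyOne install docs for the versions actually supported.

```python
import sys
from importlib import metadata

# Assumption: 3.8 is used only because importlib.metadata needs it;
# see the FiftyOne install docs for the Python versions it supports.
MIN_PYTHON = (3, 8)


def python_ok(min_version=MIN_PYTHON, current=None):
    """Return True if the running (or given) Python version is new enough."""
    current = tuple(current or sys.version_info[:2])
    return current >= tuple(min_version)


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_ok() else "too old")
    print("fiftyone:", installed_version("fiftyone") or "not installed")
```

Run it from inside the activated virtual environment so that it reports on the environment FiftyOne will actually use.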

Learn more about how to get up and running with FiftyOne in the Docs.

Step 3: Import the Dataset

Now that you have the dataset downloaded and FiftyOne installed, let’s import the dataset into FiftyOne and launch the FiftyOne App. This should take just a few minutes and a few more lines of code.

import fiftyone as fo
import fiftyone.zoo as foz

# The path to the source files that you manually downloaded
source_dir = "/path/to/dir-with-cityscapes-files"

dataset = foz.load_zoo_dataset(
    "cityscapes",
    split="validation",
    source_dir=source_dir,
)

session = fo.launch_app(dataset)
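Once the dataset is loaded, a quick programmatic summary complements the App view. The sketch below is a hypothetical helper built on FiftyOne's count() and count_values() aggregations. It assumes the zoo loader stores person boxes in a Detections field named gt_person (the field used later in this post) and the fine and coarse annotations in Polylines fields named gt_fine and gt_coarse; confirm the actual schema with print(dataset).

```python
def summarize(dataset, field="gt_person", list_field="detections"):
    """Return the sample count and per-label object counts for one field.

    `field` and `list_field` are assumptions about the loaded schema;
    check them against the output of print(dataset).
    """
    # e.g. "gt_person.detections.label"; count_values unwinds the list
    path = f"{field}.{list_field}.label"
    return {
        "num_samples": dataset.count(),          # samples in the collection
        "label_counts": dataset.count_values(path),  # {label: count}
    }
```

Continuing from the snippet above, summarize(dataset) counts the person boxes by label, and summarize(dataset, "gt_fine", list_field="polylines") does the same for the fine polygons, if that field is present.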

The last line in the code snippet will launch the FiftyOne App in your default browser. You should see the following initial view of the cityscapes-validation dataset in the FiftyOne App:

*Default view of the Cityscapes Dataset in the FiftyOne App*

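Tip: FiftyOne datasets are non-persistent by default, which means they are deleted when the database shuts down. If you want to keep this dataset across sessions, add the following line after loading it: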

dataset.persistent = True

Ok, let’s do a quick exploration of the Cityscapes dataset!

Sample Details

Click on any of the samples to see additional details such as tags, metadata, labels, and primitives.

*Details of a Cityscapes Dataset sample in the FiftyOne App*

Filtering by ID

FiftyOne makes it easy to filter the samples down to the ones that meet your specific criteria. For example, we can filter by a specific ID:

*Filtering the Cityscapes Dataset samples by id in the FiftyOne App*
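The same filter can be applied from Python. The sketch below uses FiftyOne's select(), which builds a view containing only the given sample IDs, and pushes that view to the running App by assigning it to session.view. The helper name show_samples is hypothetical.

```python
def show_samples(session, dataset, sample_ids):
    """Show only the given sample ID(s) in the FiftyOne App.

    `sample_ids` may be a single ID string or an iterable of ID strings.
    """
    view = dataset.select(sample_ids)  # view restricted to those samples
    session.view = view               # the App switches to this view
    return view
```

For example, show_samples(session, dataset, dataset.first().id) displays just the first sample; in practice you would paste an ID copied from the App.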

Filtering by Label

In this example, we filter the samples by the gt_person field, keeping only those with the pedestrian label:

*Filtered view of samples with the gt_person pedestrian label*

*Detailed view of a sample with the gt_person pedestrian label*

In this example, we filter the samples by the gt_coarse field:

*Filtered view of samples with the gt_coarse label*
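Both label filters can also be expressed in code. The sketch below assumes the field and label names shown in the App above, and the helper names are hypothetical. It uses FiftyOne's filter_labels() with a ViewField expression to keep only pedestrian boxes in gt_person, and exists() to keep only samples whose gt_coarse field is populated.

```python
def pedestrian_view(dataset, field="gt_person", label="pedestrian"):
    """Keep only `label` objects in `field`, and only samples that have one."""
    # FiftyOne's expression builder, imported where it is needed
    from fiftyone import ViewField as F

    # only_matches=True (the default) also drops samples with no matches
    return dataset.filter_labels(field, F("label") == label)


def coarse_view(dataset, field="gt_coarse"):
    """Keep only samples whose `field` has a non-None value."""
    return dataset.exists(field)
```

For example, session.view = pedestrian_view(dataset) should reproduce the first filter in the App, and session.view = coarse_view(dataset) the second.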

Start Working with the Cityscapes Dataset

Now that you have a general idea of what the dataset contains, you can start using FiftyOne to perform a variety of tasks.

You can also start using the FiftyOne Brain, which provides powerful machine learning techniques you can apply to your workflows, such as visualizing embeddings and computing similarity, uniqueness, and mistakenness.



