Welcome to the latest installment of our ongoing blog series where we highlight computer vision datasets in the FiftyOne Dataset Zoo! FiftyOne provides a Dataset Zoo that contains a collection of common datasets that you can download and load into FiftyOne via a few simple commands. In this post, we explore the UCF101 Action Recognition video dataset.

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

The FiftyOne Dataset Zoo comprises more than 30 datasets, with new datasets being added all the time! They cover a variety of computer vision use cases including:

Video
Images
Location
Point-cloud
Action-recognition
Classification
Detection
Segmentation
Relationships
And more!

About the UCF101 action recognition dataset

UCF101 is a human action recognition (HAR) dataset of realistic action videos, collected from YouTube, with 101 action categories. At the time of its release in 2012, it was the largest video action recognition dataset available to the research community.

The dataset is made up of 13,320 videos (27 hours in total) and is characterized by a large diversity of actions, as well as large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, and illumination conditions. Most action recognition datasets available at the time were unrealistic or staged by actors, so UCF101 aimed to encourage further research into action recognition by learning and exploring new realistic action categories.

Within each of the 101 action categories, the videos are split into 25 groups, and each group contains 4-7 videos of that action. Videos from the same group can share common features, such as a similar background or viewpoint.
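UCF101 clip filenames encode this grouping with the pattern `v_<Action>_g<GG>_c<CC>` (for example, `v_ApplyEyeMakeup_g08_c01.avi`). The minimal sketch below assumes that naming convention. `parse_clip_name` and `group_clips` are hypothetical helpers, not part of FiftyOne. Bucketing clips by (action, group) is useful if you build your own splits, because keeping all clips from one group on the same side avoids leaking a shared background or viewpoint between training and test data.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Assumed UCF101 naming convention: v_<Action>_g<GG>_c<CC>.<ext>
# (the extension is left open so re-encoded .mp4 files also match)
_CLIP_RE = re.compile(
    r"^v_(?P<action>[A-Za-z]+)_g(?P<group>\d{2})_c(?P<clip>\d{2})\.\w+$"
)


class ClipInfo(NamedTuple):
    action: str
    group: int
    clip: int


def parse_clip_name(filename: str) -> ClipInfo:
    """Parse a UCF101-style clip filename into (action, group, clip)."""
    match = _CLIP_RE.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized UCF101 filename: {filename!r}")
    return ClipInfo(match["action"], int(match["group"]), int(match["clip"]))


def group_clips(filenames):
    """Bucket filenames by (action, group) so same-group clips stay together."""
    groups = defaultdict(list)
    for name in filenames:
        info = parse_clip_name(name)
        groups[(info.action, info.group)].append(name)
    return dict(groups)
```

For example, `group_clips(["v_Biking_g01_c01.avi", "v_Biking_g01_c02.avi"])` puts both clips under the single key `("Biking", 1)`.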

The action categories are divided into five types:

Human-Object Interaction
Body-Motion Only
Human-Human Interaction
Playing Musical Instruments
Sports

Note: The UCF101 dataset is an extension of the UCF50 Action Recognition Data Set, which has 50 action categories.

Video tutorial: How to get started with the UCF101 action recognition video dataset (https://www.youtube.com/watch?v=xArphgd_hVs)

What is human action recognition?

As you can imagine, it is easy for a human to watch a video and recognize the people in it and the actions they are performing. Having a machine do the same thing, however, is a challenging problem in the “video understanding” subfield of computer vision. More concretely, for the purposes of this dataset, human action recognition is the problem of automatically assigning a video to one of the 101 action categories.
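To make the "one video in, one of 101 labels out" framing concrete, here is a minimal sketch of one simple baseline, assuming you already have a model (hypothetical here) that outputs a score per class for each frame: average the scores over all frames and pick the highest-scoring class. This is only an illustration of the task's input and output, not the method from the UCF101 paper.

```python
from typing import Sequence


def predict_video_label(
    frame_scores: Sequence[Sequence[float]], class_names: Sequence[str]
) -> str:
    """Average per-frame class scores and return the top-scoring class name."""
    if not frame_scores:
        raise ValueError("need scores for at least one frame")
    num_classes = len(class_names)
    totals = [0.0] * num_classes
    for scores in frame_scores:
        if len(scores) != num_classes:
            raise ValueError("each frame needs exactly one score per class")
        for i, score in enumerate(scores):
            totals[i] += score
    # Mean score per class across the whole video
    means = [total / len(frame_scores) for total in totals]
    best = max(range(num_classes), key=means.__getitem__)
    return class_names[best]
```

With UCF101, `class_names` would hold the 101 action category names and each frame's score vector would have 101 entries.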

Human action recognition in video has a variety of real-world applications, such as surveillance (military, industrial, and civilian), healthcare (for example, monitoring patients as they move around a facility), human-computer interaction, content-based video retrieval, and video summarization.

Dataset quick facts

Research Paper: UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild
Authors: Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah
Download Dataset: RAR file
Revised Annotations: Available on thumos.info
Action Recognition: ZIP file
Action Detection: ZIP file
Video-Level Annotations: ZIP file
STIP Features: Part 1, Part 2
Dataset Size: 6.48 GB
Last Release: 2012
FiftyOne Dataset Name: ucf101
Tags: video, action-recognition
Supported Splits: train, test
Zoo Dataset class: UCF101Dataset

Step 1: Install FiftyOne

If you don’t already have FiftyOne installed on your laptop, installing it takes just a few minutes! For example, on macOS:

Verify your version of Python
Create and activate a virtual environment
Install IPython (optional)
Upgrade your Setuptools
Install FiftyOne
Install FFmpeg
Install a utility to uncompress .rar files (optional)

Note: To work with this video dataset, you’ll need to have FFmpeg installed. Also, if you don’t already have a utility for extracting the UCF101 .rar files, you can install one (for example, on macOS) using:

You may also need to restart your IPython kernel and/or authorize the rar app in your macOS settings for the utility to be recognized during the dataset import step.

Or on Linux:

Learn more about how to get up and running with FiftyOne in the Docs.
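Before moving on, you can sanity-check these prerequisites from Python. This is a minimal sketch that uses only the standard library (`sys.version_info` and `shutil.which`). The minimum Python version and the list of RAR extraction tools are placeholder assumptions, so check the FiftyOne install docs for the requirements of your release.

```python
import shutil
import sys

# Placeholder minimum; check the FiftyOne install docs for supported versions
MIN_PYTHON = (3, 9)

# Assumed names of command-line .rar extractors; having any one is treated as enough
RAR_TOOLS = ("unrar", "unar", "rar", "7z")


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a list of human-readable problems (empty if everything was found)."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ expected, found {found}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not any(shutil.which(tool) for tool in RAR_TOOLS):
        problems.append("no .rar extraction tool found (tried: " + ", ".join(RAR_TOOLS) + ")")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("-", issue)
    print("All prerequisites found." if not issues else f"{len(issues)} issue(s) found.")
```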

Step 2: Import the dataset

Now that you have FiftyOne installed, let’s download the dataset, import it into FiftyOne, and launch the FiftyOne App. This should take just a few minutes and a few more lines of code.
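Here is a minimal sketch of this step, wrapped in a hypothetical `load_ucf101` helper. It assumes `fiftyone.zoo.load_zoo_dataset` downloads UCF101 on first use and that the source videos must be re-encoded with `fiftyone.utils.video.reencode_videos` before the App can play them. The imports sit inside the function only so the helper can be defined before FiftyOne is installed; in a notebook you would normally import them at the top.

```python
def load_ucf101(split="test"):
    """Download (on first use) and load one UCF101 split from the FiftyOne Dataset Zoo."""
    # UCF101 in the zoo supports the train and test splits
    if split not in ("train", "test"):
        raise ValueError("split must be 'train' or 'test'")

    # Imported lazily so this helper can be defined where FiftyOne is not installed
    import fiftyone.utils.video as fouv
    import fiftyone.zoo as foz

    # Downloads the dataset if necessary, then loads the split as a FiftyOne dataset
    dataset = foz.load_zoo_dataset("ucf101", split=split)

    # Re-encode the source videos so they can be played in the App
    fouv.reencode_videos(dataset)
    return dataset
```

In an IPython session you would then run `import fiftyone as fo`, `dataset = load_ucf101("test")`, and finally `session = fo.launch_app(dataset)` to open the App.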

The last line in the code snippet will launch the FiftyOne App in your default browser. You should see the following initial view of the test dataset in the FiftyOne App:

Your directory of videos in ~/fiftyone/ucf101 should contain test and train folders, plus an info.json file:

Both the test and train folders will contain video files broken up into 101 action categories.
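If you want to confirm the download from Python, the sketch below counts the action folders and files in each split. It assumes each split folder contains one subfolder per action category with that action’s videos inside; `summarize_layout` is a hypothetical helper, not a FiftyOne API. On a complete download you would expect 101 action folders in each split.

```python
from pathlib import Path


def summarize_layout(root="~/fiftyone/ucf101"):
    """Count action folders and files per split under the UCF101 zoo directory."""
    root = Path(root).expanduser()
    splits = {}
    for split in ("train", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"Missing split folder: {split_dir}")
        # Assumes one subfolder per action category, with its videos inside
        action_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        num_files = sum(1 for d in action_dirs for f in d.iterdir() if f.is_file())
        splits[split] = {"actions": len(action_dirs), "files": num_files}
    return {"splits": splits, "has_info_json": (root / "info.json").is_file()}
```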

Tip: If you want to persist the dataset so you don’t have to repeat the re-encode process when you load it in your next session, add the following to your initial load command:

Now, you can load the dataset quickly and launch the App in your next session.
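One way to sketch this, assuming the zoo’s default dataset name for the test split is `ucf101-test` (verify with `fo.list_datasets()`), is to set the dataset’s `persistent` property after the first load and reload it by name with `fo.load_dataset` later. Both helpers below are hypothetical.

```python
def persist_dataset(dataset):
    """Keep the dataset (and its re-encoded video paths) in the database across sessions."""
    dataset.persistent = True
    return dataset


def reload_ucf101(name="ucf101-test"):
    """Reload a previously persisted dataset by name and open it in the App."""
    # Imported lazily so this helper can be defined where FiftyOne is not installed
    import fiftyone as fo

    available = fo.list_datasets()
    if name not in available:
        raise ValueError(f"No dataset named {name!r}; available: {available}")
    dataset = fo.load_dataset(name)
    session = fo.launch_app(dataset)
    return dataset, session
```

Call `persist_dataset(dataset)` once in the session where you downloaded and re-encoded the videos; in later sessions, `dataset, session = reload_ucf101()` skips both steps.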

Ok, let’s do a quick exploration of the UCF101 dataset!

Sample details

Click any of the samples to see additional details, such as tags, metadata, labels, frame labels, and primitives.

Filtering by ID

FiftyOne makes it easy to filter samples down to the ones that meet your specific criteria. For example, we can filter by a specific sample ID:
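In code, the same filter can be sketched with the dataset’s `select` method, which returns a view containing only the given sample IDs. `view_by_ids` is a hypothetical wrapper added here for illustration.

```python
def view_by_ids(dataset, *sample_ids):
    """Return a view containing only the samples with the given IDs."""
    if not sample_ids:
        raise ValueError("pass at least one sample ID")
    return dataset.select(list(sample_ids))
```

For example, with an open App session, `session.view = view_by_ids(dataset, dataset.first().id)` shows only the first sample.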

Filtering by label

In this example, we filter the samples by the SkateBoarding action category.
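A minimal sketch of the same filter in code uses `match` with a `ViewField` expression, plus `count_values` to see how many samples each action has. It assumes the zoo stores each video’s action in a classification field named `ground_truth`; check `dataset.get_field_schema()` if your version uses a different name. Both helpers are hypothetical.

```python
def view_by_action(dataset, action, label_field="ground_truth"):
    """Return a view containing only samples labeled with `action`, e.g. "SkateBoarding"."""
    if not action:
        raise ValueError("action must be a non-empty class name")
    # Imported lazily so this helper can be defined where FiftyOne is not installed
    from fiftyone import ViewField as F

    return dataset.match(F(f"{label_field}.label") == action)


def count_by_action(dataset, label_field="ground_truth"):
    """Map each action class name to its number of samples."""
    return dataset.count_values(f"{label_field}.label")
```

For example, `session.view = view_by_action(dataset, "SkateBoarding")` shows only the skateboarding videos in the App.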

Start working with the dataset

Now that you have a general idea of what the dataset contains, you can start using FiftyOne to perform a variety of tasks, including:

You can also start using the FiftyOne Brain, which provides powerful machine learning techniques you can apply to your workflows, such as visualizing embeddings and finding similarity, uniqueness, and mistakenness.
