Exploring the UCF101 Dataset: A Large-Scale, YouTube-Based Action Recognition Dataset

Welcome to the latest installment of our ongoing blog series where we highlight computer vision datasets in the FiftyOne Dataset Zoo! FiftyOne provides a Dataset Zoo that contains a collection of common datasets that you can download and load into FiftyOne via a few simple commands. In this post, we explore the UCF101 Action Recognition video dataset.

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

FiftyOne quick overview gif

The FiftyOne Dataset Zoo comprises more than 30 datasets, with new datasets being added all the time! They cover a variety of computer vision use cases including:

  • Video
  • Images
  • Location
  • Point-cloud
  • Action-recognition
  • Classification
  • Detection
  • Segmentation
  • Relationships
  • And more!

Sample of the UCF101 dataset. Source: https://www.crcv.ucf.edu/data/UCF101.php

About the UCF101 action recognition dataset

UCF101 is a human action recognition (HAR) dataset of realistic action videos, collected from YouTube, with 101 action categories. At the time of its release in 2012, it was the largest video action recognition dataset available to the research community.

The dataset is made up of 13,320 videos (27 total hours) and is characterized by a large diversity of actions, as well as large variations in camera motion, object appearance and pose, object scale, viewpoint, background clutter, and illumination conditions. At the time of its curation, most available action recognition datasets were either unrealistic or staged by actors, so UCF101 aimed to encourage further research into action recognition by providing new, realistic action categories.
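
To get a feel for those numbers, the short sketch below works out the average clip length and the average number of clips per category from the totals quoted above. The 27-hour figure is rounded, so treat the results as approximations.

```python
# Back-of-the-envelope statistics from the totals quoted in this post.
# The 27-hour figure is rounded, so these values are approximate.
TOTAL_VIDEOS = 13_320
TOTAL_HOURS = 27
NUM_CATEGORIES = 101

avg_clip_seconds = TOTAL_HOURS * 3600 / TOTAL_VIDEOS
avg_clips_per_category = TOTAL_VIDEOS / NUM_CATEGORIES

print(f"Average clip length: {avg_clip_seconds:.1f} s")
print(f"Average clips per category: {avg_clips_per_category:.0f}")
```

In other words, a typical clip is roughly 7 seconds long, and each category has about 132 clips on average.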

Within each of the 101 action categories, the videos are organized into 25 groups, where each group can consist of 4-7 videos of an action. Videos from the same group can share some common features, such as a similar background, viewpoint, etc.
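
The group structure matters if you ever split the data yourself, because clips from the same group look alike and should not end up on both sides of a train/test split. As a minimal sketch, assuming the original UCF101 file naming scheme v_<Action>_g<group>_c<clip>.avi (for example v_ApplyEyeMakeup_g01_c01.avi), which is not described in this post, the hypothetical helper below extracts the action, group, and clip index from a filename:

```python
import re
from typing import NamedTuple

# Assumed naming scheme: v_<Action>_g<group>_c<clip>.<extension>
_CLIP_PATTERN = re.compile(
    r"^v_(?P<action>[A-Za-z]+)_g(?P<group>\d+)_c(?P<clip>\d+)\.\w+$"
)


class ClipInfo(NamedTuple):
    action: str
    group: int
    clip: int


def parse_clip_name(filename: str) -> ClipInfo:
    """Split a UCF101-style filename into action, group, and clip index."""
    match = _CLIP_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename: {filename!r}")
    return ClipInfo(match["action"], int(match["group"]), int(match["clip"]))


print(parse_clip_name("v_ApplyEyeMakeup_g01_c01.avi"))
```

Grouping clips by (action, group) before splitting is one way to keep near-duplicate footage out of your evaluation set.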

The action categories are divided into five types:

  • Human-Object Interaction
  • Body-Motion Only
  • Human-Human Interaction
  • Playing Musical Instruments
  • Sports

Note: The UCF101 dataset is an extension of the UCF50 Action Recognition Data Set, which has 50 action categories.

What is human action recognition?

As you can imagine, it is very easy for a human to watch a video and recognize humans and the actions they are performing. However, having a machine do the same thing is a very challenging problem in the “video understanding” subfield of computer vision. More concretely, human action recognition for the purposes of this dataset is the problem of automatically assigning a video into one of the 101 different action categories.
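
Framed this way, a recognizer is simply a function that maps a video to one category index. The sketch below only illustrates that framing and is not a method from this post: it assumes some hypothetical model has already produced one score per category for each sampled frame, sums those scores over the frames, and picks the highest-scoring category.

```python
from typing import Sequence


def predict_video_label(frame_scores: Sequence[Sequence[float]]) -> int:
    """Combine per-frame class scores and return the best class index."""
    if not frame_scores:
        raise ValueError("Need scores for at least one frame")
    num_classes = len(frame_scores[0])
    totals = [0.0] * num_classes
    for scores in frame_scores:
        for index, score in enumerate(scores):
            totals[index] += score
    # The argmax of the summed scores equals the argmax of the mean scores
    return max(range(num_classes), key=totals.__getitem__)


# Toy example with 3 categories instead of 101
toy_scores = [[0.1, 0.7, 0.2], [0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
print(predict_video_label(toy_scores))
```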

Human action recognition in video has a variety of real-world applications, including surveillance (military, industrial, and civilian), healthcare (for example, monitoring patients as they move around a facility), human-computer interaction, content-based video retrieval, and video summarization.

Dataset quick facts

Step 1: Install FiftyOne

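If you don’t already have FiftyOne installed on your laptop, it takes just a few minutes! FiftyOne is distributed as a Python package, so on macOS, for example, you can install it from a terminal with pip install fiftyone.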

Note: To work with this video dataset, you’ll need to have FFmpeg installed. Also, if you don’t already have a utility installed that can uncompress the UCF101 dataset’s .rar files, you can install one (for example, on macOS) using:

brew install rar

You may also need to restart your IPython kernel and/or authorize the rar app in your macOS settings for the utility to be recognized during the dataset import step.

Or on Linux:

sudo apt install rar
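
Before moving on, it can save time to confirm that these tools are actually visible to Python. The snippet below is a small sketch that uses only the standard library; it assumes the executables are named ffmpeg and rar on your system, and missing_tools is a hypothetical helper name.

```python
import shutil


def missing_tools(candidates):
    """Return the names from `candidates` that are not found on PATH."""
    return [name for name in candidates if shutil.which(name) is None]


# ffmpeg is needed for video support; rar is used to extract the archive
print("Missing tools:", missing_tools(["ffmpeg", "rar"]))
```

An empty list means both executables were found on your PATH.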

Learn more about how to get up and running with FiftyOne in the Docs.

Step 2: Import the dataset

Now that you have FiftyOne installed, let’s download the dataset from the Dataset Zoo, import it into FiftyOne, and launch the FiftyOne App. This should take just a few minutes and a few more lines of code.

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.video as fouv

dataset = foz.load_zoo_dataset("ucf101", split="test")

# Re-encode source videos as H.264 MP4s so they can be viewed in the App
fouv.reencode_videos(dataset)

print(dataset.name)  # ucf101-test

session = fo.launch_app(dataset)

The last line in the code snippet will launch the FiftyOne App in your default browser. You should see the following initial view of the test dataset in the FiftyOne App:

Initial view of UCF101 dataset in the FiftyOne App
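
A similar overview is available from Python. As a hedged sketch, the hypothetical helper below assumes the labels live in a ground_truth classification field (check the output of print(dataset) if yours differs) and uses the dataset's count_values() method to list the most common action categories:

```python
def top_categories(dataset, k=5, field="ground_truth"):
    """Return the k most frequent labels as (label, count) pairs.

    `dataset` is expected to be a FiftyOne dataset or view. The field name
    is an assumption and may differ in your setup.
    """
    counts = dataset.count_values(f"{field}.label")  # {label: count}
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

With the dataset loaded as in the snippet above, print(top_categories(dataset)) would show the five largest categories in the test split.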

Your directory of videos in ~/fiftyone/ucf101 should have test and train folders, plus an info.json file.

Both the test and train folders will contain video files broken up into 101 action categories.
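
If you want to verify that layout yourself, the sketch below walks one split folder and counts the clips in each category subfolder. It assumes the structure described above (one subfolder per action category) and counts unique file stems, so a clip is only counted once even if both an original file and a re-encoded copy sit side by side; count_videos_per_category is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def count_videos_per_category(split_dir):
    """Count unique clip names in each category subfolder of a split."""
    counts = Counter()
    for category_dir in Path(split_dir).iterdir():
        if category_dir.is_dir():
            # Use stems so clip.avi and clip.mp4 count as one clip
            stems = {f.stem for f in category_dir.iterdir() if f.is_file()}
            counts[category_dir.name] = len(stems)
    return counts


split_dir = Path.home() / "fiftyone" / "ucf101" / "test"
if split_dir.is_dir():
    counts = count_videos_per_category(split_dir)
    print(f"{len(counts)} categories, {sum(counts.values())} clips")
```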

Tip: If you want to persist the dataset so you don’t have to repeat the re-encoding process when you load it in your next session, set the following right after your initial load command:

dataset.persistent = True

Now, you can load the dataset quickly and launch the App in your next session.

import fiftyone as fo

dataset = fo.load_dataset("ucf101-test")
session = fo.launch_app(dataset)

OK, let’s do a quick exploration of the UCF101 dataset!

Sample details

Click on any of the samples to see additional details such as tags, metadata, labels, frame labels, and primitives.

Details of a UCF101 dataset sample in the FiftyOne App

Filtering by ID

FiftyOne makes it very easy to filter the samples to find the ones that meet your specific criteria. For example, we can filter by a specific ID:

Filtering UCF101 dataset samples by ID in the FiftyOne App
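
In code, the equivalent is the select() method of a dataset or view, which accepts a single sample ID or a list of IDs. The hypothetical helper below is a minimal sketch that looks up the ID of the first sample and builds a one-sample view from it:

```python
def first_sample_view(dataset):
    """Return (sample_id, view) for the first sample in the dataset.

    `dataset` is expected to be a FiftyOne dataset or view.
    """
    sample_id = dataset.first().id
    return sample_id, dataset.select(sample_id)
```

Assigning the returned view to session.view would show just that sample in the App.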

Filtering by label

In this example we filter the samples by the SkateBoarding action category.

Filtered view of samples in the SkateBoarding action category of the UCF101 dataset
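
The same filter can be expressed in Python with match(). The sketch below assumes the label is stored in a ground_truth field, which you should verify with print(dataset), and passes match() a MongoDB-style filter; filter_by_label is a hypothetical helper name.

```python
def filter_by_label(dataset, label, field="ground_truth"):
    """Return a view of the samples whose classification label is `label`.

    `dataset` is expected to be a FiftyOne dataset or view. The field name
    is an assumption; adjust it to match your dataset's schema.
    """
    # A dict passed to match() is treated as a MongoDB-style filter
    return dataset.match({f"{field}.label": label})
```

FiftyOne code more commonly writes this with a ViewField expression, F("ground_truth.label") == "SkateBoarding"; the dictionary form is used here only to keep the helper free of imports. Setting session.view to the result of filter_by_label(dataset, "SkateBoarding") would reproduce the filtered view in the App.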

Start working with the dataset

Now that you have a general idea of what the dataset contains, you can start using FiftyOne to perform a variety of tasks on it.

You can also start making use of the FiftyOne Brain, which provides powerful machine learning techniques you can apply to your workflows, such as visualizing embeddings, finding similar samples, and measuring uniqueness and mistakenness.