Welcome to the latest installment of our ongoing blog series where we highlight computer vision datasets in the FiftyOne Dataset Zoo! FiftyOne provides a Dataset Zoo that contains a collection of common datasets that you can download and load into FiftyOne via a few simple commands. In this post, we explore the UCF101 Action Recognition video dataset.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
The FiftyOne Dataset Zoo comprises more than 30 datasets, with new datasets being added all the time! They cover a variety of computer vision use cases including:
- Video
- Images
- Location
- Point-cloud
- Action-recognition
- Classification
- Detection
- Segmentation
- Relationships
- And more!
About the UCF101 action recognition dataset
UCF101 is a human action recognition (HAR) dataset of realistic action videos, collected from YouTube, with 101 action categories. At the time of its release in 2012, it was the largest video action recognition dataset available to the research community.
The dataset is made up of 13,320 videos (27 total hours) and is characterized by a large diversity of actions, as well as large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, and illumination conditions. When UCF101 was curated, most of the available action recognition datasets were unrealistic or staged by actors, so UCF101 aimed to encourage further research into action recognition on new, realistic action categories.
The videos in the 101 action categories are grouped into 25 groups, where each group can consist of 4-7 videos of an action. Videos from the same group can share some common features, such as a similar background, viewpoint, etc.
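The group structure also shows up in the video file names. Here is a rough sketch, assuming the naming pattern of the original release, v_<Action>_g<group>_c<clip>.avi. That pattern is not stated in this post, so check it against your own download. `parse_ucf101_name` and `ClipInfo` are hypothetical names used only for illustration:

```python
import re
from typing import NamedTuple

# Assumed naming pattern: v_<Action>_g<group>_c<clip>.<extension>
_NAME_RE = re.compile(
    r"^v_(?P<action>[A-Za-z0-9]+)_g(?P<group>\d+)_c(?P<clip>\d+)\.\w+$"
)


class ClipInfo(NamedTuple):
    action: str
    group: int
    clip: int


def parse_ucf101_name(filename: str) -> ClipInfo:
    """Split a UCF101-style file name into action, group, and clip number."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected UCF101 file name: {filename!r}")
    return ClipInfo(match["action"], int(match["group"]), int(match["clip"]))


print(parse_ucf101_name("v_SkateBoarding_g01_c02.avi"))
# ClipInfo(action='SkateBoarding', group=1, clip=2)
```

Clips that share the same action and group number are the ones likely to share a background or viewpoint, which matters if you ever build your own train/test splits.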
The action categories are divided into five types:
- Human-Object Interaction
- Body-Motion Only
- Human-Human Interaction
- Playing Musical Instruments
- Sports
Note: The UCF101 dataset is an extension of the UCF50 Action Recognition Data Set, which has 50 action categories.
What is human action recognition?
As you can imagine, it is very easy for a human to watch a video and recognize the people in it and the actions they are performing. However, having a machine do the same thing is a very challenging problem in the “video understanding” subfield of computer vision. More concretely, human action recognition for the purposes of this dataset is the problem of automatically assigning a video to one of the 101 different action categories.
Human action recognition in video has a variety of real-world applications like surveillance (military, industrial and civilian), healthcare (for example: the monitoring of patients as they move around a facility), human-computer interaction, content-based video retrieval, and video summarization.
Dataset quick facts
- Research Paper: UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild
- Authors: Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah
- Download Dataset: RAR file
- Revised Annotations: Available on thumos.info
- Action Recognition: ZIP file
- Action Detection: ZIP file
- Video-Level Annotations: ZIP file
- STIP Features: Part 1, Part 2
- Dataset Size: 6.48 GB
- Last Release: 2012
- FiftyOne Dataset Name: ucf101
- Tags: video, action-recognition
- Supported Splits: train, test
- Zoo Dataset class: UCF101Dataset
Step 1: Install FiftyOne
If you don’t already have FiftyOne installed on your laptop, it takes just a few minutes! For example, on macOS:
- Verify your version of Python
- Create and activate a virtual environment
- Install IPython (optional)
- Upgrade your Setuptools
- Install FiftyOne
- Install FFmpeg
- Install a utility to uncompress .rar files (optional)
Note: In order to work with this video dataset you’ll need to have FFmpeg installed. Also, if you don’t have a package already installed to uncompress the UCF101 dataset .rar files, you can install a utility to accomplish this (for example on Mac) using:
brew install rar
You may also need to restart your IPython kernel and/or authorize the rar app in your macOS settings for the utility to be recognized during the dataset import step.
Or on Linux:
sudo apt install rar
Learn more about how to get up and running with FiftyOne in the Docs.
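Before moving on, it can help to confirm that the command-line tools mentioned above are actually visible to Python. This is a minimal sketch using only the standard library. `find_tools` is a hypothetical helper, and the command names `ffmpeg` and `rar` are assumed to match what your package manager installed:

```python
import shutil
import sys


def find_tools(names=("ffmpeg", "rar")):
    """Map each command name to its location on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


print("Python", ".".join(map(str, sys.version_info[:3])))
for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If a tool shows up as not found right after installing it, restarting your terminal or IPython kernel is usually the first thing to try.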
Step 2: Import the dataset
Now that you have FiftyOne installed, let’s download the dataset, import it into FiftyOne, and launch the FiftyOne App. The zoo loader downloads the dataset the first time you request it, so this should take just a few minutes and a few more lines of code.
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.video as fouv

dataset = foz.load_zoo_dataset("ucf101", split="test")

# Re-encode source videos as H.264 MP4s so they can be viewed in the App
fouv.reencode_videos(dataset)

print(dataset.name)  # ucf101-test

session = fo.launch_app(dataset)
The last line in the code snippet will launch the FiftyOne App in your default browser. You should see the following initial view of the test dataset in the FiftyOne App:
Your directory of videos in ~/fiftyone/ucf101 should have a test and train folder, plus an info.json file:
Both the test and train folders will contain video files broken up into 101 action categories.
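If you want to sanity-check the download from Python, something like the following sketch could work. It assumes each action category is a subfolder directly under the split folder, and it counts distinct file stems so that an original clip and a re-encoded copy of it are counted once (an assumption about how re-encoding lays out files). `count_videos_per_category` is a hypothetical helper:

```python
from pathlib import Path


def count_videos_per_category(split_dir):
    """Count distinct clip names in each category subfolder of a split."""
    split_dir = Path(split_dir).expanduser()
    return {
        category.name: len({f.stem for f in category.iterdir() if f.is_file()})
        for category in sorted(split_dir.iterdir())
        if category.is_dir()
    }


# Only report if the default download location from this post exists
test_dir = Path("~/fiftyone/ucf101/test").expanduser()
if test_dir.is_dir():
    counts = count_videos_per_category(test_dir)
    print(len(counts), "categories,", sum(counts.values()), "clips")
```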
Tip: If you want to persist the dataset so you don’t have to repeat the re-encode process when you load it in your next session, add the following line after your initial load command:
dataset.persistent = True
Now, you can load the dataset quickly and launch the App in your next session.
import fiftyone as fo

dataset = fo.load_dataset("ucf101-test")

session = fo.launch_app(dataset)
Ok, let’s do a quick exploration of the UCF101 dataset!
Sample details
Click on any of the samples to get additional detail like tags, metadata, labels, frame labels and primitives.
Filtering by ID
FiftyOne makes it very easy to filter the samples to find the ones that meet your specific criteria. For example we can filter by a specific id:
Filtering by label
In this example we filter the samples by the SkateBoarding action category.
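The same two filters can be expressed in code as dataset views. The sketch below assumes the action labels live in a field named `ground_truth`, which is the usual default for zoo datasets but is not confirmed in this post. `clips_with_label` and `clips_with_ids` are hypothetical helper names:

```python
def clips_with_label(dataset, label="SkateBoarding", label_field="ground_truth"):
    """Return a view containing only samples whose action label matches."""
    # Imported inside the function so the helper can be defined
    # even where FiftyOne is not installed yet
    from fiftyone import ViewField as F

    return dataset.match(F(f"{label_field}.label") == label)


def clips_with_ids(dataset, sample_ids):
    """Return a view containing only the samples with the given IDs."""
    return dataset.select(sample_ids)
```

With the `dataset` and `session` objects from Step 2, setting `session.view = clips_with_label(dataset)` should show the filtered samples in the App.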
Start working with the dataset
Now that you have a general idea of what the dataset contains, you can start using FiftyOne to perform a variety of tasks including:
- Creating dataset views
- Creating aggregations
- Creating interactive plots
- Annotating datasets
- Evaluating models
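As a small taste of aggregations, here is a sketch that ranks action categories by how many clips they contain. It relies on the dataset's `count_values()` aggregation and again assumes the labels are stored in a `ground_truth` field. `top_categories` is a hypothetical helper:

```python
def top_categories(dataset, k=5, label_field="ground_truth"):
    """Return the k most frequent action labels as (label, count) pairs."""
    counts = dataset.count_values(f"{label_field}.label")
    # Sort by descending count, breaking ties alphabetically
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Calling `top_categories(dataset)` on the dataset loaded in Step 2 would give a quick sense of how balanced the action categories are.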
You can also start using the FiftyOne Brain, which provides machine learning techniques you can apply to your workflows, such as visualizing embeddings and computing similarity, uniqueness, and mistakenness.