Welcome to the latest installment of our ongoing blog series where we highlight computer vision datasets in the FiftyOne Dataset Zoo! FiftyOne provides a Dataset Zoo that contains a collection of common datasets that you can download and load into FiftyOne via a few simple commands. In this post, we explore the UCF101 Action Recognition video dataset.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
The FiftyOne Dataset Zoo comprises more than 30 datasets, with new datasets being added all the time! They cover a variety of computer vision use cases including:
- Video
- Images
- Location
- Point-cloud
- Action-recognition
- Classification
- Detection
- Segmentation
- Relationships
- And more!
About the UCF101 action recognition dataset
UCF101 is a human action recognition (HAR) dataset of realistic action videos, collected from YouTube, with 101 action categories. At the time of its release in 2012, it was the largest video action recognition dataset available to the research community.
The dataset is made up of 13,320 videos (27 hours in total) and is characterized by a large diversity of actions, as well as large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, and illumination conditions. Because most of the action recognition datasets available at the time of its curation were unrealistic or staged by actors, UCF101 aimed to encourage further research into action recognition by providing new, realistic action categories to learn from and explore.
Within each of the 101 action categories, the videos are organized into 25 groups, where each group consists of 4-7 videos of the action. Videos from the same group may share common features, such as a similar background or viewpoint.
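The group structure is usually visible in the file names themselves. Here is a minimal sketch for pulling the action, group, and clip number out of a file name, assuming the commonly used UCF101 naming pattern v_<Action>_g<group>_c<clip>.avi (the pattern, the ClipInfo type, and the parse_clip_name helper are illustrative assumptions, not something FiftyOne provides):

```python
import re
from typing import NamedTuple

# Assumed UCF101 naming pattern: v_<Action>_g<group>_c<clip>.avi
_CLIP_PATTERN = re.compile(
    r"^v_(?P<action>[A-Za-z]+)_g(?P<group>\d+)_c(?P<clip>\d+)\.avi$"
)


class ClipInfo(NamedTuple):
    """Hypothetical container for the parts of a UCF101 file name."""

    action: str
    group: int
    clip: int


def parse_clip_name(filename: str) -> ClipInfo:
    """Split a UCF101-style file name into action, group, and clip number."""
    match = _CLIP_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected UCF101 file name: {filename!r}")
    return ClipInfo(match["action"], int(match["group"]), int(match["clip"]))


info = parse_clip_name("v_SkateBoarding_g03_c02.avi")
print(info.action, info.group, info.clip)
```

Knowing the group of each clip is handy if you build your own splits, since clips from the same group can look very similar.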
The action categories are divided into five types:
- Human-Object Interaction
- Body-Motion Only
- Human-Human Interaction
- Playing Musical Instruments
- Sports
Video tutorial: How to get started with the UCF101 action recognition video dataset (https://www.youtube.com/watch?v=xArphgd_hVs)
What is human action recognition?
As you can imagine, it is very easy for a human to watch a video and recognize the people in it and the actions they are performing. However, having a machine do the same thing is a very challenging problem in the “video understanding” subfield of computer vision. More concretely, human action recognition for the purposes of this dataset is the problem of automatically assigning a video to one of the 101 action categories.
Human action recognition in video has a variety of real-world applications like surveillance (military, industrial and civilian), healthcare (for example: the monitoring of patients as they move around a facility), human-computer interaction, content-based video retrieval, and video summarization.
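Dataset quick facts
- Source: realistic action videos collected from YouTube
- Released: 2012
- Videos: 13,320 (27 hours in total)
- Action categories: 101, divided into five types
- Groups: 25 per action category, each with 4-7 videos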
Step 1: Install FiftyOne
If you don’t already have FiftyOne installed on your laptop, it takes just a few minutes! On macOS, for example, you can install it with pip by running: pip install fiftyone
Note: In order to work with this video dataset you’ll need to have FFmpeg installed. Also, if you don’t already have a utility installed that can uncompress the UCF101 dataset’s .rar files, you’ll need to install one with your system’s package manager, whether you are on macOS or Linux.
You may also need to restart your IPython kernel and/or authorize the rar app in your macOS settings for the utility to be recognized during the dataset import step.
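On either platform, you can check from Python that the prerequisites are visible before you start the import. This is a minimal sketch using only the standard library; the executable names ffmpeg and unrar are assumptions (your rar utility may be installed under a different name), and find_missing_tools is a hypothetical helper:

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "unrar")):
    """Return the names of command-line tools that are not on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found")
```

If a tool you just installed is reported as missing, restarting the kernel so it picks up the updated PATH is a reasonable first thing to try.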
Step 2: Import the dataset
Now that you have FiftyOne installed, let’s load the dataset from the FiftyOne Dataset Zoo, which downloads it if necessary, and launch the FiftyOne App. This should take just a few minutes and a few more lines of code.
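A minimal sketch of this step, assuming the zoo name ucf101 and the test split. The two helper functions (load_ucf101 and launch) are illustrative wrappers rather than part of FiftyOne, and the imports sit inside them so that defining the helpers does not trigger a download:

```python
def load_ucf101(split="test"):
    """Load one split of UCF101 from the FiftyOne Dataset Zoo."""
    # Imported here so that defining the function has no side effects
    import fiftyone.zoo as foz

    # The files are downloaded the first time the split is requested
    return foz.load_zoo_dataset("ucf101", split=split)


def launch(dataset):
    """Open the FiftyOne App on the given dataset and return the session."""
    import fiftyone as fo

    return fo.launch_app(dataset)
```

In a notebook, calling launch(load_ucf101()) and keeping the returned session in a variable is enough. In a standalone script you would typically also call wait() on the session so that the App stays open.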
Launching the FiftyOne App opens it in your default browser, where you should see an initial view of the test split of the dataset.
Your directory of videos in ~/fiftyone/ucf101 should have a test and a train folder, plus an info.json file.
Both the test and train folders will contain video files broken up into 101 action categories.
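If you want to confirm this layout without opening a file browser, a short standard-library check can do it. This sketch assumes each split folder holds one subfolder per action category; both helper names are hypothetical:

```python
from pathlib import Path


def check_ucf101_layout(root="~/fiftyone/ucf101"):
    """Report which of the expected top-level entries exist under root."""
    root = Path(root).expanduser()
    return {
        "test": (root / "test").is_dir(),
        "train": (root / "train").is_dir(),
        "info.json": (root / "info.json").is_file(),
    }


def count_action_folders(split_dir):
    """Count the subfolders in a split folder (assumed: one per category)."""
    return sum(1 for entry in Path(split_dir).iterdir() if entry.is_dir())


print(check_ucf101_layout())
```

Under the one-folder-per-category assumption, calling count_action_folders on a fully downloaded split would be expected to report 101.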
Tip: If you want to persist the dataset so you don’t have to repeat the re-encode process when you load it in your next session, add the following to your initial load command:
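One way to do this is to mark the loaded dataset as persistent and load it by name next time. The sketch below assumes the dataset object came from the zoo load step; the default name ucf101-test is an assumption about how the zoo names the test split, so check the name that persist returns on your machine. Both helpers are illustrative:

```python
def persist(dataset):
    """Mark a loaded dataset as persistent and return its name."""
    dataset.persistent = True
    return dataset.name


def reload(name="ucf101-test"):
    """Load a previously persisted dataset by name in a later session."""
    # Imported here so that defining the function has no side effects
    import fiftyone as fo

    # The default name above is an assumption; pass the name persist() returned
    return fo.load_dataset(name)
```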
Now, you can load the dataset quickly and launch the App in your next session.
Ok, let’s do a quick exploration of the UCF101 dataset!
Sample details
Click on any sample to see additional details such as tags, metadata, labels, frame labels, and primitives.
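The same details are available from Python. As a rough sketch, assuming a dataset object loaded as in Step 2, the hypothetical helper below collects a few of the fields that the App shows for the first sample:

```python
def describe_first_sample(dataset):
    """Collect a few fields of the first sample in a dataset or view."""
    sample = dataset.first()
    return {
        "id": sample.id,
        "filepath": sample.filepath,
        "tags": sample.tags,
        # metadata may be None until it has been computed for the sample
        "metadata": sample.metadata,
    }
```

Printing the sample object itself is another quick way to see all of its fields at once.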
Filtering by ID
FiftyOne makes it very easy to filter the samples to find the ones that meet your specific criteria. For example, we can filter by a specific sample ID.
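You can apply the same filter in code. This is a minimal sketch, assuming a dataset and an App session from Step 2; show_sample is a hypothetical helper that selects one sample by ID and points the App at the resulting view:

```python
def show_sample(session, dataset, sample_id):
    """Restrict the App to the single sample with the given ID."""
    # select() returns a view that contains only the requested sample(s)
    view = dataset.select(sample_id)
    session.view = view
    return view
```

For a quick try, the ID of the first sample (the id attribute of dataset.first()) is a convenient value to pass in.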
Filtering by label
In this example we filter the samples by the SkateBoarding action category.
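Here is a sketch of the equivalent filter in Python. It assumes the action labels are stored in a classification field named ground_truth (check your dataset's schema if the field is named differently); match_action is a hypothetical helper:

```python
def match_action(dataset, action="SkateBoarding", label_field="ground_truth"):
    """Return a view of the samples whose label equals the given action."""
    # Imported here so that defining the function has no side effects
    from fiftyone import ViewField as F

    # label_field is an assumed field name; adjust it to match your dataset
    return dataset.match(F(f"{label_field}.label") == action)
```

Assigning the returned view to the view attribute of your session updates the App to show only the matching videos.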
Start working with the dataset
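Now that you have a general idea of what the dataset contains, you can start using FiftyOne to perform a variety of tasks, including curating high quality datasets, evaluating models, finding mistakes, and visualizing embeddings.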
You can also start making use of the FiftyOne Brain, which provides powerful machine learning techniques you can apply to your workflows, such as visualizing embeddings and computing similarity, uniqueness, and mistakenness.