This blog is part of the series “Journey into Visual AI: Exploring FiftyOne Together,” in which I share my experience using FiftyOne across multiple stages. Don’t miss the previous blogs here:
Blog 1: Journey into Visual AI: Exploring FiftyOne Together — Part I Introduction.
Blog 2: Journey into Visual AI: Exploring FiftyOne Together — Part II Getting Started
In this Blog 3, we’ll explore the new Elderly Action Recognition Challenge I’m working on, its goals, the challenges we face, and how the open-source community can collaborate to address them. By the end of this blog, I hope you’ll be interested in participating in the challenge and bringing your ideas to AI for Good.
From the early days of my professional career, I’ve been passionate about applications of automated systems. In recent years, my focus has naturally gravitated toward cutting-edge AI trends. Yet, despite the advancements, many unresolved challenges remain. I vividly remember working during my master’s degree on a system designed to detect falls in the elderly. The idea involved developing a sensor-based belt that activated an inflatable device to prevent injuries. Such concepts have matured over the years, and companies now market similar solutions.
However, with the rise of computer vision and robotics to assist humans in their daily lives, we face a new challenge: leveraging camera-based technology to detect human actions. I recall my first blog with OpenVINO, “Human Action Recognition,” where I implemented an encoder-decoder architecture to generate embeddings from 16 frames and determine actions captured in videos. You cannot miss that notebook—I have my son in there, lovely!
Paula and Paula’s son in human action recognition.
Since then, models have evolved dramatically, with new architectures released nearly every week. This rapid evolution in model development begs the question: Can we generate reliable data at a pace that matches this rate of innovation?
What Is the Elderly Action Recognition Challenge?
This challenge aims to tackle one of the most critical applications in human action recognition: identifying activities of daily living (ADLs) and detecting falls in the elderly. The competition invites participants to train models on a large, generic human action recognition benchmark and apply transfer learning using a subset of data and class labels specific to elderly-related actions.
Key Details:
- Goals: Enable more efficient and accurate recognition of elderly actions, addressing real-world healthcare and assisted living challenges.
- Deadline: Submissions close February 15, 2025.
- Evaluation: Given a path to an .mp4 file, the evaluation script should take the video as input and output a category and a label (a minimal sketch of this interface is shown right after these key details). The evaluation framework will use a set of metrics designed to ensure a fair and comprehensive assessment.
- Submissions must include: 1) an evaluation submission file (CSV/JSON) with the prediction results over the Evaluation Dataset, 2) a Hugging Face link to your PyTorch model weights, and 3) a PDF report documenting the data curation process and datasets used.
- Target Audience: The challenge is open to AI researchers, students, developers, and enthusiasts interested in advancing action recognition in critical domains.
- Here is the submission platform: https://eval.ai/web/challenges/challenge-page/2427/overview
- Discord Channel: https://discord.com/channels/1266527359511564372/1319053378843836448
Note: This challenge is part of the Computer Vision for Smalls (CV4Smalls) Workshop hosted at WACV 2025.
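To make the expected evaluation interface concrete, here is a hypothetical sketch: a predict() stub that takes a path to an .mp4 and returns a (category, label) pair, plus a loop that writes the results to a CSV. The function name, the dummy output, and the CSV column names are my own assumptions; check the submission platform for the exact required format.

import csv

def predict(video_path):
    """Placeholder inference: replace this with your trained PyTorch model's logic."""
    # A real submission would load the model weights, read the video frames,
    # and run inference; here we just return a dummy (category, label) pair
    return "ADL", "walking"

# Write predictions for the evaluation videos to a CSV (paths are placeholders)
evaluation_videos = ["clip_001.mp4", "clip_002.mp4"]
with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["video", "category", "label"])
    for path in evaluation_videos:
        category, label = predict(path)
        writer.writerow([path, category, label])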
Human Action Recognition in the Era of Vision Transformers
The development of models for human action recognition has significantly transformed in the era of Vision Transformers (ViTs). While convolutional architectures laid the foundation, ViTs have introduced a new paradigm with their ability to effectively capture long-range dependencies and process spatiotemporal data.
However, this challenge does not require a solution based on Vision Transformers. It is open to different approaches, including simpler ones, emphasizing practicality and accessibility rather than exclusively adopting cutting-edge architectures.
Data complexity and model generalization are the main challenges in model development and deployment. Handling spatiotemporal data is resource-intensive and demands robust architectures, and achieving high accuracy across diverse datasets remains difficult.
On the data side, unfortunately, data creation does not keep pace with model development, and the available data is still too restrictive. Open-access datasets for elderly action recognition are limited, presenting challenges for reproducibility and benchmarking.
Current Data for Detecting ADLs and Falls in the Elderly
The availability of open-access datasets for elderly action recognition is a critical bottleneck. Most existing datasets have limitations in scale, diversity, or licensing. The key issues I can identify after preparing this material for potential participants of the challenge are:
- Data Limitations: Many datasets lack coverage of diverse scenarios or fail to represent real-world variability.
- Licensing Challenges: Open-access datasets often have restrictive licenses, limiting their utility for commercial or collaborative applications.
The Role of FiftyOne in Video Data Management
As you can see in my previous blogs, FiftyOne is a powerful open-source tool for handling and analyzing data. The new aspect of this blog is that FiftyOne can also process video data, offering critical functionality for dataset curation and exploration in complex datasets.
With FiftyOne, we can create video datasets and streamline importing, organizing, and visualizing video data. We can easily manage the metadata associated with each dataset, enabling better insights and analysis, and use its data curation tools to efficiently visualize, clean, filter, and curate video datasets, ensuring high-quality inputs for model training.
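To give a feel for this before we dive into the challenge dataset, here is a minimal sketch (the folder path and dataset name are placeholders) that loads a local directory of videos into FiftyOne, computes per-video metadata, and opens the App:

import fiftyone as fo

# Load every video found in a local folder into a new dataset (path is a placeholder)
dataset = fo.Dataset.from_videos_dir("/path/to/videos", name="my_videos")

# Populate per-video metadata (duration, frame rate, resolution, ...)
dataset.compute_metadata()

# Open the FiftyOne App to browse the videos and their metadata
session = fo.launch_app(dataset)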
Here, you can find extra resources for video management with FiftyOne:
- Exploring the UCF101 Dataset: A Large-Scale, YouTube-Based Action Recognition Dataset
- Video Labels — FiftyOne Tips and Tricks (10/14/2023)
Getting Hands-On: Exploring ADL and Fall Detection Datasets
For this demonstration, we’ll dive into the GMDCSA24 dataset, which provides a comprehensive collection of elderly activity and fall detection videos.
- Contains 160 videos (mp4) covering diverse indoor scenarios related to ADLs (81 videos) and falls (79 videos).
- Includes rich metadata for better context and model interpretability.
- Each video could have two or more actions.
- Activities: Drinking, eating, exercising, reading, sitting, sleeping, standing, walking, writing.
- Fall classes: Fall backward (BW), fall forward (FW), fall sideways (SW).
Using FiftyOne, we’ll navigate this dataset, showcasing how to explore its structure, visualize key insights, and prepare it for training robust AI models.
Step 1 – Defining Path for Dataset and Checking if Dataset Exists:
After installing the required libraries and importing the necessary modules, the first step is to define the dataset path and create a new dataset. To avoid conflicts with previous executions, we first check if a dataset with the same name already exists. If it does, we delete it to start fresh.
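For reference, the snippets below rely on FiftyOne, pandas, and the standard os and re modules; a minimal setup (the pip line assumes a fresh environment) looks like this:

# pip install fiftyone pandas

import os
import re

import pandas as pd
import fiftyone as fo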
# Define the path to your dataset
dataset_path = "/path/to/the/GMDCSA24/folder"  # Replace with the actual path
dataset_name = "ADL_Fall_Videos"

# Check if the dataset already exists
if fo.dataset_exists(dataset_name):
    # Delete the existing dataset
    fo.delete_dataset(dataset_name)

# Create a FiftyOne dataset
fo_dataset = fo.Dataset(dataset_name)
Step 2 – Setting up helper functions:
To process the dataset effectively, we define two key helper functions:
2.1 Function to Parse the Classes
This function extracts action names and their respective time ranges from the dataset. Since each video can include multiple actions, the label file specifies which actions occur at specific timestamps. We use this information to split videos into smaller clips and prepare a new dataset based on these segments.
# Function to parse the Classes column
def parse_classes(classes_str):
    actions = []
    if pd.isna(classes_str):
        return actions

    # Split by ';' to handle multiple actions
    class_entries = classes_str.split(';')
    for entry in class_entries:
        match = re.match(r"(.+?)\[(.+?)\]", entry.strip())
        if match:
            action = match.group(1).strip()       # Extract action name
            time_ranges = match.group(2).strip()  # Extract time ranges within brackets

            # Split time ranges by ';' and process each range
            ranges = time_ranges.split(';')
            for time_range in ranges:
                time_match = re.match(r"(\d+(\.\d+)?) to (\d+(\.\d+)?)", time_range.strip())
                if time_match:
                    start_time = float(time_match.group(1))
                    end_time = float(time_match.group(3))

                    # Ensure start_time is less than or equal to end_time
                    if start_time > end_time:
                        continue  # Skip invalid ranges

                    actions.append({"action": action, "start_time": start_time, "end_time": end_time})

    return actions
2.2 Function to Map Actions to Categories
One of the goals of the challenge is to categorize actions. This function maps each action to a predefined category to ensure the action recognition task also includes a higher-level classification.
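Here is a minimal sketch of such a mapping, used later as get_category(); the category names (ADL, Fall, Other) and the exact action strings are my own assumptions based on the dataset description above, not the official challenge taxonomy:

# Map each fine-grained action to a higher-level category
ADL_ACTIONS = {"drinking", "eating", "exercising", "reading", "sitting",
               "sleeping", "standing", "walking", "writing"}

def get_category(action):
    """Return a coarse category ("Fall", "ADL", or "Other") for an action name."""
    action = action.strip().lower()
    if action.startswith("fall"):  # e.g. "fall backward (BW)", "fall forward (FW)"
        return "Fall"
    if action in ADL_ACTIONS:
        return "ADL"
    return "Other"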
Step 3 – Iterating Over the Main Folders (One per Subject) and Splitting Videos by Action Using FiftyOne.
This section combines several important tasks:
Adding Samples to the Dataset: We read the dataset from CSV files, extract metadata (e.g., file name, action, and description), and add these as new samples. We also enrich the metadata by adding fields like subject, type_of_activity (e.g., ADL or Fall), and categories derived from the actions.
Splitting Videos into Clips: We split videos into smaller clips using the parsed action information for each specific action. This is achieved by creating a metadata field called events, which stores the timestamps and frames corresponding to each action.
Exporting the Dataset: The updated dataset can be exported into a FiftyOne format or a Classification Directory Tree after processing. The latter option is especially useful for working with split clips instead of full videos.
# Iterate through the main folders (one per subject)
for subject_folder in os.listdir(dataset_path):
    subject_path = os.path.join(dataset_path, subject_folder)
    if not os.path.isdir(subject_path):
        continue

    # Extract the subject number from the folder name
    subject_number = subject_folder.split("_")[-1]  # Adjust the split logic if needed

    # Look for ADL and Fall folders and CSV files
    adl_folder = os.path.join(subject_path, "ADL")
    fall_folder = os.path.join(subject_path, "Fall")
    label_files = [f for f in os.listdir(subject_path) if f.endswith(".csv")]

    # Load metadata from CSV files
    for label_file in label_files:
        label_path = os.path.join(subject_path, label_file)
        labels = pd.read_csv(label_path)
        print(label_path)

        for _, row in labels.iterrows():
            file_name = row["File Name"]
            length = row["Length (seconds)"]
            time_of_recording = row["Time of Recording"]
            attire = row["Attire"]
            description = row["Description"]
            classes = row[" Classes"]

            # Parse the Classes column
            parsed_classes = parse_classes(classes)

            # Determine the file's path
            if "ADL" in label_path:
                video_path = os.path.join(adl_folder, file_name)
                subset = "ADL"
            elif "Fall" in label_path:
                video_path = os.path.join(fall_folder, file_name)
                subset = "Fall"
            else:
                continue

            if not os.path.exists(video_path):
                print(f"Video file not found: {video_path}")
                continue

            # Create a FiftyOne sample
            metadata = fo.VideoMetadata.build_for(video_path)
            sample = fo.Sample(filepath=video_path, metadata=metadata)

            # Build temporal detections from the labeled actions
            temp_detections = []
            for action in parsed_classes:
                start_time = float(action["start_time"])
                end_time = float(action["end_time"])

                # Clamp end_time to the video duration
                if end_time > metadata.duration:
                    end_time = metadata.duration

                event = fo.TemporalDetection.from_timestamps(
                    [start_time, end_time],
                    label=action["action"],
                    sample=sample,
                )
                temp_detections.append(event)

            sample["events"] = fo.TemporalDetections(detections=temp_detections)

            # Add metadata to the sample
            sample["subset"] = subset
            sample["subject_number"] = subject_number
            sample["length"] = length
            sample["time_of_recording"] = time_of_recording
            sample["attire"] = attire
            sample["description"] = description
            sample["classes"] = classes

            # Assign category based on actions
            categories = [get_category(action["action"]) for action in parsed_classes]
            sample["category"] = list(set(categories))  # Deduplicate categories

            # Add the sample to the dataset
            fo_dataset.add_sample(sample)

fo_dataset.compute_metadata()
Step 4 – Launching the App
Once the dataset is prepared, you can launch the FiftyOne App programmatically and interact with it. This allows you to explore the dataset visually, create views, and export those views to various formats for further analysis or sharing.
The FiftyOne app provides a highly interactive way to:
- Inspect the dataset and its metadata.
- Visualize events and clips.
- Filter and sort data based on specific criteria.
- Export customized views to your desired format.
session = fo.launch_app(fo_dataset)

view = fo_dataset.to_clips("events")
session.view = view
print(view)
After launching the app, I can see my new metadata and events fields in the left sidebar, along with all of the dataset’s metadata, which I added through the code shared above and in the notebook.
Checking the new metadata and sample[“events”] in the FiftyOne App.
Step 5 – Exporting Clips and Single Actions
Using TemporalDetections, we can focus on specific ranges of frames within the original videos, corresponding to individual actions. The events field in the metadata marks these individual events with precise timestamps, enabling clear segmentation.
Selecting just a particular event in the metadata “sitting” and checking for other events in the original video.
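The same selection can be done programmatically with a view. A minimal sketch, building on the fo_dataset and session objects created above and assuming the event label is stored as “Sitting” (adjust the string to match the dataset’s casing):

from fiftyone import ViewField as F

# Keep only the "Sitting" events, then turn each remaining event into its own clip
sitting_view = fo_dataset.filter_labels("events", F("label") == "Sitting").to_clips("events")
session.view = sitting_view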
After this process, we can export only the relevant clips and single actions instead of entire videos with complex labels. This streamlined dataset structure is ideal for training machine learning models or for submission to challenges that require precise action recognition.
view.export(
    export_dir="/path/to/the/GMDCSA24/new_folder",
    dataset_type=fo.types.VideoClassificationDirectoryTree,
)
By isolating and exporting these segments, we reduce dataset size and improve clarity and usability for downstream tasks.
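If you want to keep working with the exported clips inside FiftyOne, the classification directory tree can be loaded back as a new dataset. A minimal sketch, assuming the export path used above (the dataset name is a placeholder):

import fiftyone as fo

# Reload the exported clips; the class labels come from the directory names
clips_dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/the/GMDCSA24/new_folder",
    dataset_type=fo.types.VideoClassificationDirectoryTree,
    name="ADL_Fall_Clips",
)
print(clips_dataset)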
FiftyOne can manage many different dataset types; in this notebook, we built a custom dataset and added each sample to it ourselves. Now it is time to export it in a FiftyOne-native format to take advantage of more of FiftyOne’s capabilities. For more information about which dataset types FiftyOne can manage, take a look at this page.
export_dir = "/path/to/the/GMDCSA24/new_folder_FO_Dataset"
fo_dataset.export(
    export_dir=export_dir,
    dataset_type=fo.types.FiftyOneDataset,
)
Additional resources:
- Notebook for digesting GMDCSA24 Dataset: https://github.com/voxel51/fiftyone-examples/blob/master/examples/elderly_action_recognition.ipynb
- GMDCSA24 Dataset: https://github.com/ekramalam/GMDCSA24-A-Dataset-for-Human-Fall-Detection-in-Videos
- Tips and tricks for human action recognition with FiftyOne: https://voxel51.com/blog/exploring-ucf101-youtube-based-action-recognition-dataset/
- Try the FiftyOne APP in a browser: https://try.fiftyone.ai/
- EAR Challenge: https://voxel51.com/computer-vision-events/elderly-action-recognition-challenge-wacv-2025/
- FiftyOne Documentation: https://docs.voxel51.com/
Just wrapping up! 😀
Thank you for joining me in exploring the Elderly Action Recognition Challenge and the powerful tools FiftyOne provides for dataset preparation and video data management. We have learned how to define a complex dataset, launch the FiftyOne App, and export actionable clips, and we’ve seen how FiftyOne streamlines the complexities of handling video datasets.
I invite you to participate in the challenge, test the notebook shared in this blog, and share your experience with FiftyOne.
I would love to hear about your experiences! Please share your thoughts, ask questions, and provide testimonials. Your insights might help others in our next posts. Don’t forget to participate in the challenge and try out the notebook I have created for you all.
Together, we can innovate in action recognition and make meaningful contributions to AI for Good. Let’s build something impactful!
Stay tuned for the next post, in which we’ll explore FiftyOne’s advanced features and evaluate a model.
Let’s make this journey with FiftyOne a collaborative and enriching experience. Happy coding!
Stay Connected:
- Follow me on Medium: https://medium.com/@paularamos_phd
- Follow Me on LinkedIn: https://www.linkedin.com/in/paula-ramos-phd/
- Join the Conversation: Discord Fiftyone-community
What is next?
I’m excited to share more about my journey at Voxel51! 🚀 If you’d like to follow along as I explore the world of AI and grow professionally, feel free to connect or follow me on LinkedIn. Let’s inspire each other to embrace change and reach new heights!
You can find me at some Voxel51 events (https://voxel51.com/computer-vision-events/), or if you want to join this amazing team, it’s worth taking a look at this page: https://voxel51.com/jobs/