
Elderly Action Recognition Challenge - WACV 2025

Submission Deadline: Feb 15, 2025


Join us for the Elderly Action Recognition (EAR) Challenge, part of the Computer Vision for Smalls (CV4Smalls) Workshop at the WACV 2025 conference!

This challenge focuses on advancing research in Activities of Daily Living (ADL) recognition, particularly within the elderly population, a domain with profound societal implications. Participants may employ transfer learning techniques with any architecture or model they choose; for example, starting from a model pretrained on a general human action recognition benchmark and fine-tuning it on a subset of data tailored to elderly-specific activities.
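One possible shape of such a transfer learning pipeline, sketched with PyTorch: a pretrained backbone is frozen and only a new classification head is trained on elderly-specific activities. The backbone below is a tiny stand-in for whatever pretrained action recognition model you choose, and the class count and tensor shapes are illustrative, not challenge specifications.

```python
import torch
import torch.nn as nn

NUM_ELDERLY_CLASSES = 22  # illustrative count of elderly-specific activities

# Stand-in for a pretrained action recognition backbone (in practice, load a
# model pretrained on a general benchmark and its real weights).
backbone = nn.Sequential(
    nn.Conv3d(3, 64, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
)

# Freeze the pretrained features; only the new head will be trained.
for p in backbone.parameters():
    p.requires_grad = False

# New classification head for elderly-specific activities.
head = nn.Linear(64, NUM_ELDERLY_CLASSES)
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

clip = torch.randn(2, 3, 8, 64, 64)  # batch of 2 clips: (C, T, H, W) per clip
logits = model(clip)
print(logits.shape)  # (2, NUM_ELDERLY_CLASSES)
```

From here, a standard cross-entropy training loop over the head completes the fine-tuning; unfreezing later backbone layers is a common next step once the head converges.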

We warmly invite participants from both academia and industry to collaborate and innovate. Voxel51 is proudly sponsoring this challenge and aims to encourage solutions that demonstrate robustness across varying conditions (e.g., subjects, environments, scenes) and adaptability to real-world variability.

Challenge Objectives

Participants will:

  • Develop state-of-the-art models for elderly action recognition using publicly available datasets.
  • Showcase innovative techniques for data curation and transfer learning.
  • Contribute to a growing body of research that addresses real-world challenges in elderly care.

Key Dates

  • Challenge Launch: January 10, 2025
  • Submission Deadline: February 15, 2025
  • Winners Announcement: February 20, 2025, at the virtual AI, Machine Learning, and Computer Vision Meetup.

Model Development

Participants are encouraged to explore and leverage state-of-the-art human action recognition models without limitations. Creativity and originality in model architecture and training methodology are strongly encouraged.

Dataset

To mitigate overfitting and promote generalization, participants are encouraged to build training datasets from publicly available video resources such as ETRI-Activity3D [1], MUVIM [2], and Toyota Smarthome [3]. Note that some datasets require submitting an access request before they can be downloaded.

Participants are not restricted to these datasets and are welcome to curate extensive datasets, combining multiple sources. A detailed report outlining the datasets used and their preparation and curation processes is mandatory.

Data Curation and Categorization

Participants must group activities into the following categories for efficient organization and analysis; model outputs should report both the category and the activity.

  • Locomotion and Posture Transitions – Walking, sitting down/standing up, getting up/lying down, exercising, looking for something
  • Object Manipulation – Spreading bedding/folding bedding, wiping table, cleaning dishes, cooking, vacuuming
  • Hygiene and Personal Care – Washing hands, brushing teeth, taking medicine
  • Eating and Drinking
  • Communication and Gestures – talking, phone calls, waving a hand, shaking hands, hugging
  • Leisure and Stationary Actions – Reading, watching TV
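The grouping above can be captured as a simple mapping and inverted so that an activity prediction can always be reported with its category, as required. The label strings below are illustrative transcriptions of the list, and the two activities under "Eating and Drinking" are an assumption since the list gives none.

```python
# Mapping from challenge categories to activities, transcribed from the list
# above. Label strings are illustrative; "Eating and Drinking" activities are
# assumed, as the announcement lists none.
CATEGORIES = {
    "Locomotion and Posture Transitions": [
        "walking", "sitting down/standing up", "getting up/lying down",
        "exercising", "looking for something",
    ],
    "Object Manipulation": [
        "spreading bedding/folding bedding", "wiping table",
        "cleaning dishes", "cooking", "vacuuming",
    ],
    "Hygiene and Personal Care": [
        "washing hands", "brushing teeth", "taking medicine",
    ],
    "Eating and Drinking": ["eating", "drinking"],
    "Communication and Gestures": [
        "talking", "phone calls", "waving a hand", "shaking hands", "hugging",
    ],
    "Leisure and Stationary Actions": ["reading", "watching TV"],
}

# Invert the mapping so a predicted activity can be reported with its category.
ACTIVITY_TO_CATEGORY = {
    activity: category
    for category, activities in CATEGORIES.items()
    for activity in activities
}

print(ACTIVITY_TO_CATEGORY["cooking"])  # Object Manipulation
```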

Supporting Materials

Voxel51’s FiftyOne tool will assist participants in effectively curating, categorizing, and visualizing datasets. Tutorials and examples will be provided at the challenge launch.

Evaluation

Given a path to an MP4 file, the evaluation script should take the video as input and output its category and activity label. The evaluation framework will use the following metrics to ensure a fair and comprehensive assessment:

  • Primary Metric: Average Accuracy across the evaluation datasets
  • Secondary Metrics: Precision, Recall, F1-Score, and confusion matrix insights
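The official evaluation script is not public, but the listed metrics are standard. A minimal pure-Python sketch of how accuracy and per-class precision, recall, and F1 could be computed from parallel lists of true and predicted labels (the function name and example labels are illustrative):

```python
def per_class_metrics(y_true, y_pred):
    """Overall accuracy plus per-class precision, recall, and F1.

    y_true, y_pred: parallel lists of activity labels. Illustrative
    re-implementation; the official script may differ in detail.
    """
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    metrics = {}
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, metrics

acc, per_class = per_class_metrics(
    ["walking", "cooking", "walking", "reading"],
    ["walking", "cooking", "reading", "reading"],
)
print(acc)  # 0.75
```

Counting true/false positives per label this way also yields the confusion matrix entries directly, which is what the secondary metrics are derived from.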

Evaluation Script

An evaluation script will calculate all metrics and provide a leaderboard ranking. The script will not be accessible during training. In the event of tied scores, secondary metrics will determine leaderboard positions.

Participation Rules

  • Submissions must include
  • An evaluation submission CSV file containing the prediction results on the Evaluation Dataset
    • A Hugging Face link to your PyTorch model weights
    • A PDF Report documenting the data curation process and datasets used.
  • Use of external datasets or pre-trained models is permitted but must be disclosed in the report. Participants may leverage any architecture or model they choose.
  • Submissions missing any of the required files (evaluation CSV file, model link, and report) will be disqualified.
  • The top five positions on the leaderboard may be subject to audit by the challenge organizers. If you are among the top five, you will receive further communication regarding the process.
  • Using the Discord channel is not mandatory, but it benefits all participants as a place to help each other and surface common problems.
  • More information will be provided once the submission process is ready.
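The exact CSV schema has not been announced, so the columns below (video filename, category, activity) are purely an assumption of what a submission file might look like; adjust to the official format once it is published.

```python
import csv

# Hypothetical predictions: (video filename, category, activity).
# Column names and layout are assumptions; the official schema is TBA.
predictions = [
    ("clip_0001.mp4", "Object Manipulation", "cooking"),
    ("clip_0002.mp4", "Leisure and Stationary Actions", "watching TV"),
]

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["video", "category", "activity"])  # assumed header
    writer.writerows(predictions)
```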

Submission Platform

Stay tuned for additional details coming shortly!

Submission Process

Stay tuned for additional details coming shortly!

References

[1] Jang, J., Kim, D., Park, C., Jang, M., Lee, J., & Kim, J. (2020). ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly. arXiv:2003.01920

[2] Denkovski, S., Khan, S. S., Malamis, B., Moon, S. Y., Ye, B., & Mihailidis, A. (2022). Multi Visual Modality Fall Detection Dataset. arXiv:2206.12740

[3] Dai, R., Das, S., Sharma, S., Minciullo, L., Garattoni, L., Bremond, F., & Francesca, G. (2022). Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection. arXiv:2010.14982

Judges/Mentors

Sarah Ostadabbas
Northeastern University

Paula Ramos, PhD
Voxel51

Chen Chen
University of Central Florida

Yanjun Zhu
Northeastern University

Shayda Moezzi
Northeastern University

Somaieh Amraee
Northeastern University