Submission Deadline: February 15, 2025
Join us for the Elderly Action Recognition (EAR) Challenge, part of the Computer Vision for Smalls (CV4Smalls) Workshop at the WACV 2025 conference!
This challenge focuses on advancing research in Activities of Daily Living (ADL) recognition, particularly within the elderly population, a domain with profound societal implications. Participants may employ transfer learning techniques with any architecture or model they want to use, for example, starting from a model pretrained on a general human action recognition benchmark and fine-tuning it on a subset of data tailored to elderly-specific activities.
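For illustration only, here is a minimal sketch of one such transfer learning setup using PyTorch and torchvision; the choice of r3d_18 pretrained on Kinetics-400, the frozen backbone, and the hyperparameters are assumptions rather than requirements, and data loading is omitted:

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

NUM_CATEGORIES = 6  # the challenge's six activity categories (listed below)

# Start from a model pretrained on a general action recognition benchmark (Kinetics-400)
model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)

# Freeze the backbone and fine-tune only a new classification head
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CATEGORIES)

optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def training_step(clips: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of elderly-activity clips.

    `clips` is a (B, C, T, H, W) float tensor and `labels` a (B,) tensor of
    category indices; building these batches is left to the participant.
    """
    logits = model(clips)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    loss_value = loss.item()
    optimizer.step()
    return loss_value
```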
We warmly invite participants from both academia and industry to collaborate and innovate. Voxel51 is proudly sponsoring this challenge and aims to encourage solutions that demonstrate robustness across varying conditions (e.g., subjects, environments, scenes) and adaptability to real-world variability.
Participants may explore and leverage state-of-the-art human action recognition models without limitation; creativity and originality in model architecture and training methodology are strongly encouraged.
To mitigate overfitting and promote generalization, participants can build training datasets using publicly available video resources such as ETRI-Activity3D [1], MUVIM [2], and Toyota Smarthome [3]. Note that some datasets require submitting an access request before they can be downloaded.
Participants are not restricted to these datasets and are welcome to curate more extensive datasets by combining multiple sources. A detailed report outlining the datasets used and their preparation and curation processes is mandatory.
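As a purely illustrative starting point, a combined training set can be indexed in a single manifest that also serves as a record for the dataset report; the directory layout below is a placeholder and does not reflect the actual structure of any of these datasets:

```python
import csv
from pathlib import Path

# Placeholder local layout after downloading each source dataset;
# these paths do not reflect the datasets' actual structure
SOURCES = {
    "etri_activity3d": Path("data/etri/videos"),
    "muvim": Path("data/muvim/videos"),
    "toyota_smarthome": Path("data/toyota/videos"),
}

def build_manifest(out_path: str = "train_manifest.csv") -> None:
    """Index every clip from every source in one manifest (filepath + source),
    which doubles as a record for the mandatory dataset report."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filepath", "source"])
        for source, video_dir in SOURCES.items():
            for clip in sorted(video_dir.rglob("*.mp4")):
                writer.writerow([str(clip), source])

build_manifest()
```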
The evaluation dataset will comprise multiple curated subsets of elderly action recognition data. Participants will have access to the unlabeled evaluation dataset and should download it to test their models.
Participants must group activities into the following categories for efficient organization and analysis; model output should include both the category and the activity.
Keep in mind that the submission file should use only the following categories, all lowercase: `locomotion`, `manipulation`, `hygiene`, `eating`, `communication`, and `leisure`.
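One simple way to keep categories consistent is to map each fine-grained activity label to one of the six required categories when writing the submission file. The mapping and the CSV schema below are illustrative assumptions only; follow the official submission instructions for the exact format:

```python
import csv

# Illustrative mapping from fine-grained activity labels to the six required
# categories; the actual grouping of activities is up to each participant
ACTIVITY_TO_CATEGORY = {
    "walking": "locomotion",
    "opening_a_drawer": "manipulation",
    "brushing_teeth": "hygiene",
    "drinking_water": "eating",
    "talking_on_phone": "communication",
    "watching_tv": "leisure",
}

def write_submission(predictions, out_path: str = "submission.csv") -> None:
    """Write one row per video with its predicted category and activity.

    `predictions` is an iterable of (video_id, activity) pairs; the exact
    submission schema should follow the official challenge instructions.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["video", "category", "activity"])
        for video_id, activity in predictions:
            writer.writerow([video_id, ACTIVITY_TO_CATEGORY[activity], activity])
```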
Voxel51’s FiftyOne tool will assist participants in effectively curating, categorizing, and visualizing datasets. Tutorials and examples will be provided at the challenge launch.
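For instance, a handful of lines is enough to load a folder of clips into FiftyOne, attach category labels, and browse them in the FiftyOne App; the path and label below are placeholders for whatever your curation process assigns:

```python
import fiftyone as fo

# Load a folder of clips into a FiftyOne video dataset (path is a placeholder)
dataset = fo.Dataset.from_dir(
    dataset_dir="data/ear_challenge/train",
    dataset_type=fo.types.VideoDirectory,
    name="ear-challenge-train",
)

# Attach a category label to each clip (placeholder label shown here)
for sample in dataset:
    sample["category"] = fo.Classification(label="locomotion")
    sample.save()

# Explore, filter, and curate the clips visually in the FiftyOne App
session = fo.launch_app(dataset)
```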
A few supporting blogs and notebooks:
Given a path to an MP4 file, the evaluation script should take the video as input and output its category. The evaluation framework will use the following metrics to ensure a fair and comprehensive assessment:
An evaluation script will calculate all metrics and provide a leaderboard ranking. The script will not be accessible during training. In the event of tied scores, secondary metrics will determine leaderboard positions.
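As a rough sketch of that video-in, category-out interface, a participant's inference entry point could look like the following; the model, the category order, and the preprocessing are assumptions, and the resizing, normalization, and frame sampling expected by your chosen model must be added to match its pretraining recipe:

```python
import torch
from torchvision.io import read_video

# Assumed category order; any fine-tuned video classifier with a 6-way head
# (e.g., the transfer learning sketch above) can be plugged in as `model`
CATEGORIES = ["locomotion", "manipulation", "hygiene", "eating", "communication", "leisure"]

@torch.no_grad()
def predict_category(video_path: str, model: torch.nn.Module) -> str:
    """Read an MP4 clip and return the predicted lowercase category label."""
    frames, _, _ = read_video(video_path, pts_unit="sec", output_format="TCHW")
    clip = frames.float() / 255.0                 # (T, C, H, W) in [0, 1]
    clip = clip.permute(1, 0, 2, 3).unsqueeze(0)  # (1, C, T, H, W) for video models
    logits = model(clip)
    return CATEGORIES[int(logits.argmax(dim=1))]
```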
First place:
First, second, and third place:
Next Steps After the Challenge
[1] Jang, J., Kim, D., Park, C., Jang, M., Lee, J., & Kim, J. (2020). ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly. arXiv:2003.01920
[2] Denkovski, S., Khan, S. S., Malamis, B., Moon, S. Y., Ye, B., & Mihailidis, A. (2022). Multi Visual Modality Fall Detection Dataset. arXiv:2206.12740
[3] Dai, R., Das, S., Sharma, S., Minciullo, L., Garattoni, L., Bremond, F., & Francesca, G. (2022). Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection. arXiv:2010.14982
In an era dominated by large-scale datasets, designing computer vision solutions for unique and often underrepresented populations demands rethinking technology and methodology. Infants, toddlers, and the elderly represent groups where data is scarce due to ethical, practical, and privacy concerns, but where the potential for positive impact is immense. This keynote will explore the core challenges of building and deploying computer vision systems with limited yet precious data. We will see how approaches involving synthetic data, GenAI, stereo vision, and even models built using big data can be of immense help in solutions where data is sparse and privacy is critical.
CEO – OpenCV.org
Northeastern University
Voxel51
University of Central Florida
Northeastern University
Northeastern University
Northeastern University