@InProceedings{Le_2024_ACCV,
    author    = {Le, Thi-Lan and Le, Viet-Duc and Nguyen, Thuy-Binh},
    title     = {Enhancing Continuous Skeleton-Based Human Gesture Recognition by Incorporating Text Descriptions},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV) Workshops},
    month     = {December},
    year      = {2024},
    pages     = {605-620}
}
Enhancing Continuous Skeleton-Based Human Gesture Recognition by Incorporating Text Descriptions
Abstract
Continuous gesture recognition is a crucial task in human-computer interaction. Unlike isolated gesture recognition, where individual gestures are analyzed independently, continuous recognition involves detecting and classifying multiple gestures seamlessly from continuous video streams. In this paper, we propose a method for continuous gesture recognition. Our proposed model operates in two stages: isolated gesture recognition and a sliding window-based approach for continuous gesture recognition. For isolated gesture recognition, we propose a dual-encoder method named TDDNet (Text-Enhanced DDNet) that integrates a skeleton encoder based on the DDNet model [7] with a text encoder based on CLIP. We evaluate our model on a self-collected dataset comprising 19 gestures relevant to human-COBOT interaction, collected from 50 subjects. Experimental results demonstrate that our model improves isolated gesture recognition accuracy from 84.2% to 85.5%, while for continuous gesture recognition it achieves 66.60%, compared to 66.00% for the baseline model. The source code is publicly available at https://github.com/duclvQ/improved_DDNet.
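To make the dual-encoder idea concrete, the sketch below pairs a placeholder skeleton encoder (standing in for the DDNet branch) with CLIP's text encoder and classifies a gesture by cosine similarity between the skeleton embedding and per-class text embeddings. This is a minimal sketch under assumed design choices: the module names, dimensions, joint layout, and similarity-based fusion are illustrative and not taken from the authors' implementation.

```python
# Hypothetical dual-encoder sketch in the spirit of TDDNet (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

class SkeletonEncoder(nn.Module):
    """Placeholder for the DDNet skeleton branch: joint coordinates -> embedding."""
    def __init__(self, num_joints=25, coord_dim=3, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(start_dim=2),                   # (B, T, J, C) -> (B, T, J*C)
            nn.Linear(num_joints * coord_dim, 256),
            nn.ReLU(),
        )
        self.head = nn.Linear(256, embed_dim)

    def forward(self, x):                              # x: (B, T, J, C)
        h = self.net(x).mean(dim=1)                    # temporal average pooling
        return self.head(h)                            # (B, embed_dim)

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

# One short text description per gesture class (illustrative examples).
descriptions = ["a person waves the right hand", "a person points forward"]
with torch.no_grad():
    text_emb = clip_model.encode_text(clip.tokenize(descriptions).to(device))
    text_emb = F.normalize(text_emb.float(), dim=-1)   # (num_classes, 512)

skel_encoder = SkeletonEncoder().to(device)
skeletons = torch.randn(4, 32, 25, 3, device=device)   # (batch, frames, joints, xyz)
skel_emb = F.normalize(skel_encoder(skeletons), dim=-1)

logits = skel_emb @ text_emb.T                          # cosine-similarity logits
pred = logits.argmax(dim=-1)                            # predicted gesture class
```

Classifying against frozen text embeddings is a common way to inject CLIP text knowledge into a skeleton classifier; whether TDDNet fuses the encoders this way or differently is detailed in the paper itself.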
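The continuous stage is described only as sliding window-based, so the following is a plausible minimal decoder under that description: each fixed-length window is classified by the isolated-gesture model, low-confidence windows are treated as background, and runs of identical labels are merged into segments. The window length, stride, confidence threshold, and background handling are all illustrative assumptions.

```python
# Illustrative sliding-window decoder for a continuous skeleton stream.
import torch
import torch.nn.functional as F

def sliding_window_decode(stream, classify_fn, win=32, stride=8, min_conf=0.5):
    """stream: (T, J, C) skeleton sequence; classify_fn maps (1, win, J, C) -> (1, K) logits.

    Returns a list of (label, start_frame, end_frame) segments.
    """
    labels = []
    for s in range(0, stream.shape[0] - win + 1, stride):
        probs = F.softmax(classify_fn(stream[s:s + win].unsqueeze(0)), dim=-1)
        conf, cls = probs.max(dim=-1)
        labels.append(cls.item() if conf.item() >= min_conf else -1)  # -1 = background

    segments, cur_label, seg_start = [], None, 0
    for i, lab in enumerate(labels + [None]):          # sentinel flushes the last run
        if lab != cur_label:
            if cur_label is not None and cur_label != -1:
                segments.append((cur_label, seg_start, (i - 1) * stride + win))
            cur_label, seg_start = lab, i * stride
    return segments

# Example with a dummy classifier over the paper's 19 gesture classes
# (25 joints and 3D coordinates are assumptions for illustration).
dummy = lambda w: torch.randn(1, 19)
stream = torch.randn(300, 25, 3)
print(sliding_window_decode(stream, dummy))
```

In the two-stage pipeline from the abstract, the isolated TDDNet classifier would play the role of `classify_fn`, with the decoder turning its per-window predictions into gesture segments.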