S2Net: Skeleton-aware SlowFast Network for Efficient Sign Language Recognition
Abstract
Continuous Sign Language Recognition (CSLR) aims to interpret meaning from signers' postures and movements. Joint-wise correspondences between estimated skeleton data and sign videos provide complementary cues about appearance and motion. In this paper, we propose a Skeleton-aware SlowFast Network (S^2Net) to effectively capture appearance and motion information in sign videos. S^2Net processes skeleton data in the fast pathway and video data in the slow pathway, progressively integrating the two streams. We first project both skeleton and video data into a unified graph-structured space and employ a consistent GCN-based architecture for both pathways. We then propose a group-wise cross-attention module to fuse intermediate features across pathways. Finally, a frame-wise fusion pathway integrates semantic information at the sequence level. Experimental results on three public datasets demonstrate the effectiveness and efficiency of the proposed method.
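To make the cross-pathway fusion step concrete, below is a minimal PyTorch sketch of a group-wise cross-attention module in the spirit of the abstract. The class name, the token layout (frames x joints), the channel-wise grouping, the query direction (slow pathway attending to the fast pathway), and the residual connection are all illustrative assumptions, not the authors' actual implementation.

import torch
import torch.nn as nn

class GroupWiseCrossAttention(nn.Module):
    """Hypothetical sketch: channels are split into groups, and within each
    group the slow (video) pathway attends to the fast (skeleton) pathway."""

    def __init__(self, dim: int, num_groups: int = 4):
        super().__init__()
        assert dim % num_groups == 0, "dim must be divisible by num_groups"
        self.num_groups = num_groups
        self.group_dim = dim // num_groups
        # Per-group linear projections for query, key, and value.
        self.to_q = nn.Linear(self.group_dim, self.group_dim)
        self.to_k = nn.Linear(self.group_dim, self.group_dim)
        self.to_v = nn.Linear(self.group_dim, self.group_dim)
        self.scale = self.group_dim ** -0.5

    def forward(self, slow: torch.Tensor, fast: torch.Tensor) -> torch.Tensor:
        # slow, fast: (batch, tokens, dim), where tokens = frames x joints.
        b, n, d = slow.shape
        g, gd = self.num_groups, self.group_dim
        # Reshape channels into groups: (batch, groups, tokens, group_dim).
        slow_g = slow.view(b, n, g, gd).transpose(1, 2)
        fast_g = fast.view(b, n, g, gd).transpose(1, 2)
        # Slow-pathway tokens query the fast (skeleton) pathway.
        q = self.to_q(slow_g)
        k = self.to_k(fast_g)
        v = self.to_v(fast_g)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        fused = attn @ v
        # Merge groups back and add a residual connection to the slow stream.
        fused = fused.transpose(1, 2).reshape(b, n, d)
        return slow + fused

# Usage: fuse intermediate features from the two pathways.
slow_feats = torch.randn(2, 50, 64)   # video pathway features
fast_feats = torch.randn(2, 50, 64)   # skeleton pathway features
fusion = GroupWiseCrossAttention(dim=64, num_groups=4)
out = fusion(slow_feats, fast_feats)  # (2, 50, 64)

Grouping the channels keeps each attention map small while still letting every group of slow-pathway features be modulated by the skeleton stream; this is one plausible reading of "group-wise" given only the abstract.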
Related Material

@InProceedings{Yang_2024_ACCV,
  author    = {Yang, Yifan and Min, Yuecong and Chen, Xilin},
  title     = {S2Net: Skeleton-aware SlowFast Network for Efficient Sign Language Recognition},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2024},
  pages     = {319-336}
}