Isolated Sign Language Recognition With Multi-Scale Spatial-Temporal Graph Convolutional Networks
Isolated Sign Language Recognition (ISLR) fits naturally within the domain of problems that can be addressed by graph-structured spatial-temporal algorithms. A recent multi-scale spatial-temporal graph convolution operator, MS-G3D, exploits the semantic connectivity among non-neighbor nodes of the graph at flexible temporal scales, yielding improved performance on classical Human Action Recognition datasets. In this work, we present a solution for ISLR that uses a skeleton graph comprising body and finger joints and exploits this specific property of MS-G3D, which appears crucial for capturing the internal relationships among semantically connected but distant nodes in sign language dynamics. To complete the analysis, we compare the results with a 3D-CNN architecture, S3D, already used for SLR, and fuse it with MS-G3D. The performance achieved on the AUTSL dataset shows that MS-G3D alone stands out as a viable technique for ISLR; in fact, the improvement after fusing with a 3D-CNN approach, at least on this medium-scale dataset, appears marginal. The transfer learning capability of the trained models is also explored, using pre-training on the larger WLASL dataset and post-training on the smaller LSE UVIGO dataset. The classification performance of the MS-G3D model on AUTSL does not benefit from pre-training with WLASL, but performance on the more similarly acquired LSE UVIGO dataset improves significantly from fine-tuning the MS-G3D AUTSL model.
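The key property the abstract attributes to MS-G3D — direct aggregation between semantically related but topologically distant joints — comes from its disentangled multi-scale adjacency, where scale k connects exactly the joint pairs that are k hops apart, each scale with its own weights. The following is a minimal NumPy sketch of that spatial aggregation on a hypothetical 5-joint toy skeleton (the joint indexing, channel sizes, and weight initialization here are illustrative assumptions, not the paper's actual configuration, which also includes the temporal G3D windows):

```python
import numpy as np

# Hypothetical 5-joint mini-skeleton: 0-1-2 is a body chain,
# 1-3-4 a finger chain sharing joint 1 (illustrative, not the paper's layout).
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
V = 5
A = np.zeros((V, V))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

def shortest_path_dists(A):
    """Hop distance between every joint pair, via adjacency powers."""
    V = A.shape[0]
    dist = np.where(np.eye(V, dtype=bool), 0.0, np.inf)
    P = np.eye(V)
    for k in range(1, V):
        P = P @ A
        dist[(P > 0) & np.isinf(dist)] = k  # first power reaching (i, j)
    return dist

def disentangled_scales(A, num_scales):
    """[A_k]_ij = 1 iff joints i, j are exactly k hops apart
    (plus self-loops), i.e. MS-G3D-style disentangled aggregation."""
    dist = shortest_path_dists(A)
    scales = []
    for k in range(num_scales):
        Ak = (dist == k).astype(float) + np.eye(A.shape[0])
        Ak /= Ak.sum(axis=1, keepdims=True)  # simple row normalization
        scales.append(Ak)
    return scales

rng = np.random.default_rng(0)
C_in, C_out, K = 3, 8, 3               # e.g. 3-D joint coordinates in
X = rng.standard_normal((V, C_in))     # one frame, V joints
Ws = rng.standard_normal((K, C_in, C_out))

# Multi-scale spatial graph convolution: each hop distance gets its own
# weight matrix, so distant joints contribute without being entangled
# with immediate neighbors (unlike summed adjacency powers).
out = sum(Ak @ X @ Ws[k] for k, Ak in enumerate(disentangled_scales(A, K)))
print(out.shape)  # (5, 8)
```

At scale 2, joints 0 and 2 (non-neighbors, two hops apart through joint 1) exchange information directly, which is the mechanism the abstract points to for relating, say, a wrist to fingertips within one layer.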