Sign Language Translation With Hierarchical Spatio-Temporal Graph Neural Network

Jichao Kan, Kun Hu, Markus Hagenbuchner, Ah Chung Tsoi, Mohammed Bennamoun, Zhiyong Wang; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 3367-3376

Abstract


Sign language translation (SLT), which generates text in a spoken language from visual content in a sign language, is important for assisting communication in the hard-of-hearing community. Inspired by neural machine translation (NMT), most existing SLT studies adopt a general sequence-to-sequence learning strategy. However, SLT differs significantly from conventional NMT tasks, since sign languages convey messages through multiple aspects simultaneously, such as hand poses, relative positions, and body movements. Therefore, in this paper, the unique characteristics of signing poses are utilized to formulate hierarchical spatio-temporal graph representations, including both high-level and fine-level graphs, in which each vertex characterizes a specified body part and the edges represent the interactions between any two vertices. Specifically, high-level graphs represent the interactions between key regions such as the hands and face, and fine-level graphs represent the relationships between the joints of each hand and the landmarks of facial regions. To learn such graph representations, a novel deep learning architecture, namely the hierarchical spatio-temporal graph neural network (HST-GNN), is proposed. In addition, graph convolutions and graph self-attentions with neighborhood context are proposed to characterize both the local and the global graph properties. Experimental results on benchmark datasets demonstrate the performance of the proposed method.
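The two-level graph structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the HST-GNN implementation: the keypoint counts, feature dimensions, mean pooling from fine-level to high-level vertices, and the symmetric-normalized graph convolution are all assumptions chosen for illustration, and the sketch covers only the spatial part of a single frame (no temporal modeling and no graph self-attention). All function and variable names are hypothetical.

```python
import numpy as np

def fully_connected_adjacency(num_vertices):
    """Adjacency with an edge between every pair of distinct vertices."""
    return np.ones((num_vertices, num_vertices)) - np.eye(num_vertices)

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def graph_conv(features, adj, weight):
    """One generic graph convolution layer followed by ReLU."""
    return np.maximum(normalize_adjacency(adj) @ features @ weight, 0.0)

# Assumed sizes: number of keypoints per region and feature dimensions.
FINE_SIZES = {"left_hand": 21, "right_hand": 21, "face": 10}
IN_DIM, OUT_DIM = 2, 8

def encode_frame(fine_inputs, w_fine, w_high):
    """Encode one frame: fine-level graphs first, then the high-level graph."""
    region_feats = []
    for name in FINE_SIZES:
        x = fine_inputs[name]  # shape: (keypoints in region, IN_DIM)
        h = graph_conv(x, fully_connected_adjacency(x.shape[0]), w_fine)
        # Assumption: mean-pool each fine-level graph into one region vertex.
        region_feats.append(h.mean(axis=0))
    high = np.stack(region_feats)  # shape: (number of regions, OUT_DIM)
    return graph_conv(high, fully_connected_adjacency(high.shape[0]), w_high)

rng = np.random.default_rng(0)
# Random stand-ins for 2D keypoint coordinates and layer weights.
frame = {name: rng.normal(size=(n, IN_DIM)) for name, n in FINE_SIZES.items()}
w_fine = rng.normal(size=(IN_DIM, OUT_DIM))
w_high = rng.normal(size=(OUT_DIM, OUT_DIM))

out = encode_frame(frame, w_fine, w_high)
print(out.shape)  # (3, 8): one feature vector per high-level region
```

Each fine-level graph is processed independently and summarized into a single vertex of the high-level graph, which then models the interactions among the hands and face. A spatio-temporal variant would additionally connect or aggregate these per-frame representations across time.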

Related Material


[bibtex]
@InProceedings{Kan_2022_WACV,
    author    = {Kan, Jichao and Hu, Kun and Hagenbuchner, Markus and Tsoi, Ah Chung and Bennamoun, Mohammed and Wang, Zhiyong},
    title     = {Sign Language Translation With Hierarchical Spatio-Temporal Graph Neural Network},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2022},
    pages     = {3367-3376}
}