Gesture and Sign Language Recognition With Temporal Residual Networks

Lionel Pigou, Mieke Van Herreweghe, Joni Dambre; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 3086-3093

Abstract
Gesture and sign language recognition in a continuous video stream is a challenging task, especially with a large vocabulary. In this work, we approach this as a framewise classification problem. We tackle it using temporal convolutions and recent advances in deep learning, such as residual networks, batch normalization and exponential linear units (ELUs). The models are evaluated on three different datasets: the Dutch Sign Language Corpus (Corpus NGT), the Flemish Sign Language Corpus (Corpus VGT) and the ChaLearn LAP RGB-D Continuous Gesture Dataset (ConGD). We achieve a 73.5% top-10 accuracy for 100 signs with the Corpus NGT, 56.4% with the Corpus VGT, and a mean Jaccard index of 0.316 with the ChaLearn LAP ConGD without using depth maps.
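The paper itself provides no code here; as a rough illustrative sketch only, a framewise temporal residual block of the kind the abstract describes (temporal convolution, batch normalization, ELU, identity shortcut) could look as follows in NumPy. All function names, shapes, and the pre-activation ordering are assumptions, not the authors' implementation.

```python
import numpy as np

def elu(x, alpha=1.0):
    # Exponential linear unit: identity for x > 0, alpha*(exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def temporal_conv(x, w):
    # x: (time, channels), w: (kernel, in_channels, out_channels).
    # 'Same' padding along the time axis, so there is one output per input
    # frame, as required for framewise classification.
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    t = x.shape[0]
    out = np.zeros((t, w.shape[2]))
    for i in range(t):
        for j in range(k):
            out[i] += xp[i + j] @ w[j]
    return out

def batch_norm(x, eps=1e-5):
    # Simplified batch normalization over the time axis (no learned scale/shift).
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def temporal_residual_block(x, w1, w2):
    # Pre-activation residual block: BN -> ELU -> conv, twice, plus an
    # identity shortcut connection.
    h = temporal_conv(elu(batch_norm(x)), w1)
    h = temporal_conv(elu(batch_norm(h)), w2)
    return x + h

rng = np.random.default_rng(0)
frames = rng.standard_normal((16, 8))      # 16 video frames, 8 feature channels
w1 = rng.standard_normal((3, 8, 8)) * 0.1  # temporal kernel size 3
w2 = rng.standard_normal((3, 8, 8)) * 0.1
out = temporal_residual_block(frames, w1, w2)
print(out.shape)  # (16, 8): one feature vector per frame
```

Stacking such blocks grows the temporal receptive field while the shortcut keeps gradients well-conditioned, which is the usual motivation for residual connections in deep temporal models.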

Related Material
[pdf]
[bibtex]
@InProceedings{Pigou_2017_ICCV,
author = {Pigou, Lionel and Van Herreweghe, Mieke and Dambre, Joni},
title = {Gesture and Sign Language Recognition With Temporal Residual Networks},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}