Large-Scale Multimodal Gesture Segmentation and Recognition Based on Convolutional Neural Networks

Huogen Wang, Pichao Wang, Zhanjie Song, Wanqing Li; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 3138-3146

Abstract


This paper presents an effective method for continuous gesture recognition. The method consists of two modules: segmentation and recognition. In the segmentation module, a continuous gesture sequence is segmented into isolated gesture sequences by classifying frames into gesture frames and transitional frames using two-stream convolutional neural networks. In the recognition module, the method exploits the spatiotemporal information embedded in RGB and depth sequences. For the depth modality, it converts a sequence into Dynamic Images and Motion Dynamic Images through rank pooling and feeds them into separate Convolutional Neural Networks. For the RGB modality, it adopts Convolutional LSTM Networks to learn long-term spatiotemporal features from the short-term spatiotemporal features obtained by a 3D convolutional neural network. The method has been evaluated on the ChaLearn LAP Large-scale Continuous Gesture Dataset and achieves state-of-the-art performance.
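The rank-pooling step mentioned above collapses a depth sequence into a single Dynamic Image whose pixels encode the temporal evolution of the video. A minimal sketch of this idea is shown below, using the closed-form approximate rank pooling coefficients (alpha_t = 2t - T - 1, a common approximation due to Bilen et al.) rather than solving the full ranking-machine objective; the function name and array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def approximate_rank_pooling(frames):
    """Collapse a video into one "dynamic image" via approximate rank pooling.

    Illustrative sketch (not the paper's code): uses the closed-form
    coefficients alpha_t = 2t - T - 1, which weight later frames
    positively and earlier frames negatively, so the pooled image
    summarizes how appearance evolves over time.

    frames: float array of shape (T, H, W, C)
    returns: array of shape (H, W, C)
    """
    T = frames.shape[0]
    alphas = 2.0 * np.arange(1, T + 1) - T - 1
    # Weighted sum over the time axis.
    return np.tensordot(alphas, frames.astype(np.float64), axes=(0, 0))

# A Motion Dynamic Image would apply the same pooling to frame
# differences, e.g. approximate_rank_pooling(frames[1:] - frames[:-1]).
```

Because the coefficients are antisymmetric around the sequence midpoint, reversing the frame order negates the dynamic image, which is why it captures temporal direction rather than just average appearance.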

Related Material


[pdf]
[bibtex]
@InProceedings{Wang_2017_ICCV,
author = {Wang, Huogen and Wang, Pichao and Song, Zhanjie and Li, Wanqing},
title = {Large-Scale Multimodal Gesture Segmentation and Recognition Based on Convolutional Neural Networks},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}