Multimodal Gesture Recognition Based on the ResC3D Network

Qiguang Miao, Yunan Li, Wanli Ouyang, Zhenxin Ma, Xin Xu, Weikang Shi, Xiaochun Cao; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 3047-3055


Gesture recognition is an important problem in computer vision. Recognizing gestures from videos remains challenging because of interference from gesture-irrelevant factors. In this paper, we propose a multimodal gesture recognition method based on a ResC3D network. One key idea is to find a compact and effective representation of video sequences. To that end, video enhancement techniques such as Retinex and median filtering are applied to reduce illumination variation and noise in the input video, and a weighted frame unification strategy is used to sample key frames. On top of these representations, a ResC3D network, which combines the advantages of the residual and C3D models, is developed to extract features, together with a fusion scheme based on canonical correlation analysis for blending the features. The method is evaluated in the ChaLearn LAP isolated gesture recognition challenge, where it reaches 67.71% accuracy and ranks first.
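
The abstract does not say how the weighted frame unification assigns its weights, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each frame already has a non-negative importance score (for example, a motion magnitude) and picks a fixed number of frame indices so that high-weight regions are sampled more densely. The function name `unify_frames` and the inverse-CDF sampling rule are illustrative assumptions.

```python
from bisect import bisect_left
from itertools import accumulate


def unify_frames(weights, target_len):
    """Pick `target_len` frame indices, denser where weights are high.

    Hypothetical helper: `weights` holds one non-negative importance
    score per input frame (an assumption, not taken from the paper).
    """
    if target_len <= 0 or not weights:
        return []
    total = sum(weights)
    if total <= 0:
        # No usable importance signal: fall back to uniform weights.
        weights = [1.0] * len(weights)
        total = float(len(weights))
    # Cumulative share of the total weight up to and including each frame.
    cdf = list(accumulate(w / total for w in weights))
    indices = []
    for k in range(target_len):
        # Midpoint of the k-th equal-mass bin of the cumulative weight.
        q = (k + 0.5) / target_len
        idx = bisect_left(cdf, q)
        # Guard against floating-point round-off at the end of the CDF.
        indices.append(min(idx, len(weights) - 1))
    return indices
```

With uniform weights this reduces to evenly spaced sampling. When `target_len` exceeds the number of frames, indices repeat, so short videos are padded by frame duplication and long ones are subsampled to the same length.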

Related Material

author = {Miao, Qiguang and Li, Yunan and Ouyang, Wanli and Ma, Zhenxin and Xu, Xin and Shi, Weikang and Cao, Xiaochun},
title = {Multimodal Gesture Recognition Based on the ResC3D Network},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}