YOLSE: Egocentric Fingertip Detection From Single RGB Images

Wenbin Wu, Chenyang Li, Zhuo Cheng, Xin Zhang, Lianwen Jin; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 623-630


With the development of wearable devices and augmented reality (AR), human-device interaction in egocentric vision, especially hand-gesture-based interaction, has attracted considerable attention among computer vision researchers. In this paper, we build a new dataset named EgoGesture and propose a heatmap-based solution for fingertip detection. First, we describe how the dataset was collected and present a comprehensive analysis of it, which shows that the dataset covers a substantial number of samples across various environments and dynamic hand shapes. Furthermore, we propose a heatmap-based FCN (Fully Convolutional Network) named YOLSE (You Only Look what You Should See) for fingertip detection in egocentric vision from a single RGB image. The fingermap is a new probabilistic representation proposed for multiple-fingertip detection, which not only shows the location of each fingertip but also indicates whether the fingertip is visible. Compared with state-of-the-art fingertip detection algorithms, our framework performs best while depending only weakly on the hand detection result. In our experiments, the fingertip detection error is about 3.69 pixels on 640 × 480 pixel video frames, and the average forward-pass time of YOLSE is about 15.15 ms.
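
To make the idea of a per-finger probabilistic map concrete, the following is a minimal sketch of how such maps could be decoded into fingertip positions and visibility flags. It is not the authors' implementation: the function names `decode_fingermaps` and `make_gaussian_map`, the five-channel layout, the peak-picking rule, and the 0.5 visibility threshold are all assumptions made for illustration.

```python
import numpy as np


def decode_fingermaps(fingermaps, visibility_threshold=0.5):
    """Decode per-finger maps into (x, y) positions or None.

    fingermaps: array of shape (num_fingers, H, W), values in [0, 1].
    The threshold is a hypothetical choice, not a value from the paper.
    """
    results = []
    for fmap in fingermaps:
        # Take the strongest response in this finger's map as the candidate.
        y, x = np.unravel_index(int(np.argmax(fmap)), fmap.shape)
        peak = float(fmap[y, x])
        # A weak peak is treated as "fingertip not visible" (assumption).
        results.append((int(x), int(y)) if peak >= visibility_threshold else None)
    return results


def make_gaussian_map(height, width, center_xy, sigma=2.0):
    """Build a synthetic map with a Gaussian blob at center_xy = (x, y)."""
    xs = np.arange(width)[None, :]
    ys = np.arange(height)[:, None]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


# Synthetic example: five maps, only the second finger is visible.
maps = np.zeros((5, 48, 64))
maps[1] = make_gaussian_map(48, 64, (20, 10))
print(decode_fingermaps(maps))  # [None, (20, 10), None, None, None]
```

In this sketch a single map per finger carries both pieces of information the abstract mentions: the peak location gives the fingertip position, and the peak strength decides whether the fingertip is reported as visible.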

Related Material

author = {Wu, Wenbin and Li, Chenyang and Cheng, Zhuo and Zhang, Xin and Jin, Lianwen},
title = {YOLSE: Egocentric Fingertip Detection From Single RGB Images},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}