DeePoint: Visual Pointing Recognition and Direction Estimation

Shu Nakamura, Yasutomo Kawanishi, Shohei Nobuhara, Ko Nishino; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 20577-20587

Abstract

In this paper, we realize automatic visual recognition and direction estimation of pointing. We introduce the first neural method for pointing understanding, built on two key contributions. The first is a first-of-its-kind large-scale dataset for pointing recognition and direction estimation, which we refer to as the DP Dataset. The DP Dataset consists of more than 2 million frames of 33 people pointing in various styles, with per-frame annotations of pointing timings and 3D directions. The second is DeePoint, a novel deep network model for joint recognition and 3D direction estimation of pointing. DeePoint is a Transformer-based network that fully leverages the spatio-temporal coordination of the body parts, not just the hands. Through extensive experiments, we demonstrate the accuracy and efficiency of DeePoint. We believe the DP Dataset and DeePoint will serve as a sound foundation for visual human intention understanding.
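The abstract does not detail the architecture, but the design it describes (a Transformer attending over body-part tokens across a temporal window, jointly predicting a pointing/not-pointing label and a 3D direction) can be sketched concretely. Below is a minimal, hypothetical PyTorch sketch assuming per-frame 2D body keypoints with confidences as input; all module names, dimensions, and the input encoding are illustrative assumptions, not the authors' implementation.

# Minimal sketch in the spirit of the abstract: a Transformer encoder attends
# over body-part tokens across a short temporal window, then two heads predict
# (a) whether the person is pointing and (b) a unit 3D direction vector.
# All names, sizes, and the input encoding are assumptions, not DeePoint code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointingTransformer(nn.Module):
    def __init__(self, num_joints=17, num_frames=16, feat_dim=64, depth=4, heads=8):
        super().__init__()
        # Embed each joint feature (e.g., 2D keypoint + confidence) as a token.
        self.embed = nn.Linear(3, feat_dim)
        # Learned embedding per (frame, joint) slot, so the encoder can exploit
        # the spatio-temporal coordination of body parts, not just the hands.
        self.pos = nn.Parameter(torch.zeros(1, num_frames * num_joints, feat_dim))
        self.cls = nn.Parameter(torch.zeros(1, 1, feat_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=heads, dim_feedforward=4 * feat_dim,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.point_head = nn.Linear(feat_dim, 1)   # pointing vs. not pointing
        self.dir_head = nn.Linear(feat_dim, 3)     # 3D pointing direction

    def forward(self, joints):
        # joints: (B, T, J, 3) -- per-frame 2D keypoints with confidence.
        B, T, J, _ = joints.shape
        tok = self.embed(joints).reshape(B, T * J, -1) + self.pos
        tok = torch.cat([self.cls.expand(B, -1, -1), tok], dim=1)
        cls = self.encoder(tok)[:, 0]              # clip-level summary token
        prob = torch.sigmoid(self.point_head(cls)).squeeze(-1)
        direction = F.normalize(self.dir_head(cls), dim=-1)
        return prob, direction

if __name__ == "__main__":
    model = PointingTransformer()
    x = torch.randn(2, 16, 17, 3)                  # batch of 2 clips
    p, d = model(x)
    print(p.shape, d.shape)                        # torch.Size([2]) torch.Size([2, 3])

The direction head is L2-normalized so the network always outputs a unit vector; the actual DeePoint tokenization, losses, and camera handling would follow the paper rather than this sketch.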

Related Material

[pdf] [supp] [arXiv]
@InProceedings{Nakamura_2023_ICCV,
    author    = {Nakamura, Shu and Kawanishi, Yasutomo and Nobuhara, Shohei and Nishino, Ko},
    title     = {DeePoint: Visual Pointing Recognition and Direction Estimation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {20577-20587}
}