3DInAction: Understanding Human Actions in 3D Point Clouds

Ben-Shabat, Yizhak; Shrout, Oren; Gould, Stephen

Yizhak Ben-Shabat, Oren Shrout, Stephen Gould; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 19978-19987

Abstract

We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored, despite the clear value that 3D information may bring. This is mostly due to inherent limitations of the point cloud data modality (lack of structure, permutation invariance, and a varying number of points), which make it difficult to learn a spatio-temporal representation. To address these limitations, we propose the 3DInAction pipeline, which first estimates patches moving in time (t-patches) as a key building block and then uses a hierarchical architecture to learn an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM. Code is publicly available at https://github.com/sitzikbs/3dincaction
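The abstract names t-patches but does not spell out how they are built. As a rough illustration of the idea, here is a minimal Python/NumPy sketch, assuming a simple scheme in which a local k-nearest-neighbor patch is re-anchored in each frame at the point nearest to the previous frame's anchor. The functions `knn_indices` and `extract_t_patch` and the parameters `seed_idx` and `k` are hypothetical names chosen for this illustration; this is not the paper's exact procedure and not the API of the 3dincaction repository.

```python
import numpy as np


def knn_indices(points: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k points in `points` (shape (N, 3)) closest to `query` (shape (3,))."""
    dists = np.linalg.norm(points - query, axis=1)
    return np.argsort(dists)[:k]


def extract_t_patch(frames: list[np.ndarray], seed_idx: int, k: int) -> np.ndarray:
    """Hypothetical t-patch extraction by nearest-neighbor tracking.

    frames: list of T arrays; frame t has shape (N_t, 3). N_t may differ between
    frames, and points are unordered (no correspondence across frames).
    Returns an array of shape (T, k, 3): one k-point local patch per frame.
    """
    center = frames[0][seed_idx]
    patches = []
    for pts in frames:
        if k > len(pts):
            raise ValueError("k must not exceed the number of points in any frame")
        # Re-anchor on the point in this frame nearest to the previous anchor.
        center = pts[knn_indices(pts, center, 1)[0]]
        # The patch is the anchor's k nearest neighbors, ordered by distance.
        patches.append(pts[knn_indices(pts, center, k)])
    return np.stack(patches)


rng = np.random.default_rng(0)
# Three frames with different point counts, as in a raw point cloud sequence.
frames = [rng.random((n, 3)) for n in (100, 120, 90)]
t_patch = extract_t_patch(frames, seed_idx=0, k=16)
print(t_patch.shape)  # (3, 16, 3)
```

Because each patch lists its points in order of distance from its anchor, the result (assuming no distance ties) does not depend on how the input points were ordered, and it has a fixed shape even when frames contain different numbers of points. These are two of the data properties the abstract identifies as obstacles. The hierarchical architecture that consumes the patches is not sketched here.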


BibTeX

@InProceedings{Ben-Shabat_2024_CVPR,
    author    = {Ben-Shabat, Yizhak and Shrout, Oren and Gould, Stephen},
    title     = {3DInAction: Understanding Human Actions in 3D Point Clouds},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {19978-19987}
}