What Elements are Essential to Recognize Human Actions?

Yachun Li, Yong Liu, Chi Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 56-65


RGB images have been widely used for human action recognition. However, including all of the information they contain may be redundant for depicting human actions. We therefore ask the following question: what elements are essential for human action recognition? To answer it, we investigate several different human representations, each of which places a different emphasis on elements such as background context, actor appearance, and human shape. A systematic analysis allows us to identify the essential elements, as well as the unnecessary content, for describing human actions. More specifically, our experimental results demonstrate the following. First, neither context-related elements nor actor appearance is vital for action recognition in most cases; an accurate and consistent human representation, however, is important. Second, a human representation that captures the essential elements leads to better performance and cross-dataset transferability. Third, fine-tuning is effective only when networks acquire the essential elements from human representations. Fourth, representations related to 3D reconstruction are beneficial for human action recognition tasks. Our study suggests that researchers should focus more on the essential elements when depicting human actions, and it also offers guidance for practical human action recognition in real-world scenarios.

