Normalized Human Pose Features for Human Action Video Alignment

Jingyuan Liu, Mingyi Shi, Qifeng Chen, Hongbo Fu, Chiew-Lan Tai; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11521-11531

Abstract


We present a novel approach for extracting human pose features from human action videos. The goal is for the pose features to capture only the poses of the action while remaining invariant to other factors, including the video background, the subject's anthropometric characteristics, and the viewpoint. Such pose features facilitate the comparison of pose similarity and can be used for downstream tasks such as human action video alignment and pose retrieval. The key to our approach is to first normalize the poses in the video frames by retargeting them onto a pre-defined 3D skeleton, which not only disentangles the subject's physical features, such as bone lengths and ratios, but also unifies the global orientations of the poses. The normalized poses are then mapped to a pose embedding space of high-level features, learned via unsupervised metric learning. We evaluate the effectiveness of our normalized features both qualitatively, through visualizations, and quantitatively, through a video alignment task on the Human3.6M dataset and an action recognition task on the Penn Action dataset.
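The normalization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the five-joint skeleton, the canonical bone lengths, the choice of y as the vertical axis, and the function names `retarget`, `unify_orientation`, and `normalize_pose` are all assumptions made for illustration. The sketch keeps each bone's direction, replaces its length with a canonical one, and then rotates the pose about the vertical axis so the hips face a fixed direction.

```python
import math

# Hypothetical 5-joint skeleton (illustration only, not the paper's skeleton):
# 0 pelvis (root), 1 left hip, 2 right hip, 3 spine, 4 head.
PARENTS = [-1, 0, 0, 0, 3]
# Hypothetical canonical length of the bone from each joint's parent to the joint.
TARGET_LENGTHS = [0.0, 0.1, 0.1, 0.5, 0.2]
LEFT_HIP, RIGHT_HIP = 1, 2


def retarget(joints, parents=PARENTS, lengths=TARGET_LENGTHS):
    """Keep every bone's direction but replace its length with the canonical one."""
    out = [None] * len(joints)
    for i, p in enumerate(parents):  # assumes each parent index precedes its children
        if p < 0:
            out[i] = (0.0, 0.0, 0.0)  # move the root to the origin
            continue
        d = [a - b for a, b in zip(joints[i], joints[p])]
        norm = math.sqrt(sum(c * c for c in d))
        if norm == 0.0:
            raise ValueError(f"joint {i} coincides with its parent")
        out[i] = tuple(o + lengths[i] * c / norm for o, c in zip(out[p], d))
    return out


def unify_orientation(joints):
    """Rotate about the vertical (y) axis so the left-to-right hip vector points along +x."""
    hx = joints[RIGHT_HIP][0] - joints[LEFT_HIP][0]
    hz = joints[RIGHT_HIP][2] - joints[LEFT_HIP][2]
    theta = math.atan2(hz, hx)
    c, s = math.cos(theta), math.sin(theta)
    # This rotation maps the horizontal hip vector (hx, hz) onto the +x axis.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]


def normalize_pose(joints):
    """Bone-length normalization followed by global-orientation unification."""
    return unify_orientation(retarget(joints))
```

Under these assumptions, two recordings of the same pose that differ only in subject scale, position, and rotation about the vertical axis yield the same normalized joints, which is the invariance the pose features are meant to have. The learned embedding that follows normalization is not sketched here.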

Related Material


@InProceedings{Liu_2021_ICCV,
    author    = {Liu, Jingyuan and Shi, Mingyi and Chen, Qifeng and Fu, Hongbo and Tai, Chiew-Lan},
    title     = {Normalized Human Pose Features for Human Action Video Alignment},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11521-11531}
}