Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition

Jian Liu, Naveed Akhtar, Ajmal Mian; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 10-19

Abstract


Existing representations of human skeleton joints do not fully exploit the learning power of Convolutional Neural Networks (CNNs). We propose a representation for skeleton joint sequences that is both spatial and spatio-temporal with respect to the receptive fields of the CNN convolution kernels, facilitating learning from the spatial locations of the joints as well as their transitions over time. Our representation allows for better hierarchical learning by CNNs: we transform skeleton sequences into images of flexible dimensions that encode rich spatial and spatio-temporal information about the joints by maximizing a unique distance metric, defined collaboratively over the distinct joint arrangements. The representation additionally encodes the relative joint velocities. The proposed action recognition framework exploits the representation hierarchically, first capturing the micro-temporal relations between the skeleton joints with the CNN and then exploiting their macro-temporal relations by computing Fourier Temporal Pyramids. We extend the Inception-ResNet CNN architecture with the proposed method and improve the state-of-the-art accuracy by 4.4% on the large-scale NTU human activity dataset. On the NUCLA and UTD-MHAD datasets, our method outperforms existing results by 5.7% and 9.3%, respectively.
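The abstract describes mapping skeleton joint sequences into pseudo-images that a CNN can consume, with frames and alternative joint arrangements tiled into one image. The sketch below illustrates that idea only; the 25-joint skeleton, the 5x5 skepxel grid, and the random joint orderings are assumptions for demonstration, since the abstract does not give the exact construction (in particular, the paper selects joint arrangements by maximizing a distance metric, which is not reproduced here).

```python
# Illustrative sketch only: grid size, joint ordering, and tiling are
# assumptions made for demonstration, not the authors' exact construction.
import numpy as np

NUM_JOINTS = 25               # e.g. NTU RGB+D skeletons (assumption)
SKEPXEL_H, SKEPXEL_W = 5, 5   # assumed skepxel grid, 5*5 = 25 joints

def frame_to_skepxel(joints_xyz, joint_order):
    """Arrange one frame's 3D joints into a small h x w x 3 'skepxel'.

    joints_xyz : (NUM_JOINTS, 3) array of joint coordinates.
    joint_order: a permutation of joint indices; the paper selects such
                 arrangements by maximizing a distance metric over them,
                 here we simply accept a given ordering.
    """
    ordered = joints_xyz[joint_order]                 # (25, 3)
    return ordered.reshape(SKEPXEL_H, SKEPXEL_W, 3)   # x, y, z as channels

def sequence_to_image(sequence, joint_orders):
    """Tile skepxels over frames (width) and over different joint
    arrangements (height) to build a pseudo-image for a CNN."""
    rows = []
    for order in joint_orders:                        # each row: one arrangement
        row = [frame_to_skepxel(f, order) for f in sequence]
        rows.append(np.concatenate(row, axis=1))      # frames along width
    return np.concatenate(rows, axis=0)               # arrangements along height

# Usage: a random 20-frame sequence with 4 different joint arrangements.
rng = np.random.default_rng(0)
seq = rng.standard_normal((20, NUM_JOINTS, 3))
orders = [rng.permutation(NUM_JOINTS) for _ in range(4)]
img = sequence_to_image(seq, orders)
print(img.shape)   # (20, 100, 3): 4*5 rows, 20*5 columns, 3 coordinate channels
```

In the full method, such images (and analogous ones built from relative joint velocities) are fed to the CNN to capture micro-temporal relations, with Fourier Temporal Pyramids applied afterwards to model macro-temporal structure.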

Related Material


[pdf] [dataset]
[bibtex]
@InProceedings{Liu_2019_CVPR_Workshops,
author = {Liu, Jian and Akhtar, Naveed and Mian, Ajmal},
title = {Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}