Action Recognition From RGB-D Data: Comparison and Fusion of Spatio-Temporal Handcrafted Features and Deep Strategies

Maryam Asadi-Aghbolaghi, Hugo Bertiche, Vicent Roig, Shohreh Kasaei, Sergio Escalera; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 3179-3188

Abstract
In this work, multimodal fusion of RGB-D data is analyzed for action recognition by using scene flow as early fusion and integrating the results of all modalities in a late fusion fashion. Recently, there has been a migration from traditional handcrafted features to deep learning. However, handcrafted features are still widely used owing to their high performance and low computational complexity. In this research, multimodal dense trajectories (MMDT) are proposed to describe RGB-D videos, where dense trajectories are pruned based on scene flow data. In addition, the 2DCNN is extended to a multimodal network (MM2DCNN) by adding one more stream (scene flow) as input and then fusing the outputs of all models. We evaluate and compare the results from each modality and their fusion on two action datasets. The experimental results show that the new representation improves the accuracy. Furthermore, the fusion of handcrafted and learning-based features boosts the final performance, achieving state-of-the-art results.
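The late-fusion step described in the abstract can be sketched as follows: each modality (RGB, depth, scene flow) produces per-class confidence scores, and the final prediction is taken from their (optionally weighted) average. This is a minimal illustration of the general technique, not the paper's exact implementation; all names and weights here are hypothetical.

```python
import numpy as np

def late_fusion(score_lists, weights=None):
    """Fuse per-class scores from several modality classifiers.

    score_lists: list of 1-D arrays, one per modality (e.g. RGB, depth,
    scene flow), each holding per-class confidence scores.
    weights: optional per-modality weights; uniform averaging if None.
    Returns the index of the predicted class.
    """
    scores = np.stack(score_lists)  # shape: (n_modalities, n_classes)
    if weights is None:
        weights = np.ones(len(score_lists)) / len(score_lists)
    fused = np.average(scores, axis=0, weights=weights)
    return int(np.argmax(fused))

# Toy example: three modalities, four action classes.
rgb = np.array([0.1, 0.6, 0.2, 0.1])
depth = np.array([0.2, 0.5, 0.2, 0.1])
flow = np.array([0.1, 0.3, 0.5, 0.1])
pred = late_fusion([rgb, depth, flow])  # class 1 has the highest mean score
```

Weighted averaging lets stronger modalities dominate; with uniform weights this reduces to simple score averaging across streams.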

Related Material
[bibtex]
@InProceedings{Asadi-Aghbolaghi_2017_ICCV,
author = {Asadi-Aghbolaghi, Maryam and Bertiche, Hugo and Roig, Vicent and Kasaei, Shohreh and Escalera, Sergio},
title = {Action Recognition From RGB-D Data: Comparison and Fusion of Spatio-Temporal Handcrafted Features and Deep Strategies},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}