Spatio-Temporal Action Detection and Localization Using a Hierarchical LSTM

Akshaya Ramaswamy, Karthik Seemakurthy, Jayavardhana Gubbi, Balamuralidhar Purushothaman; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 764-765

Abstract


Video analysis has gained importance in recent years owing to its usefulness in a wide variety of applications. The efficiency of a video analytics engine depends primarily on its ability to extract spatio-temporal features that are sufficiently discriminative. Inspired by the way the human visual system operates, we propose a hierarchical architecture to capture spatio-temporal information from a given input video at different time scales. The fundamental unit of the proposed architecture is a 3D Inception module followed by two layers of modified Convolutional Long Short-Term Memory (ConvLSTM). At each level, we consolidate the LSTM cell and hidden states and pass them to the next level using a visual attention-based pooling approach. The proposed network is applied to video action detection and localization, a foundational element of video analysis. Experiments on the UCF101 and AVA datasets show that the recognition accuracy achieved by the proposed algorithm advances the state of the art in spatio-temporal action detection and localization.
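
The abstract does not spell out how states are consolidated between levels, so the following is only an illustrative sketch of the general idea of attention-weighted pooling across time, not the authors' method. It assumes flattened state vectors, plain dot-product attention with a fixed query vector, and non-overlapping windows; the names `attention_pool`, `consolidate`, `query`, and `window` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Vector], query: Vector) -> Vector:
    """Collapse several state vectors into one using dot-product attention.

    Assumption: scores are simple dot products with a fixed query vector;
    the paper's visual attention mechanism may differ.
    """
    scores = [sum(q * x for q, x in zip(query, s)) for s in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]


def consolidate(states: Sequence[Vector], query: Vector, window: int) -> List[Vector]:
    """Pool non-overlapping windows so the next level sees a coarser time scale."""
    return [
        attention_pool(states[i:i + window], query)
        for i in range(0, len(states), window)
    ]


# Toy example: four time steps of 2-D "hidden states", pooled in windows of 2.
level1 = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 2.0]]
# A zero query gives uniform attention weights, i.e. plain averaging.
level2 = consolidate(level1, query=[0.0, 0.0], window=2)
```

In this toy setting each level halves the number of time steps, which is one simple way to obtain the "different time scales" the abstract mentions. In a real ConvLSTM the states would be spatial feature maps, and the attention weights would typically be learned rather than fixed.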

Related Material


[bibtex]
@InProceedings{Ramaswamy_2020_CVPR_Workshops,
author = {Ramaswamy, Akshaya and Seemakurthy, Karthik and Gubbi, Jayavardhana and Purushothaman, Balamuralidhar},
title = {Spatio-Temporal Action Detection and Localization Using a Hierarchical LSTM},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020}
}