Video Action Recognition with Adaptive Zooming Using Motion Residuals

Mostafa Shahabinejad, Irina Kezele, Seyed Shahabeddin Nabavi, Wentao Liu, Seel Patel, Yuanhao Yu, Yang Wang, Jin Tang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 1214-1223

Abstract


Motivated by the mechanisms of selective visual attention in humans, we put forward an efficient method for learning spatial attention with adaptive zooming for video action recognition. The learnt module can be used as a plug-in with any 3D CNN action recognition model that processes video at the clip level. We propose to use relevant motion cues from video frames to adaptively learn the optimal transformation of each input clip, as these cues are hypothesized to be directly related to the action recognition task. We employ differentiable transformations and samplers, so that the system remains differentiable end to end. We keep the proposed module lightweight and computationally efficient by exploiting the motion information inherently present in compressed videos, which is readily available at both training and inference time. The highly informative motion-related content of the compressed-domain modalities helps further boost action recognition accuracy. Our experiments demonstrate clear benefits of the proposed method for adaptive spatial zooming and of utilizing the compressed domain for that purpose.

Related Material


[bibtex]
@InProceedings{Shahabinejad_2023_ICCV,
    author    = {Shahabinejad, Mostafa and Kezele, Irina and Nabavi, Seyed Shahabeddin and Liu, Wentao and Patel, Seel and Yu, Yuanhao and Wang, Yang and Tang, Jin},
    title     = {Video Action Recognition with Adaptive Zooming Using Motion Residuals},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {1214-1223}
}