Stop or Forward: Dynamic Layer Skipping for Efficient Action Recognition

Jonghyeon Seon, Jaedong Hwang, Jonghwan Mun, Bohyung Han; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3361-3370

Abstract


One of the challenges in analyzing video content (e.g., actions) is the high computational cost, especially for tasks that require processing densely sampled frames in a long video. We present a novel efficient action recognition algorithm, which allocates computational resources adaptively to individual frames depending on their relevance and significance. Specifically, our algorithm adopts LSTM-based policy modules and sequentially estimates the usefulness of each frame based on its intermediate representations. If a certain frame is unlikely to be helpful for recognizing actions, our model stops forwarding the features to the rest of the layers and starts to consider the next sampled frame. We further reduce the computational cost of our approach by introducing a simple yet effective early termination strategy during the inference procedure. We evaluate the proposed algorithm on three public benchmarks: ActivityNet-v1.3, Mini-Kinetics, and THUMOS'14. Our experiments show that the proposed approach achieves an outstanding trade-off between accuracy and efficiency in action recognition.
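The per-frame control flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `dummy_layer` and `policy_forward` are hypothetical stand-ins for the backbone stages and the LSTM-based policy modules of the paper, and the stopping rule here is an arbitrary placeholder.

```python
# Hedged sketch of the "stop or forward" control flow from the abstract.
# In the actual method, each layer is a CNN backbone stage and the policy
# is an LSTM module conditioned on intermediate frame representations;
# both are replaced here with trivial stand-ins for illustration.

def dummy_layer(x):
    # Placeholder for one backbone stage producing an intermediate feature.
    return x + 1

def policy_forward(feature):
    # Placeholder for the LSTM policy: True means "keep forwarding this
    # frame's features to deeper layers", False means "stop and move on
    # to the next sampled frame". The rule below is arbitrary.
    return feature % 2 == 0

def process_video(frames, num_layers=4):
    """Forward each frame through layers until the policy says stop.

    Returns the number of layers each frame consumed, a proxy for the
    per-frame computational cost the method adapts dynamically.
    """
    depths = []
    for frame in frames:
        x = frame
        depth = 0
        for _ in range(num_layers):
            x = dummy_layer(x)
            depth += 1
            if not policy_forward(x):
                break  # frame judged unlikely to help recognition
        depths.append(depth)
    return depths
```

Because the policy can halt a frame after any layer, total compute varies with content; the paper's early termination strategy further allows inference over the whole video to stop before all sampled frames are processed.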

Related Material


BibTeX:

@InProceedings{Seon_2023_WACV,
  author    = {Seon, Jonghyeon and Hwang, Jaedong and Mun, Jonghwan and Han, Bohyung},
  title     = {Stop or Forward: Dynamic Layer Skipping for Efficient Action Recognition},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2023},
  pages     = {3361-3370}
}