FrameExit: Conditional Early Exiting for Efficient Video Recognition

Ghodrati, Amir; Bejnordi, Babak Ehteshami; Habibian, Amirhossein

Amir Ghodrati, Babak Ehteshami Bejnordi, Amirhossein Habibian; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 15608-15618

Abstract

In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce computation cost, we propose to use a simple sampling strategy combined with conditional early exiting to enable efficient recognition. Our model automatically learns to process fewer frames for simpler videos and more frames for complex ones. To achieve this, we employ a cascade of gating modules to automatically determine the earliest point in processing where an inference is sufficiently reliable. We generate on-the-fly supervision signals for the gates to provide a dynamic trade-off between accuracy and computational cost. Our proposed model outperforms competing methods on three large-scale video benchmarks. In particular, on ActivityNet1.3 and Mini-Kinetics, we outperform the state-of-the-art efficient video recognition methods with 1.3× and 2.1× fewer GFLOPs, respectively. Additionally, our method sets a new state of the art for efficient video understanding on the HVU benchmark.
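The early exiting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `early_exit_predict` and `confidence_gate` are hypothetical, the gate is a fixed confidence threshold standing in for the learned gating modules, and averaging per-frame logits is an assumed aggregation chosen only to keep the example short.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_gate(threshold: float) -> Callable[[Sequence[float]], bool]:
    """Hypothetical stand-in for a learned gate (assumption, not from the paper):
    signal an exit once the top class probability reaches `threshold`."""
    def gate(probs: Sequence[float]) -> bool:
        return max(probs) >= threshold
    return gate


def early_exit_predict(
    per_frame_logits: Sequence[Sequence[float]],
    gate: Callable[[Sequence[float]], bool],
) -> Tuple[int, int]:
    """Process frames in order and stop at the first frame where the gate fires.

    Returns (predicted_class, frames_used). If the gate never fires, all
    frames are used and the final aggregated prediction is returned.
    """
    if not per_frame_logits:
        raise ValueError("need at least one frame")
    num_frames = len(per_frame_logits)
    running_sum = [0.0] * len(per_frame_logits[0])
    for t, logits in enumerate(per_frame_logits, start=1):
        # Assumed aggregation: running mean of the logits seen so far.
        running_sum = [s + x for s, x in zip(running_sum, logits)]
        probs = softmax([s / t for s in running_sum])
        if gate(probs) or t == num_frames:
            return probs.index(max(probs)), t
    raise AssertionError("unreachable")


# A "simple" video is confidently classified from its first frame,
# while a "complex" one needs every frame.
easy_video = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
hard_video = [[0.5, 0.4, 0.0], [0.2, 0.9, 0.1], [0.1, 2.0, 0.0], [0.0, 3.0, 0.0]]
gate = confidence_gate(0.9)
print(early_exit_predict(easy_video, gate))
print(early_exit_predict(hard_video, gate))
```

In this sketch the cost saving comes only from skipping the remaining frames once the gate fires. In the framework described above the gates are trained, so the exit point adapts to the video rather than depending on a hand-set threshold.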

Related Material


@InProceedings{Ghodrati_2021_CVPR,
    author    = {Ghodrati, Amir and Bejnordi, Babak Ehteshami and Habibian, Amirhossein},
    title     = {FrameExit: Conditional Early Exiting for Efficient Video Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {15608-15618}
}