ANT: Adapt Network Across Time for Efficient Video Processing

Feng Liang, Ting-Wu Chin, Yang Zhou, Diana Marculescu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 2603-2608

Abstract


Abundant redundancies exist in video streams, pointing to opportunities to save computation. Towards this end, we propose the Adaptive Network across Time (ANT) framework to harness these redundancies and reduce the computational cost of video processing. Unlike most dynamic networks, which adapt their structures to different static inputs, our method adapts networks along the temporal dimension. By inspecting the semantic differences between frames, the proposed ANT chooses a purpose-fit network at test time to reduce overall computation, i.e., switching to a smaller network when observing mild differences. ANT adapts among structured networks within a supernet, making it hardware-friendly and thus achieving actual acceleration in real-world scenarios. ANT is powered by (1) a fusion module that utilizes past features and (2) a dynamic gate that adjusts the network in a predictive fashion at negligible extra cost. To ensure the generality of each subnet and the fairness of the gate, we propose a two-stage training scheme: we first train a weight-sharing supernet and then jointly train the fusion modules and gates. Evaluation on video detection with the modern EfficientDet demonstrates the effectiveness of our approach.
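The core idea of the abstract — pick a smaller subnet when consecutive frames barely differ, and the full network when they change a lot — can be sketched as follows. This is only an illustrative sketch, not the paper's actual gate: the paper's gate is a learned, predictive module, whereas here `frame_difference`, the thresholds, and the three-way subnet choice are hypothetical placeholders used to show the selection logic.

```python
# Illustrative sketch (assumed, not the paper's implementation): choose a
# subnet per frame based on how much it differs from the previous frame.

def frame_difference(prev, curr):
    """Mean absolute element-wise difference between consecutive frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

def select_subnet(prev, curr, thresholds=(0.05, 0.2)):
    """Return an index into a (small, medium, large) subnet list.

    Mild change -> smaller subnet; large change -> full network.
    Thresholds here are hypothetical, not taken from the paper.
    """
    if prev is None:        # first frame: no history, run the full network
        return 2
    diff = frame_difference(prev, curr)
    if diff < thresholds[0]:
        return 0            # small subnet suffices
    elif diff < thresholds[1]:
        return 1            # medium subnet
    return 2                # large change: full network

# Toy usage: stream three frames and record which subnet runs each one.
frames = [[0.1, 0.1], [0.1, 0.11], [0.9, 0.0]]
choices, prev = [], None
for f in frames:
    choices.append(select_subnet(prev, f))
    prev = f
print(choices)  # first frame full net, tiny change -> small, big change -> full
```

In the paper, the gate is predictive (it decides the subnet before running it) and is trained jointly with the fusion modules after the weight-sharing supernet has been trained, which this threshold heuristic does not capture.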

Related Material


[bibtex]
@InProceedings{Liang_2022_CVPR, author = {Liang, Feng and Chin, Ting-Wu and Zhou, Yang and Marculescu, Diana}, title = {ANT: Adapt Network Across Time for Efficient Video Processing}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2022}, pages = {2603-2608} }