Fast Object Detection in High-Resolution Videos

Ryan Tran, Atul Kanaujia, Vasu Parameswaran; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 1469-1478

Abstract


Despite the rapid evolution of video resolutions and progress on object detection algorithms, processing high-resolution videos still faces three main challenges. Firstly, it is non-trivial to use existing tracking algorithms to extend an object detection framework for efficient processing of high-resolution videos. In theory, the fully convolutional CNN architectures in most existing deep learning models allow any input resolution to be processed. In practice, however, running inference on high-resolution images decoded from a video incurs significant computational cost, making it impractical for real-time applications. Secondly, most tracking approaches require the entire frame to be decoded. Relatively little work has gone into object detection directly on compressed data, which includes rich temporal cues that can be exploited to reduce the computational cost at inference time. Thirdly, most of these approaches require labeled data for training models, thereby limiting their adoption. We tackle all three challenges in our framework by incorporating forward and backward motion cues from the compressed video to dramatically increase the processing speed of a pretrained baseline object detector, without any loss of accuracy. Our training is based on knowledge transfer from the baseline detector as a teacher network, thereby forgoing the need for any labeled data. Finally, the models are agnostic to the teacher network architecture and can be used to improve the efficiency of any object detector. Our results show a speed gain of 3x to 20x compared to a frame-by-frame detector, depending on the input resolution.
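
The general idea of reusing compressed-domain motion to avoid running a detector on every frame can be illustrated with a minimal sketch. This is not the method of the paper, which trains a model on forward and backward motion cues with a teacher detector. It is a hand-written stand-in that runs a detector on periodic keyframes and shifts the resulting boxes by the mean block motion in between. The names `propagate_box` and `detect_with_propagation`, the 16-pixel block size, the keyframe interval, and the convention that each motion vector is a (dx, dy) displacement from the previous frame to the current one are all assumptions made for illustration.

```python
from statistics import mean


def propagate_box(box, motion_field, block_size=16):
    """Shift a box by the mean motion vector of the blocks it overlaps.

    box: (x1, y1, x2, y2) in pixels.
    motion_field: 2D list indexed [row][col] of (dx, dy) per block
        (assumed convention: displacement from previous to current frame).
    """
    x1, y1, x2, y2 = box
    rows, cols = len(motion_field), len(motion_field[0])
    # Range of block indices overlapped by the box, clamped to the grid.
    c0 = max(0, int(x1) // block_size)
    c1 = min(cols - 1, int(x2 - 1) // block_size)
    r0 = max(0, int(y1) // block_size)
    r1 = min(rows - 1, int(y2 - 1) // block_size)
    vectors = [
        motion_field[r][c]
        for r in range(r0, r1 + 1)
        for c in range(c0, c1 + 1)
    ]
    if not vectors:
        # Box lies outside the motion grid: leave it unchanged.
        return box
    dx = mean(v[0] for v in vectors)
    dy = mean(v[1] for v in vectors)
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def detect_with_propagation(frames, detector, keyframe_interval=4):
    """Run `detector` only on keyframes; propagate boxes on other frames.

    frames: iterable of (image, motion_field) pairs.
    detector: hypothetical callable mapping an image to a list of boxes.
    """
    results, boxes = [], []
    for i, (image, motion_field) in enumerate(frames):
        if i % keyframe_interval == 0:
            boxes = detector(image)  # full (expensive) detection
        else:
            boxes = [propagate_box(b, motion_field) for b in boxes]
        results.append(boxes)
    return results
```

In this sketch the detector runs once per `keyframe_interval` frames, so any saving comes from skipping detection on the remaining frames. A fixed interval and mean-vector shift will drift on fast or non-rigid motion, which is one reason a learned approach may be preferable.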

Related Material


[bibtex]
@InProceedings{Tran_2023_ICCV,
    author    = {Tran, Ryan and Kanaujia, Atul and Parameswaran, Vasu},
    title     = {Fast Object Detection in High-Resolution Videos},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {1469-1478}
}