Learning to Select Views for Efficient Multi-View Understanding

Yunzhong Hou, Stephen Gould, Liang Zheng; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 20135-20144

Abstract


Multiple camera view (multi-view) setups have proven useful in many computer vision applications. However the high computational cost associated with multiple views creates a significant challenge for end devices with limited computational resources. In modern CPU pipelining breaks a longer job into steps and enables parallelism over sequential steps from multiple jobs. Inspired by this we study selective view pipelining for efficient multi-view understanding which breaks computation of multiple views into steps and only computes the most helpful views/steps in a parallel manner for the best efficiency. To this end we use reinforcement learning to learn a very light view selection module that analyzes the target object or scenario from initial views and selects the next-best-view for recognition or detection for pipeline computation. Experimental results on multi-view classification and detection tasks show that our approach achieves promising performance while using only 2 or 3 out of N available views significantly reducing computational costs while maintaining parallelism over GPU through selective view pipelining.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Hou_2024_CVPR, author = {Hou, Yunzhong and Gould, Stephen and Zheng, Liang}, title = {Learning to Select Views for Efficient Multi-View Understanding}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {20135-20144} }