SimGlim: Simplifying Glimpse Based Active Visual Reconstruction

Abhishek Jha, Soroush Seifi, Tinne Tuytelaars; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 269-278

Abstract


An agent with a limited field of view needs to sample the most informative local observations of an environment in order to model the global context. Current works train this selection strategy by defining a complex architecture built upon features learned through convolutional encoders. In this paper, we first discuss why vision transformers are better suited than CNNs for such an agent. Next, we propose a simple transformer-based active visual sampling model, called "SimGlim", which utilises the transformer's inherent self-attention architecture to sequentially predict the best next location based on the currently observable environment. We show the efficacy of our proposed method on the task of image reconstruction in the partially observable setting and compare our model against existing state-of-the-art active visual reconstruction methods. Finally, we provide ablations of our design choices to understand their importance in the overall architecture.
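
For intuition, the following is a minimal sketch of a glimpse-based active sampling loop with a transformer backbone, written in PyTorch. It is not the authors' SimGlim implementation: all module names, dimensions, the masked-token handling, and the greedy location-scoring head are illustrative assumptions that only mirror the high-level idea described in the abstract (self-attention over observed glimpses, sequential prediction of the next glimpse location, and image reconstruction from partial observations).

# Hypothetical sketch, not the paper's code.
import torch
import torch.nn as nn

class GlimpseAgent(nn.Module):
    def __init__(self, patch_size=16, img_size=224, dim=256, depth=4, heads=8):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.patch_dim = 3 * patch_size * patch_size
        self.patch_embed = nn.Linear(self.patch_dim, dim)
        self.pos_embed = nn.Parameter(torch.zeros(self.num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Two heads: reconstruct every patch and score unseen glimpse locations.
        self.decoder = nn.Linear(dim, self.patch_dim)
        self.location_head = nn.Linear(dim, 1)

    def forward(self, patches, observed_mask):
        # patches: (B, N, 3*P*P) flattened patches of the full image
        # observed_mask: (B, N) boolean, True where a glimpse has been taken
        tokens = self.patch_embed(patches)
        tokens = torch.where(observed_mask.unsqueeze(-1),
                             tokens, self.mask_token.expand_as(tokens))
        tokens = tokens + self.pos_embed
        feats = self.encoder(tokens)           # self-attention over all locations
        recon = self.decoder(feats)            # per-patch reconstruction
        scores = self.location_head(feats).squeeze(-1)
        scores = scores.masked_fill(observed_mask, float("-inf"))  # no revisits
        next_loc = scores.argmax(dim=-1)       # index of the next glimpse
        return recon, next_loc

# Usage sketch: start from one observed glimpse and sense a new location per step.
agent = GlimpseAgent()
patches = torch.randn(2, agent.num_patches, agent.patch_dim)
mask = torch.zeros(2, agent.num_patches, dtype=torch.bool)
mask[:, 0] = True
for _ in range(8):
    recon, next_loc = agent(patches, mask)
    mask[torch.arange(2), next_loc] = True    # "take" the predicted glimpse
loss = nn.functional.mse_loss(recon, patches) # reconstruction objective (illustrative)
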

Related Material


BibTeX:

@InProceedings{Jha_2023_WACV,
  author    = {Jha, Abhishek and Seifi, Soroush and Tuytelaars, Tinne},
  title     = {SimGlim: Simplifying Glimpse Based Active Visual Reconstruction},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2023},
  pages     = {269-278}
}