@InProceedings{Athwale_2025_WACV,
  author    = {Athwale, Akshaya and Shili, Ichrak and Bergeron, \'Emile and Ahmad, Ola and Lalonde, Jean-Francois},
  title     = {DarSwin-Unet: Distortion Aware Architecture},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {8659-8669}
}
DarSwin-Unet: Distortion Aware Architecture
Abstract
Wide-angle fisheye images are becoming increasingly common for perception tasks in applications such as robotics, security, and mobility (e.g., drones, avionics). However, current models often either ignore the distortions in wide-angle images or are not suitable for pixel-level tasks. In this paper, we present an encoder-decoder model based on a radial transformer architecture that adapts to distortions in wide-angle lenses by leveraging the physical characteristics defined by the radial distortion profile. In contrast to the original model, which only performs classification tasks, we introduce a U-Net architecture, DarSwin-Unet, designed for pixel-level tasks. Furthermore, we propose a novel strategy that minimizes sparsity when sampling the image to create its input tokens. Our approach enhances the model's capability to handle pixel-level tasks in wide-angle fisheye images, making it more effective for real-world applications. Compared to other baselines, DarSwin-Unet achieves the best results across different datasets, with significant gains when trained on bounded levels of distortion (very low, low, medium, and high) and tested on all levels, including out-of-distribution distortions. We demonstrate its performance on depth estimation and show through extensive experiments that DarSwin-Unet can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. The code and models are publicly available at https://lvsn.github.io/darswin-unet/.
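The abstract's key idea is sampling the image according to the lens's radial distortion profile rather than on a uniform pixel grid. The sketch below is an illustrative (not the authors') construction of such a distortion-aware sampling grid: sample locations are spaced uniformly in incident angle theta and mapped to pixel radii through a lens projection function. The equidistant fisheye model r = f * theta, the function name, and all parameter values are assumptions for illustration only.

```python
import numpy as np

def radial_sample_grid(f_px, theta_max, n_radial=8, n_azimuth=16):
    """Illustrative distortion-aware sampling: points are uniform in
    incident angle theta, then mapped to image radii via a lens
    projection (equidistant model r = f * theta assumed here)."""
    # Uniform spacing in incident angle, not in image radius
    thetas = np.linspace(theta_max / n_radial, theta_max, n_radial)
    phis = np.linspace(0.0, 2.0 * np.pi, n_azimuth, endpoint=False)
    radii = f_px * thetas  # equidistant fisheye projection (assumption)
    # Cartesian coordinates of every (radius, azimuth) sample
    xs = radii[:, None] * np.cos(phis)[None, :]
    ys = radii[:, None] * np.sin(phis)[None, :]
    return np.stack([xs, ys], axis=-1)  # shape (n_radial, n_azimuth, 2)

grid = radial_sample_grid(f_px=300.0, theta_max=np.pi / 2)
print(grid.shape)  # (8, 16, 2)
```

Swapping in a calibrated projection curve for the equidistant model would make the grid follow a specific lens's distortion profile, which is the spirit of the paper's radial tokenization.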