LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Images

Chetan Madan, Mayuna Gupta, Soumen Basu, Pankaj Gupta, Chetan Arora; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 557-567

Abstract


We focus on the problem of Gallbladder Cancer (GBC) detection from Ultrasound (US) images. The problem presents unique challenges to modern Deep Neural Network (DNN) techniques due to low image quality arising from noise, textures, and viewpoint variations. Tackling such challenges necessitates precise localization by the DNN to identify the discerning features for the downstream malignancy prediction. While several techniques have been proposed in recent years for the problem, all of these methods employ complex custom architectures. Inspired by the success of foundational models for natural image tasks, along with the use of adapters to fine-tune such models for custom tasks, we investigate the merit of one such design, ViT-Adapter, for the GBC detection problem. We observe that ViT-Adapter relies predominantly on a primitive CNN-based spatial prior module to inject the localization information via cross-attention, which is inefficient for our problem due to the small pathology sizes and the variability in their appearance caused by the non-regular structure of the malignancy. In response, we propose LQ-Adapter, a modified Adapter design for ViT, which improves localization information by leveraging learnable content queries over the basic spatial prior module. Our method surpasses existing approaches, enhancing the mean IoU (mIoU) scores by 5.4%, 5.8%, and 2.7% over ViT-Adapters, DINO, and FocalNet-DINO, respectively, on the US image-based GBC detection dataset, and establishing a new state-of-the-art (SOTA). Additionally, we validate the applicability and effectiveness of LQ-Adapter on the Kvasir-Seg dataset for polyp detection from colonoscopy images. Superior performance of our design on this problem as well showcases its capability to handle diverse medical imaging tasks across different datasets. Source code and trained models are publicly released.
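
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how a mean IoU over bounding boxes can be computed. It is not taken from the released source code: the function names `box_iou` and `mean_iou`, the `(x_min, y_min, x_max, y_max)` box format, and the assumption of exactly one predicted box and one ground-truth box per image are all illustrative choices, and the paper's actual evaluation protocol may differ.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Width and height are clamped at zero so disjoint boxes give no overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pred_boxes, gt_boxes):
    """Average IoU over paired boxes (assumed: one prediction per ground truth)."""
    if len(pred_boxes) != len(gt_boxes):
        raise ValueError("expected one predicted box per ground-truth box")
    if not pred_boxes:
        return 0.0
    total = sum(box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes))
    return total / len(pred_boxes)


if __name__ == "__main__":
    # Hypothetical boxes, for illustration only (not data from the paper).
    preds = [(0, 0, 2, 2), (0, 0, 1, 1)]
    gts = [(0, 0, 2, 2), (2, 2, 3, 3)]
    print(mean_iou(preds, gts))
```

Under this definition, a perfectly localized box contributes 1 and a box that misses the pathology entirely contributes 0, so the metric directly rewards the precise localization the abstract emphasizes.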

Related Material


[pdf]
[bibtex]
@InProceedings{Madan_2025_WACV,
    author    = {Madan, Chetan and Gupta, Mayuna and Basu, Soumen and Gupta, Pankaj and Arora, Chetan},
    title     = {LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Images},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {557-567}
}