SurGen-Net: A Generative Approach for Surgical VQA with Structured Text Generation

Yongjun Jeon, Seonmin Park, Jongmin Shin, Kanggil Park, Bogeun Kim, Namkee Oh, Kyu-Hwan Jung; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 1303-1310

Abstract


Existing discriminative approaches in surgical Visual Question Answering (VQA) exhibit key limitations, including susceptibility to data distribution bias, overfitting, ineffective utilization of raw text supervision, and a lack of deep cross-modal understanding. Additionally, their reliance on fixed answer sets makes them impractical for real-world clinical applications. To address these challenges, we propose SurGen-Net, a generative model designed to enhance multimodal learning and contextual reasoning in surgical VQA. Unlike conventional models that treat question-answer pairs independently, our model is trained to generate a structured format, allowing it to integrate all question-answer interactions and develop a more comprehensive understanding of surgical scenes. SurGen-Net comprises a Surgical Vision Encoder and a Surgical Captioner, utilizing raw text supervision and an advanced multimodal fusion mechanism to construct rich textual representations of surgical environments. Evaluation on the PitVQA dataset demonstrates consistent performance gains over existing models, particularly in Instruments and Position categories, highlighting its ability to enhance surgical tool recognition and spatial reasoning. The implementation code and the newly structured dataset format are available at https://github.com/yongyong98/Surgen-Net.git.
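The idea of training on one structured target per frame, rather than on independent question-answer pairs, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record fields, the category order, the `build_structured_targets` helper, the sample values, and the `Name: value | Name: value` layout are hypothetical and are not taken from the authors' repository or from the PitVQA release.

```python
from collections import defaultdict

# Hypothetical per-frame QA annotations. Field names, frame IDs, and answer
# values are invented for illustration only.
qa_pairs = [
    {"frame": "video01/00012", "category": "Instruments", "answer": "suction"},
    {"frame": "video01/00012", "category": "Position", "answer": "top-left"},
    {"frame": "video01/00012", "category": "Phase", "answer": "phase_a"},
    {"frame": "video01/00013", "category": "Instruments", "answer": "none"},
]


def build_structured_targets(pairs, field_order=("Phase", "Instruments", "Position")):
    """Group QA pairs by frame and render one structured target string per frame."""
    by_frame = defaultdict(dict)
    for pair in pairs:
        by_frame[pair["frame"]][pair["category"]] = pair["answer"]

    targets = {}
    for frame, answers in by_frame.items():
        # Emit fields in a fixed order so every target shares the same layout;
        # categories that are missing for a frame are skipped.
        parts = [f"{name}: {answers[name]}" for name in field_order if name in answers]
        targets[frame] = " | ".join(parts)
    return targets


targets = build_structured_targets(qa_pairs)
for frame, text in targets.items():
    print(frame, "->", text)
```

Under this assumed layout, a generative captioner would be trained to emit the whole string for a frame, so answers across categories are produced jointly instead of being classified one question at a time.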

Related Material


[bibtex]
@InProceedings{Jeon_2025_ICCV,
    author    = {Jeon, Yongjun and Park, Seonmin and Shin, Jongmin and Park, Kanggil and Kim, Bogeun and Oh, Namkee and Jung, Kyu-Hwan},
    title     = {SurGen-Net: A Generative Approach for Surgical VQA with Structured Text Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {1303-1310}
}