CL-Cross VQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering

Yao Zhang, Haokun Chen, Ahmed Frikha, Denis Krompass, Gengyuan Zhang, Jindong Gu, Volker Tresp; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 6269-6278

Abstract


Visual Question Answering (VQA) systems have advanced significantly in recent years due to the development of large-scale Vision-Language Pre-trained Models (VLPMs). As application scenarios and user demands change over time, an advanced VQA system is expected to continuously expand its knowledge and capabilities, not only to handle new tasks (i.e., new question types or visual scenes) but also to answer questions in new specialized domains, without forgetting previously acquired knowledge and skills. Existing works studying continual learning (CL) on VQA tasks primarily consider answer- and question-type incremental learning or scene- and function-incremental learning, whereas how VQA systems perform when they encounter new domains and increasing user demands has not been studied. Motivated by this, we introduce CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual Question Answering, through which we conduct extensive experiments on 4 VLPMs, 5 CL approaches, and 5 VQA datasets from different domains. In addition, by probing the forgetting phenomenon of the intermediate layers, we provide insights into how model architecture affects CL performance, why CL approaches can help mitigate forgetting in VLPMs, and how to design CL approaches suitable for VLPMs in this challenging continual learning environment. To facilitate future work on developing an advanced All-in-One VQA system, we will release our datasets and code.

Related Material


[bibtex]
@InProceedings{Zhang_2025_WACV,
    author    = {Zhang, Yao and Chen, Haokun and Frikha, Ahmed and Krompass, Denis and Zhang, Gengyuan and Gu, Jindong and Tresp, Volker},
    title     = {CL-Cross VQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {6269-6278}
}