[bibtex]
@InProceedings{D_2025_CVPR,
  author    = {D, Manjunath and Madhu, Shrikar and Sikdar, Aniruddh and Sundaram, Suresh},
  title     = {VISTA-CLIP: Visual Incremental Self-Tuned Adaptation for Efficient Continual Panoptic Segmentation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {6566-6574}
}
VISTA-CLIP: Visual Incremental Self-Tuned Adaptation for Efficient Continual Panoptic Segmentation
Abstract
Continual learning in artificial neural networks aims to emulate human-like adaptability by incrementally acquiring new knowledge while retaining prior information. While deep neural networks (DNNs) excel in static tasks, they suffer from catastrophic forgetting, a critical loss of performance on earlier classes when learning new ones. Current research predominantly addresses classification, detection, and semantic segmentation, yet universal continual learning frameworks for panoptic segmentation remain underexplored. Traditional methods often rely on computationally intensive knowledge distillation, which scales poorly due to high parameter counts. To bridge this gap, we propose Visual Incremental Self-Tuned Adaptation for Efficient Continual Panoptic Segmentation (VISTA), a novel architecture designed to balance plasticity and stability. Our approach introduces three key innovations: (1) a learnable feature perturbation module that enhances feature-space generalizability through controlled noise injection, (2) visual prompt tuning applied directly to input images, dynamically adapting the model to new classes without altering core parameters, and (3) the addition of textual features to the prompt embeddings to enhance the network's plasticity. Crucially, VISTA freezes the backbone and decoders during incremental training, optimizing only the perturbation generators and prompts to minimize catastrophic forgetting. Extensive experiments demonstrate VISTA's superiority, achieving an average improvement of 1.2% across all tasks and a 2.19% gain in the challenging 50-50 setting (50 base classes followed by 50 incremental classes) compared to state-of-the-art methods. These results establish VISTA as a scalable and efficient paradigm for continual panoptic segmentation, advancing the practicality of lifelong learning systems in dynamic real-world environments.
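To make the two trainable components concrete, the following is a minimal numpy sketch of (1) a learnable feature perturbation via controlled noise injection and (2) a visual prompt added around the input image while the backbone stays frozen. All function and parameter names here (`apply_visual_prompt`, `perturb_features`, `log_scale`, the border-padding prompt layout) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_visual_prompt(image, prompt, pad=4):
    """Visual prompt tuning on the input: a learnable border of parameters
    surrounds the image, so the frozen backbone sees a prompted input.
    image: (H, W, C); prompt: (H + 2*pad, W + 2*pad, C) learnable array."""
    h, w, _ = image.shape
    prompted = prompt.copy()
    prompted[pad:pad + h, pad:pad + w, :] += image  # interior carries the image
    return prompted

def perturb_features(features, log_scale):
    """Learnable feature perturbation: inject Gaussian noise whose magnitude
    exp(log_scale) is itself a trainable parameter (controlled injection)."""
    noise = rng.standard_normal(features.shape)
    return features + np.exp(log_scale) * noise

# During an incremental step, only `prompt` and `log_scale` would be updated;
# the backbone and decoders remain frozen.
image = rng.random((8, 8, 3))
prompt = np.zeros((16, 16, 3))            # learnable visual prompt (border)
prompted = apply_visual_prompt(image, prompt)
feats = perturb_features(rng.random((8, 16)), log_scale=-2.0)
print(prompted.shape, feats.shape)        # (16, 16, 3) (8, 16)
```

Keeping only these small parameter sets trainable is what bounds forgetting: the frozen backbone preserves features for old classes while the prompt and perturbation scale supply the plasticity for new ones.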