Online Gaussian Test-Time Adaptation of Vision-Language Models.

Fuchs, Clément; Zanella, Maxime; De Vleeschouwer, Christophe

Clément Fuchs, Maxime Zanella, Christophe De Vleeschouwer; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 128-137

Abstract

Online test-time adaptation (OTTA) of vision-language models (VLMs) has recently garnered increased attention as a way to exploit data observed along a stream to improve future predictions. Unfortunately, existing methods rely on dataset-specific hyper-parameters and incomplete evaluation protocols, limiting their generalization to new tasks. Thus, we propose Online Gaussian Adaptation (OGA), a novel method that models the likelihoods of visual features using Gaussian distributions and incorporates zero-shot priors into a concise Maximum A Posteriori (MAP) estimation framework with fixed hyper-parameters across all datasets. To further extend the deployment capabilities of OTTA methods, we show that combining OTTA with popular few-shot techniques, a practical yet overlooked setting in prior research, is highly beneficial. Moreover, our experimental study reveals that common OTTA evaluation protocols, which average performance over at most three runs per dataset, are inadequate due to the substantial variability observed across runs. Hence, we advocate for more rigorous evaluation practices, including increasing the number of runs and considering additional quantitative metrics, such as our proposed Expected Tail Accuracy (ETA), calculated as the average accuracy in the worst 10 percent of runs. We hope these contributions will encourage more rigorous evaluation practices in the OTTA community, which we believe to be an essential step for real-world deployment in multimodal applications. Code is available at https://github.com/cfuchs2023/OGA.
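
The ETA definition in the abstract (average accuracy over the worst 10 percent of runs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `expected_tail_accuracy`, the `tail_fraction` parameter, and the choice to round the tail size up and keep at least one run are assumptions made here for illustration.

```python
import math


def expected_tail_accuracy(run_accuracies, tail_fraction=0.10):
    """Mean accuracy over the worst `tail_fraction` of runs.

    Hypothetical helper: the abstract only states that ETA is the average
    accuracy in the worst 10 percent of runs.
    """
    if not run_accuracies:
        raise ValueError("at least one run accuracy is required")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")

    ordered = sorted(run_accuracies)  # ascending: worst runs first
    # Assumption: round the tail size up, always keeping at least one run.
    # The small epsilon guards against floating-point overshoot in the product.
    tail_size = max(1, math.ceil(tail_fraction * len(ordered) - 1e-9))
    tail = ordered[:tail_size]
    return sum(tail) / tail_size


if __name__ == "__main__":
    # Made-up per-run accuracies, for illustration only.
    accuracies = [0.70, 0.60, 0.65, 0.50, 0.72, 0.68, 0.66, 0.71, 0.69, 0.64]
    print(expected_tail_accuracy(accuracies))
```

With ten runs, the tail contains only the single worst run, so ETA reduces to the minimum accuracy; with more runs it averages over a larger tail, which is why the metric becomes informative only when the number of runs is increased.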

Related Material

[bibtex]

@InProceedings{Fuchs_2025_CVPR,
    author    = {Fuchs, Cl\'ement and Zanella, Maxime and De Vleeschouwer, Christophe},
    title     = {Online Gaussian Test-Time Adaptation of Vision-Language Models.},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {128-137}
}