@InProceedings{Varghese_2025_ICCV,
    author    = {Varghese, Subin and Gao, Joshua and Hoskere, Vedhus},
    title     = {ViewDelta: Scaling Scene Change Detection through Text-Conditioning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2818-2828}
}
ViewDelta: Scaling Scene Change Detection through Text-Conditioning
Abstract
We introduce a generalized framework for Scene Change Detection (SCD) that tackles the ambiguity of distinguishing "relevant" from "nuisance" changes, letting a single model train jointly across diverse domains and applications. Existing methods fail to generalize because dataset labels disagree; for example, vegetation growth or lane-marking alterations may be labeled relevant in one dataset and irrelevant in another. To overcome this, we propose ViewDelta, a text-conditioned change-detection framework that uses natural-language prompts to specify relevant changes precisely, whether a single attribute, a selected set of classes, or all observable differences. To enable this paradigm, we release the Conditional Change Segmentation dataset (CSeg), the first large-scale synthetic dataset for text-conditioned SCD, containing more than 500,000 image pairs with over 300,000 unique textual prompts. Experiments show that a single ViewDelta model jointly trained on CSeg, SYSU-CD, PSCD, VL-CMU-CD, and their unaligned variants matches or surpasses models trained separately for each dataset, demonstrating that text conditioning enables generalizable SCD.
