ViT-Koop: Vision-Transformer-Koopman Operators for Efficient Time-Series Forecasting of Earth-Observation Data

Shinohara, Takayuki

Takayuki Shinohara; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 2835-2844

Abstract

Transformers can model the complex spatiotemporal dependencies present in satellite imagery, yet their quadratic computational cost limits real-time, large-scale applications such as climate monitoring and disaster response. We introduce ViTKoop, a lightweight framework that combines a Vision Transformer-based autoencoder with a linear Koopman operator. The autoencoder compresses each image sequence into a compact latent state, and the Koopman operator advances this state linearly in time, greatly reducing computational complexity without sacrificing fidelity. On three benchmarks (ENSO, SEVIR, and EarthNet2021), ViTKoop matches or surpasses state-of-the-art Transformer baselines while requiring only a small fraction of their floating-point operations. This efficiency enables real-time, high-resolution forecasting on modest hardware and supports timely weather prediction as well as rapid, energy-efficient Earth observation services that are vital for sustainable development.
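The encode, advance, decode pattern described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The ViT autoencoder is replaced by a fixed orthonormal linear map, the Koopman matrix is fitted by ordinary least squares (a DMD-style stand-in for whatever training procedure the paper uses), and the data are synthetic. All names and sizes (`encode`, `decode`, `fit_koopman`, `forecast`, `n_pixels`, `latent_dim`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 "pixels" per flattened frame, 2-D latent state.
n_pixels, latent_dim, n_train, n_future = 64, 2, 20, 5

# Stand-in for the ViT autoencoder (assumption): a fixed linear map with
# orthonormal columns, so that encode(decode(z)) == z.
E, _ = np.linalg.qr(rng.standard_normal((n_pixels, latent_dim)))

def encode(frames):
    """Map frames of shape (T, n_pixels) to latents of shape (T, latent_dim)."""
    return frames @ E

def decode(latents):
    """Map latents of shape (T, latent_dim) back to frames (T, n_pixels)."""
    return latents @ E.T

def fit_koopman(latents):
    """Least-squares fit of K such that latents[t + 1] ~= K @ latents[t]."""
    X, Y = latents[:-1].T, latents[1:].T  # each (latent_dim, T - 1)
    return Y @ np.linalg.pinv(X)

def forecast(K, z_last, steps):
    """Advance the latent state linearly: one small mat-vec per step."""
    preds, z = [], z_last
    for _ in range(steps):
        z = K @ z
        preds.append(z)
    return np.stack(preds)

# Synthetic sequence: the latent state rotates by a fixed angle each step.
theta = 0.3
K_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
z, traj = np.array([1.0, 0.0]), []
for _ in range(n_train + n_future):
    traj.append(z)
    z = K_true @ z
frames = decode(np.stack(traj))            # (n_train + n_future, n_pixels)
train, future = frames[:n_train], frames[n_train:]

# Encode the observed frames, fit K, roll forward, decode the forecast.
Z = encode(train)
K = fit_koopman(Z)
pred = decode(forecast(K, Z[-1], steps=n_future))
max_err = np.abs(pred - future).max()
print("max forecast error:", max_err)
```

In this sketch each forecast step costs one `latent_dim` x `latent_dim` matrix-vector product, independent of the number of pixels or tokens per frame, which is where the savings over attention-based temporal rollouts come from.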

Related Material


@InProceedings{Shinohara_2025_ICCV,
    author    = {Shinohara, Takayuki},
    title     = {ViT-Koop: Vision-Transformer-Koopman Operators for Efficient Time-Series Forecasting of Earth-Observation Data},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2835-2844}
}