Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion
Abstract
Applying the pretraining-finetuning paradigm to event cameras presents significant challenges due to the scarcity of large-scale event datasets and the inherently sparse nature of event data, which increases the risk of overfitting during extensive pretraining. In this paper, we explore transferring pretrained image knowledge to the event camera domain to address this challenge. The key to our approach lies in adapting event data representations to align with image-pretrained models while simultaneously integrating spatiotemporal information and mitigating data sparsity. To achieve this, we propose a lightweight SpatioTemporal information fusion Prompting (STP) method, which progressively fuses the spatiotemporal characteristics of event data through a dynamic perception module with multi-scale spatiotemporal receptive fields, enabling compatibility with image-pretrained models. STP enhances event data representation by capturing local information within a large receptive field and performing global information exchange along the temporal dimension. This strategy effectively reduces sparse regions in event data while refining fine-grained details, all while preserving its inherent spatiotemporal structure. Our method significantly outperforms previous state-of-the-art approaches across classification, semantic segmentation, and optical flow estimation tasks. For instance, it achieves a top-1 accuracy of 68.87% (+4.04%) on N-ImageNet with only 1/10 of the pretraining parameters and 1/3 of the training epochs.
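
The abstract describes STP only at a high level. As a rough, non-authoritative sketch of the general idea it names (multi-scale local spatial aggregation over a large receptive field, global information exchange along the temporal dimension, and fusion of the result back into the event representation as a prompt), the PyTorch snippet below is purely illustrative: the module name SpatioTemporalPrompt, the (B, T, H, W) voxel-grid input, the kernel sizes, and the additive residual fusion are assumptions for this sketch, not the paper's actual implementation.

import torch
import torch.nn as nn

class SpatioTemporalPrompt(nn.Module):
    """Toy prompt-fusion block: multi-scale spatial context + temporal mixing.

    Input: an event voxel grid of shape (B, T, H, W), where T is the number of
    temporal bins. Output has the same shape and is added back to the input as
    a lightweight "prompt" before an image-pretrained backbone. All names and
    shapes here are illustrative assumptions, not the paper's implementation.
    """

    def __init__(self, num_bins: int, kernel_sizes=(3, 7, 11)):
        super().__init__()
        # Depthwise convolutions at several kernel sizes approximate a
        # multi-scale spatial receptive field at low parameter cost.
        self.spatial_branches = nn.ModuleList(
            nn.Conv2d(num_bins, num_bins, k, padding=k // 2, groups=num_bins)
            for k in kernel_sizes
        )
        # A 1x1 convolution over the temporal-bin channels exchanges
        # information globally along the temporal dimension.
        self.temporal_mix = nn.Conv2d(num_bins, num_bins, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, H, W) event voxel grid.
        local = sum(branch(x) for branch in self.spatial_branches)
        prompt = self.temporal_mix(self.act(local))
        # Residual fusion: densify sparse regions while keeping the original
        # spatiotemporal structure of the event data.
        return x + prompt


if __name__ == "__main__":
    voxels = torch.zeros(2, 5, 224, 224)   # sparse event voxel grid
    voxels[:, :, ::17, ::13] = 1.0         # a few active pixels
    fused = SpatioTemporalPrompt(num_bins=5)(voxels)
    print(fused.shape)                     # torch.Size([2, 5, 224, 224])

Under these assumptions, the fused tensor can be fed to a frozen image-pretrained backbone in place of the raw voxel grid, which is one plausible way to read the "prompting" formulation; the paper should be consulted for the actual architecture.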
Related Material
[pdf] [supp] [bibtex]
@InProceedings{Liang_2025_ICCV,
    author    = {Liang, Quanmin and Li, Qiang and Liu, Shuai and Cao, Xinzi and Lu, Jinyi and Yang, Feidiao and Zhang, Wei and Huang, Kai and Tian, Yonghong},
    title     = {Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {8656-8667}
}