Multimodal Neuromorphic Event-Frame Fusion in Domain-Generalized Vision Transformer for Dynamic Object Tracking

Taha Razzaq, Asim Iqbal; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 4684-4691

Abstract


Object tracking is a fundamental task in computer vision with critical applications in autonomous driving, surveillance, and robotics. However, existing tracking solutions struggle in high-speed, real-time scenarios due to their reliance on conventional frame-based sensors optimized for low-frame-rate environments. Neuromorphic event-driven sensors offer a compute-efficient alternative: they capture continuous, asynchronous intensity changes and excel at fast-motion detection. Yet neuromorphic vision sensors have lower spatial resolution, limiting their ability to capture the fine textures crucial for object identification. Multimodal fusion techniques have recently been explored to leverage the complementary strengths of the frame and event modalities, incorporating optical flow estimation, motion compensation, and deformable convolutions. While these fusion models improve performance under rapid motion, they remain susceptible to domain shifts, leading to degradation when tested on out-of-distribution "unseen" target data. To address this challenge, we introduce an application of a neuro-inspired, domain-generalized Winner-Take-All (WTA) mathematical layer that integrates seamlessly into the Vision Transformer (ViT) architecture. Our approach enhances domain invariance in object detection and tracking systems, particularly in environments with diverse lighting conditions and visual variations. We demonstrate that our technique significantly improves ViT performance for image classification, even when trained on a limited dataset. Additionally, we propose a multimodal AI framework that enables real-time object detection through the fusion of frame-based and event-based data.

Related Material


@InProceedings{Razzaq_2025_ICCV,
    author    = {Razzaq, Taha and Iqbal, Asim},
    title     = {Multimodal Neuromorphic Event-Frame Fusion in Domain-Generalized Vision Transformer for Dynamic Object Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {4684-4691}
}