Scene Adaptive Sparse Transformer for Event-based Object Detection

Yansong Peng, Hebei Li, Yueyi Zhang, Xiaoyan Sun, Feng Wu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16794-16804

Abstract


While recent Transformer-based approaches have shown impressive performance on event-based object detection tasks, their high computational costs still diminish the low power consumption advantage of event cameras. Image-based works attempt to reduce these costs by introducing sparse Transformers. However, they display inadequate sparsity and adaptability when applied to event-based object detection, since these approaches cannot balance the fine granularity of token-level sparsification and the efficiency of window-based Transformers, leading to reduced performance and efficiency. Furthermore, they lack scene-specific sparsity optimization, resulting in information loss and a lower recall rate. To overcome these limitations, we propose the Scene Adaptive Sparse Transformer (SAST). SAST enables window-token co-sparsification, significantly enhancing fault tolerance and reducing computational overhead. Leveraging the innovative scoring and selection modules, along with the Masked Sparse Window Self-Attention, SAST showcases remarkable scene-aware adaptability: it focuses only on important objects and dynamically optimizes the sparsity level according to scene complexity, maintaining a remarkable balance between performance and computational cost. The evaluation results show that SAST outperforms all other dense and sparse networks in both performance and efficiency on two large-scale event-based object detection datasets (1Mpx and Gen1). Code: https://github.com/Peterande/SAST
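As a rough illustration of what window-token co-sparsification could look like, the sketch below scores every token, keeps only tokens above a threshold, and drops any window left with no tokens. It is a toy under stated assumptions and not the authors' implementation: the scoring rule (mean absolute activation), the fixed `threshold`, and the names `score_token`, `co_sparsify`, and `sparsity` are all hypothetical, and the Masked Sparse Window Self-Attention itself is not modeled.

```python
from typing import List, Tuple

# One window is a list of tokens; one token is a list of feature values.
Window = List[List[float]]


def score_token(token: List[float]) -> float:
    """Hypothetical importance score: mean absolute activation of the token."""
    return sum(abs(x) for x in token) / len(token)


def co_sparsify(windows: List[Window], threshold: float) -> List[Tuple[int, List[int]]]:
    """Keep tokens scoring above `threshold`; drop windows with no kept tokens.

    Returns (window_index, kept_token_indices) pairs for the surviving windows.
    """
    kept = []
    for w_idx, window in enumerate(windows):
        token_ids = [t for t, tok in enumerate(window) if score_token(tok) > threshold]
        if token_ids:  # window-level sparsification: skip empty windows entirely
            kept.append((w_idx, token_ids))
    return kept


def sparsity(windows: List[Window], kept: List[Tuple[int, List[int]]]) -> float:
    """Fraction of all tokens that were discarded."""
    total = sum(len(w) for w in windows)
    retained = sum(len(ids) for _, ids in kept)
    return 1.0 - retained / total


# Two toy scenes, each with two windows of two 2-dimensional tokens.
quiet_scene = [[[0.0, 0.0], [0.0, 0.1]], [[0.0, 0.0], [0.0, 0.0]]]
busy_scene = [[[1.0, 2.0], [0.0, 0.1]], [[0.5, 0.5], [3.0, 1.0]]]

quiet_kept = co_sparsify(quiet_scene, threshold=0.2)
busy_kept = co_sparsify(busy_scene, threshold=0.2)
```

With the same threshold, the quiet scene keeps no windows at all while the busy scene keeps three of its four tokens, so the resulting sparsity level follows scene content rather than a fixed token budget.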

Related Material


[bibtex]
@InProceedings{Peng_2024_CVPR,
    author    = {Peng, Yansong and Li, Hebei and Zhang, Yueyi and Sun, Xiaoyan and Wu, Feng},
    title     = {Scene Adaptive Sparse Transformer for Event-based Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {16794-16804}
}