Modeling Cross-Modal Interaction in a Multi-detector, Multi-modal Tracking Framework

Yiqi Zhong, Suya You, Ulrich Neumann; Proceedings of the Asian Conference on Computer Vision (ACCV), 2020

Abstract


Different modalities have their own advantages and disadvantages. In a tracking-by-detection framework, fusing data from multiple modalities should ideally improve tracking performance over using a single modality, yet doing so effectively remains a challenge. Building on previous research in this area, we propose a deep-learning-based tracking-by-detection pipeline that uses multiple detectors and multiple sensors. At the input stage, we associate object proposals from 2D and 3D detectors. Through a cross-modal attention module, we model the interaction between each proposal's 2D RGB features and 3D point cloud features, generating 2D features in which irrelevant information is suppressed and thereby boosting tracking performance. Through experiments on a published benchmark, we demonstrate the value of our design in bringing a multi-modal tracking solution to current research on Multi-Object Tracking (MOT).
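To make the cross-modal attention idea concrete, below is a minimal PyTorch sketch of a module that gates per-proposal 2D RGB features with weights predicted from the corresponding 3D point cloud features. This is an illustrative sketch under assumed conditions, not the paper's implementation: the class name CrossModalAttention, the MLP gate design, and all feature dimensions are assumptions introduced here.

    # Minimal sketch of a cross-modal attention gate (illustrative, not the
    # paper's actual architecture). 3D point-cloud features predict channel-wise
    # weights that suppress irrelevant information in the 2D RGB features.
    import torch
    import torch.nn as nn

    class CrossModalAttention(nn.Module):
        def __init__(self, dim_2d: int = 256, dim_3d: int = 128):  # dims assumed
            super().__init__()
            # Project the 3D feature into the 2D feature space, then squash
            # to (0, 1) so it acts as a per-channel gate on the 2D feature.
            self.gate = nn.Sequential(
                nn.Linear(dim_3d, dim_2d),
                nn.ReLU(inplace=True),
                nn.Linear(dim_2d, dim_2d),
                nn.Sigmoid(),
            )

        def forward(self, feat_2d: torch.Tensor, feat_3d: torch.Tensor) -> torch.Tensor:
            # feat_2d: (N, dim_2d) per-proposal RGB features
            # feat_3d: (N, dim_3d) per-proposal point-cloud features
            attn = self.gate(feat_3d)   # (N, dim_2d) attention weights in (0, 1)
            return feat_2d * attn       # irrelevant 2D channels are down-weighted

    if __name__ == "__main__":
        module = CrossModalAttention()
        rgb = torch.randn(8, 256)       # 8 associated proposals
        points = torch.randn(8, 128)
        fused = module(rgb, points)
        print(fused.shape)              # torch.Size([8, 256])

In this sketch the 3D branch only modulates the 2D branch rather than being concatenated with it, which is one simple way to realize "2D features with suppressed irrelevant information" as described in the abstract.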

Related Material


[pdf]
[bibtex]
@InProceedings{Zhong_2020_ACCV,
    author    = {Zhong, Yiqi and You, Suya and Neumann, Ulrich},
    title     = {Modeling Cross-Modal Interaction in a Multi-detector, Multi-modal Tracking Framework},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {November},
    year      = {2020}
}