SiamMOT: Siamese Multi-Object Tracking

Bing Shuai, Andrew Berneshawi, Xinyu Li, Davide Modolo, Joseph Tighe; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12372-12382


In this work, we focus on improving online multi-object tracking (MOT). In particular, we propose a novel region-based Siamese Multi-Object Tracking network, which we name SiamMOT. SiamMOT is based upon Faster-RCNN and adds a forward tracker that models the instance's motion across two frames such that detected instances can be associated in an online fashion. We present two variants of this tracker, an implicit motion model and a novel Siamese-type explicit motion model. We carry out extensive quantitative experiments on three important MOT datasets: MOT17, TAO-person and Caltech Roadside Pedestrians, showing the importance of motion modelling for MOT and the ability of SiamMOT to substantially outperform the state-of-the-art. Finally, SiamMOT also outperforms the winners of ACM MM'20 HiEve Grand Challenge on the Human in Events dataset. Moreover, SiamMOT is efficient, and it runs at 17 FPS for 720P videos on a single modern GPU. We will release SiamMOT source code upon acceptance of this paper.

Related Material

[pdf] [supp] [arXiv]
@InProceedings{Shuai_2021_CVPR, author = {Shuai, Bing and Berneshawi, Andrew and Li, Xinyu and Modolo, Davide and Tighe, Joseph}, title = {SiamMOT: Siamese Multi-Object Tracking}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2021}, pages = {12372-12382} }