Contrastive Learning for Multi-Object Tracking With Transformers

Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 6867-6877

Abstract


The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-level contrastive loss, a revised sampling strategy, and a lightweight assignment method. Our training scheme learns object appearances while preserving detection capabilities, with little overhead. Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset and is comparable to existing transformer-based methods on the MOT17 dataset.
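The core idea of the abstract is an instance-level contrastive loss: object embeddings belonging to the same identity in different frames are pulled together, while embeddings of other instances act as negatives. As a rough illustration only (not the authors' exact formulation; the temperature value and the symmetric InfoNCE form are assumptions), such a loss over per-object embeddings from two frames can be sketched as:

```python
import numpy as np

def instance_contrastive_loss(emb_a, emb_b, temperature=0.1):
    """InfoNCE-style instance-level contrastive loss (illustrative sketch).

    emb_a, emb_b: (N, D) arrays of per-object embeddings from two frames,
    where row i of emb_a and row i of emb_b belong to the same instance.
    All off-diagonal pairs serve as negatives.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    n = a.shape[0]
    idx = np.arange(n)

    def ce_diagonal(logits):
        # Softmax cross-entropy with the diagonal as the target class,
        # computed in a numerically stable way.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    logits = (a @ b.T) / temperature  # (N, N) similarity matrix
    # Symmetric loss: match frame-A objects to frame-B objects and vice versa.
    return 0.5 * (ce_diagonal(logits) + ce_diagonal(logits.T))
```

With well-separated embeddings that agree across frames, the loss approaches zero; mismatched or entangled embeddings drive it up, which is what lets a lightweight (e.g. nearest-neighbor) assignment step associate detections at inference time.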

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{De_Plaen_2024_WACV,
    author    = {De Plaen, Pierre-Fran\c{c}ois and Marinello, Nicola and Proesmans, Marc and Tuytelaars, Tinne and Van Gool, Luc},
    title     = {Contrastive Learning for Multi-Object Tracking With Transformers},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {6867-6877}
}