LA-MOTR: End-to-End Multi-Object Tracking by Learnable Association

Wang, Peng; Wang, Yongcai; Cao, Hualong; Chen, Wang; Li, Deying

Peng Wang, Yongcai Wang, Hualong Cao, Wang Chen, Deying Li; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 12438-12448

Abstract

This paper proposes LA-MOTR, a novel Tracking-by-Learnable-Association framework that resolves the competing optimization objectives between detection and association in end-to-end Tracking-by-Attention (TbA) Multi-Object Tracking. Current TbA methods employ shared decoders for simultaneous object detection and tracklet association, which often results in task interference and suboptimal accuracy. By contrast, our end-to-end framework decouples these tasks into two specialized modules: Separated Object-Tracklet Detection (SOTD) and Spatial-Guided Learnable Association (SGLA). This decoupled design offers flexibility and explainability. In particular, SOTD independently detects new objects and existing tracklets in each frame, while SGLA associates them via a Spatial-Weighted Learnable Attention module guided by relative spatial cues. Temporal coherence is further maintained through a Tracklet Updates Module. The learnable association mechanism resolves the suboptimal association issues inherent in decoupled frameworks, while the decoupling itself avoids the task interference commonly observed in joint approaches. Evaluations on the DanceTrack, MOT17, and SportsMOT datasets demonstrate state-of-the-art performance. Extensive ablation studies validate the effectiveness of the designed modules. Code is available at https://github.com/PenK1nG/LA-MOTR.
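The association step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain scaled dot-product similarity between tracklet and detection embeddings, a hypothetical spatial bias that penalizes box-center distance (scaled by a made-up `spatial_scale` parameter), and a greedy one-to-one matching over the resulting attention weights. All function names and parameters are illustrative; the actual SGLA module is learned end to end and is defined in the paper and repository.

```python
import math

def box_center(box):
    """Center (cx, cy) of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatially_weighted_attention(track_embs, track_boxes, det_embs, det_boxes,
                                 spatial_scale=0.01):
    """Attention weights of each tracklet over all detections.

    Assumption (hypothetical, for illustration only):
    logit = dot(q, k) / sqrt(d) - spatial_scale * center_distance.
    """
    d = len(track_embs[0])
    weights = []
    for q, t_box in zip(track_embs, track_boxes):
        tcx, tcy = box_center(t_box)
        logits = []
        for k, d_box in zip(det_embs, det_boxes):
            sim = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            dcx, dcy = box_center(d_box)
            dist = math.hypot(tcx - dcx, tcy - dcy)
            logits.append(sim - spatial_scale * dist)
        weights.append(softmax(logits))
    return weights

def greedy_match(weights, min_weight=0.5):
    """One-to-one matching: accept the highest-weight pairs first."""
    pairs = sorted(
        ((w, t, j) for t, row in enumerate(weights) for j, w in enumerate(row)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), {}
    for w, t, j in pairs:
        if w < min_weight:
            break  # remaining pairs are all weaker
        if t in used_tracks or j in used_dets:
            continue
        matches[t] = j
        used_tracks.add(t)
        used_dets.add(j)
    return matches

# Toy example: two tracklets and two detections listed in swapped order.
track_embs = [[1.0, 0.0], [0.0, 1.0]]
track_boxes = [(0, 0, 10, 10), (100, 100, 110, 110)]
det_embs = [[0.0, 1.0], [1.0, 0.0]]
det_boxes = [(101, 101, 111, 111), (1, 1, 11, 11)]

weights = spatially_weighted_attention(track_embs, track_boxes, det_embs, det_boxes)
matches = greedy_match(weights)  # maps tracklet index -> detection index
```

In this toy setup both appearance and position agree, so each tracklet is matched to the detection that is nearby and has the same embedding. In a learned variant, the embeddings and the spatial weighting would come from trained layers rather than fixed formulas.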

Related Material

@InProceedings{Wang_2025_ICCV,
    author    = {Wang, Peng and Wang, Yongcai and Cao, Hualong and Chen, Wang and Li, Deying},
    title     = {LA-MOTR: End-to-End Multi-Object Tracking by Learnable Association},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {12438-12448}
}