@InProceedings{Zhen_2026_CVPR,
    author    = {Zhen, Yihao and Xu, Mingyue and Wang, Qiang and Fan, Baojie and Dong, Jiahua and Zhao, Tinghui and Fan, Huijie},
    title     = {GMT: Effective Global Framework for Multi-Camera Multi-Target Tracking},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {28201-28210}
}
GMT: Effective Global Framework for Multi-Camera Multi-Target Tracking
Abstract
Existing Multi-Camera Multi-Target (MCMT) tracking models typically adopt a two-stage framework: single-camera tracking followed by inter-camera tracking. In this paradigm, however, multiple views are used only to recover matches missed in the first stage, so they contribute little to overall tracking. To address this issue, we propose a novel global MCMT tracking framework termed GMT, which effectively leverages multi-view information by performing global-level trajectory-target matching. Specifically, instead of assigning trajectories independently for each view, we propose a Cross-View Feature Consistency Enhancement (CFCE) module to reduce feature discrepancies across views and encode the same historical targets across different views as global trajectories. The Global Trajectory Associate (GTA) module is then introduced to associate new targets with global trajectories, allowing the model to jointly exploit both intra-view and inter-view cues during tracking. Compared with the two-stage framework, GMT achieves significant improvements on existing datasets, with gains of up to 13.1% in CVMA and 19.2% in CVIDF1. Moreover, we present VisionTrack, a high-quality, large-scale MCMT dataset encompassing diverse scenes with varying illumination and target distributions, providing significantly greater diversity than existing datasets. Our code and dataset will be released at https://github.com/FoxCanned/GMT.