GOVTrack: Towards Generative Open-Vocabulary Multi-Object Tracking

Qian, Zekun; Han, Ruize; Wang, Zhixiang; Wan, Liang; Feng, Wei

Zekun Qian, Ruize Han, Zhixiang Wang, Liang Wan, Wei Feng; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026, pp. 1872-1882

Abstract

We study a novel yet practical problem, generative open-vocabulary multi-object tracking (GOVMOT), which extends MOT to localize, associate, and recognize generic-category objects from both seen (base) and unseen (novel) classes, formulated as a generative problem. This overcomes the limitations of previous open-set MOT problems, which either fail to classify novel classes or require a predefined list of category texts as prompts. Studying this problem first requires a benchmark. In this work, we build GOVTrackB, a large-scale and comprehensive benchmark that provides a standard evaluation platform for the GOVMOT problem. Compared to previous datasets, GOVTrackB has more abundant and balanced base/novel classes, along with corresponding samples, allowing evaluation with less bias. We also propose a new multi-granularity recognition metric to better evaluate generative object recognition in GOVMOT. We further develop GOVTracker as a baseline method, featuring a consistency-aware focal loss that enhances object association by jointly modeling appearance and semantic consistency. Through extensive benchmark evaluations, we report and analyze the results of various state-of-the-art methods, which demonstrate the rationale of GOVMOT as well as the usefulness and advantages of GOVTrackB.

Related Material

@InProceedings{Qian_2026_CVPR,
    author    = {Qian, Zekun and Han, Ruize and Wang, Zhixiang and Wan, Liang and Feng, Wei},
    title     = {GOVTrack: Towards Generative Open-Vocabulary Multi-Object Tracking},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {1872-1882}
}