MV-TAP: Tracking Any Point in Multi-View Videos

Jahyeok Koo, Inès Hyeonsu Kim, Mungyeom Kim, Junghyun Park, Seohyeon Park, Jaeyeong Kim, Jung Yi, Seokju Cho, Seungryong Kim; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 20932-20941

Abstract


Multi-view camera systems enable rich observations of complex real-world scenes, and understanding dynamic objects in multi-view settings has become central to various applications. Point tracking serves as a key mechanism for capturing dynamic motion. However, conventional single-view approaches often fail due to the limited geometric information available in monocular video, which becomes a critical bottleneck for multi-view extension. In this work, we present MV-TAP, a robust point tracker that tracks points across multi-view videos of dynamic scenes by leveraging cross-view information. MV-TAP utilizes camera geometry and a cross-view attention mechanism to aggregate spatio-temporal information across views, enabling more complete and reliable trajectory estimation in multi-view videos. To support this task, we construct a large-scale synthetic training dataset and real-world evaluation sets tailored for multi-view tracking. Extensive experiments demonstrate that MV-TAP outperforms existing point-tracking methods on challenging benchmarks, establishing an effective baseline for advancing research in multi-view point tracking.

Related Material


@InProceedings{Koo_2026_CVPR,
    author    = {Koo, Jahyeok and Kim, In\`es Hyeonsu and Kim, Mungyeom and Park, Junghyun and Park, Seohyeon and Kim, Jaeyeong and Yi, Jung and Cho, Seokju and Kim, Seungryong},
    title     = {MV-TAP: Tracking Any Point in Multi-View Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {20932-20941}
}