Generative Point Tracking and Forecasting

Xuanchen Lu, Ang Cao, Chao Feng, Andrew Owens; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 28167-28178

Abstract


Motion forecasting predicts where points will move in the future, while motion tracking predicts where they are in the present. Despite this similarity, existing approaches to the two problems are quite different. In this paper, we propose a unified model that can address both tasks. We train a causal, video-conditioned flow matching model to predict point positions. The resulting model can easily toggle between point tracking and forecasting by changing its visual signal. Despite our model's simplicity, we find that it outperforms prior work in point forecasting and obtains performance that is competitive with the state of the art on the TAP-Vid benchmark.
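
For readers unfamiliar with flow matching, the following is a minimal sketch of the generic technique applied to 2D point positions. It is not the authors' implementation: the straight-line noise-to-data path, the function names `flow_matching_pair` and `euler_sample`, and the array shapes are all assumptions made for illustration, and the video conditioning and causal architecture described in the abstract are omitted entirely.

```python
import numpy as np

def flow_matching_pair(x1, rng):
    """Build one training example for flow matching on a straight-line path.

    x1: target point positions, shape (num_points, 2) (assumed layout).
    Returns the noisy sample x_t, the time t, and the regression target
    that a velocity network would be trained to predict at (x_t, t).
    """
    x0 = rng.standard_normal(x1.shape)   # noise sample at t = 0
    t = rng.uniform()                    # time drawn uniformly from [0, 1)
    xt = (1.0 - t) * x0 + t * x1         # point on the straight path from x0 to x1
    v_target = x1 - x0                   # velocity is constant along this path
    return xt, t, v_target

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x
```

In a trained model, `velocity_fn` would be a learned network that also receives the conditioning signal (here, presumably video features); the sketch leaves it as an arbitrary callable.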

Related Material


[bibtex]
@InProceedings{Lu_2026_CVPR,
    author    = {Lu, Xuanchen and Cao, Ang and Feng, Chao and Owens, Andrew},
    title     = {Generative Point Tracking and Forecasting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {28167-28178}
}