BoT-FaceSORT: Bag-of-Tricks for Robust Multi-Face Tracking in Unconstrained Videos

Jonghyeon Kim, Chan-Yang Ju, Gun-Woo Kim, Dong-Ho Lee; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 1437-1453

Abstract


Multi-face tracking (MFT) is a subtask of multi-object tracking (MOT) that focuses on detecting and tracking multiple faces across video frames. Modern MOT trackers adopt the Kalman filter (KF), a linear model that estimates current motions based on previous observations. However, these KF-based trackers struggle to predict motions in unconstrained videos with frequent shot changes, occlusions, and appearance variations. To address these limitations, we propose BoT-FaceSORT, a novel MFT framework that integrates shot change detection, shared feature memory, and an adaptive cascade matching strategy for robust tracking. It detects shot changes by comparing the color histograms of adjacent frames and resets KF states to handle discontinuities. Additionally, we introduce MovieShot, a new benchmark of challenging movie clips to evaluate MFT performance in unconstrained scenarios. We also demonstrate the superior performance of our method compared to existing methods on three benchmarks, while an ablation study validates the effectiveness of each component in handling unconstrained videos.
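The shot change step described in the abstract (compare the color histograms of adjacent frames, then reset KF states at a discontinuity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the histogram layout, the similarity measure, the threshold, or the reset procedure. The sketch assumes a joint RGB histogram, histogram intersection as the similarity, and a threshold of 0.5. The names `color_histogram`, `histogram_intersection`, `detect_shot_changes`, `TrackState`, and `reset_motion` are hypothetical, and `TrackState` is a simplified stand-in for a full Kalman filter state.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel in 0..255
Frame = Sequence[Pixel]        # a frame flattened to a sequence of pixels


def color_histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Return a normalized joint RGB histogram with bins**3 cells."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = float(len(frame)) or 1.0
    return [h / total for h in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def detect_shot_changes(frames: Sequence[Frame],
                        threshold: float = 0.5,  # assumed value, not from the paper
                        bins: int = 8) -> List[int]:
    """Return every index i where frame i differs sharply from frame i - 1."""
    cuts: List[int] = []
    prev = None
    for i, frame in enumerate(frames):
        hist = color_histogram(frame, bins)
        if prev is not None and histogram_intersection(prev, hist) < threshold:
            cuts.append(i)
        prev = hist
    return cuts


@dataclass
class TrackState:
    """Simplified stand-in for a Kalman filter state (hypothetical)."""
    position: Tuple[float, float]
    velocity: Tuple[float, float] = (0.0, 0.0)
    variance: float = 1.0


def reset_motion(track: TrackState, initial_variance: float = 10.0) -> None:
    """Discard motion learned before the cut: zero velocity, high uncertainty."""
    track.velocity = (0.0, 0.0)
    track.variance = initial_variance
```

In a tracking loop, one would call `reset_motion` on every active track whenever the current frame index appears in the output of `detect_shot_changes`, so that motion predictions from the previous shot do not carry across the cut.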

Related Material


@InProceedings{Kim_2024_ACCV,
    author    = {Kim, Jonghyeon and Ju, Chan-Yang and Kim, Gun-Woo and Lee, Dong-Ho},
    title     = {BoT-FaceSORT: Bag-of-Tricks for Robust Multi-Face Tracking in Unconstrained Videos},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {1437-1453}
}