Interactive Labeling for Human Pose Estimation in Surveillance Videos

Mickael Cormier, Fabian Röpke, Thomas Golda, Jürgen Beyerer; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 1649-1658


Automatically detecting and estimating the movement of persons in real-world uncooperative scenarios is very challenging in great part due to limited and unreliably annotated data. For instance annotating a single human body pose for activity recognition requires 40-60 seconds in complex sequences, leading to long-winded and costly annotation processes. Therefore increasing the sizes of annotated datasets through crowdsourcing or automated annotation is often used at a great financial costs, without reliable validation processes and inadequate annotation tools greatly impacting the annotation quality. In this work we combine multiple techniques into a single web-based general-purpose annotation application. Pre-trained machine learning models enable annotators to interactively detect pedestrians, re-identify them throughout the sequence, estimate their poses, and correct annotation suggestions in the same interface. Annotations are then inter- and extrapolated between frames. The application is evaluated through several user studies and the results are extensively analyzed. Experiments demonstrate a 55% reduction in annotation time for less complex scenarios while simultaneously decreasing perceived annotator workload.

Related Material

[pdf] [supp]
@InProceedings{Cormier_2021_ICCV, author = {Cormier, Mickael and R\"opke, Fabian and Golda, Thomas and Beyerer, J\"urgen}, title = {Interactive Labeling for Human Pose Estimation in Surveillance Videos}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, month = {October}, year = {2021}, pages = {1649-1658} }