Embedding Task Structure for Action Detection

Michael Peven, Gregory D. Hager; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 6604-6613

Abstract


We present a straightforward, flexible method to enhance the accuracy and quality of action detection by expressing temporal and structural relationships of actions in the loss function of a deep network. We describe ways to represent otherwise implicit structure in video data and demonstrate how these structures reflect natural biases that improve network training. Our experiments show that our approach improves both accuracy and edit-distance of action recognition and detection models over a baseline. Our framework leads to improvements over prior work and obtains state-of-the-art results on multiple benchmarks.

Related Material


@InProceedings{Peven_2024_WACV,
    author    = {Peven, Michael and Hager, Gregory D.},
    title     = {Embedding Task Structure for Action Detection},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {6604-6613}
}