Context in Human Action Through Motion Complementarity

Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 6531-6540


Motivated by Goldman's Theory of Human Action, a framework in which action decomposes into 1) base physical movements and 2) the context in which they occur, we propose a novel learning formulation for motion and context, where context is derived as the complement to motion. More specifically, we model physical movement through the adoption of Therbligs, a set of elemental physical motions centered around object manipulation. Context is modeled through a contrastive mutual information loss that formulates context information as the action information not contained within movement information. We empirically demonstrate the utility of this separation of representation, showing sizable improvements in action recognition and action anticipation accuracies for a variety of models. We present results on two object manipulation datasets: EPIC Kitchens 100 and 50 Salads.

Related Material

@InProceedings{Dessalene_2024_WACV,
    author    = {Dessalene, Eadom and Maynord, Michael and Ferm\"uller, Cornelia and Aloimonos, Yiannis},
    title     = {Context in Human Action Through Motion Complementarity},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {6531-6540}
}