Predicting the What and How - a Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling

Judith Butepage, Hedvig Kjellstrom, Danica Kragic; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

Abstract


Video-based prediction of human activity is usually performed on one of two levels: a model is trained either to anticipate high-level action labels or to predict future trajectories in skeletal joint space or image pixel space. This separation of classification and regression tasks means that models cannot make use of the mutual information between continuous and semantic observations. However, if a model knew that an observed human wants to drink from a nearby glass, the space of possible trajectories would be highly constrained to reaching movements. Likewise, if a model had predicted a reaching trajectory, the inference of future semantic labels would rank "lifting" as more likely than "walking". In this work, we propose a semi-supervised generative latent variable model that addresses both of these levels by modeling continuous observations as well as semantic labels. This fusion of signals allows the model to solve several tasks simultaneously, such as action detection and anticipation as well as motion prediction and synthesis. We demonstrate this ability on the UTKinect-Action3D dataset, which consists of noisy, partially labeled multi-action sequences. The aim of this work is to encourage research within the field of human activity modeling based on mixed categorical and continuous data.

Related Material


[bibtex]
@InProceedings{Butepage_2019_CVPR_Workshops,
author = {Butepage, Judith and Kjellstrom, Hedvig and Kragic, Danica},
title = {Predicting the What and How - a Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}