Joint Prediction of Activity Labels and Starting Times in Untrimmed Videos

Tahmida Mahmud, Mahmudul Hasan, Amit K. Roy-Chowdhury; The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5773-5782

Abstract


Most existing work on human activity analysis focuses on recognition or early recognition of activity labels from complete or partial observations. Predicting the labels of future, unobserved activities, where no frames of the predicted activities have been observed, is a challenging problem with important applications, and it has not been explored much. Associated with the future label prediction problem is the problem of predicting the starting time of the next activity. In this work, we propose a system that is able to infer the labels and the starting times of future activities. Activities are characterized by the previous activity sequence (which is observed), as well as the objects present in the scene during their occurrence. We propose a network similar to a hybrid Siamese network with three branches to jointly learn both the future label and the starting time. The first branch takes visual features from the objects present in the scene using a fully connected network, the second branch takes previous activity features using an LSTM network to model long-term sequential relationships, and the third branch captures the last observed activity features to model the context of inter-activity time using another fully connected network. The features from the three branches are concatenated and used for both label and time prediction. Experiments on two challenging datasets demonstrate that our framework for joint prediction of activity label and starting time improves the performance of both, and outperforms the state of the art.
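The three-branch data flow described in the abstract can be illustrated with a minimal, untrained forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the feature sizes, the ReLU activations, the softmax label head, the linear time head, the random weights, and all names (`predict`, `params`, `lstm_last_hidden`, and so on) are hypothetical and do not come from the paper. It uses only the Python standard library so that the wiring of the branches is easy to follow.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def rand_matrix(rows, cols):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def linear(x, W):
    """Matrix-vector product W @ x (bias omitted for brevity)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def lstm_last_hidden(sequence, W, hidden):
    """Run a single-layer LSTM over the sequence; return the final hidden state.

    W maps the concatenation [x_t, h_{t-1}] to four stacked gate
    pre-activations: input, forget, candidate cell, output.
    """
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in sequence:
        gates = linear(x + h, W)
        i = [sigmoid(g) for g in gates[0:hidden]]
        f = [sigmoid(g) for g in gates[hidden:2 * hidden]]
        cand = [math.tanh(g) for g in gates[2 * hidden:3 * hidden]]
        o = [sigmoid(g) for g in gates[3 * hidden:4 * hidden]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

# Hypothetical sizes, chosen only to keep the example small.
OBJ_DIM, ACT_DIM, HIDDEN, NUM_CLASSES = 8, 6, 5, 4

params = {
    "object_fc": rand_matrix(HIDDEN, OBJ_DIM),           # branch 1
    "lstm": rand_matrix(4 * HIDDEN, ACT_DIM + HIDDEN),   # branch 2
    "last_fc": rand_matrix(HIDDEN, ACT_DIM),             # branch 3
    "label_head": rand_matrix(NUM_CLASSES, 3 * HIDDEN),
    "time_head": rand_matrix(1, 3 * HIDDEN),
}

def predict(object_feats, prev_activity_feats, p=params):
    """Return (label probabilities, predicted start time) for the next activity."""
    # Branch 1: fully connected layer over object features from the scene.
    obj_branch = relu(linear(object_feats, p["object_fc"]))
    # Branch 2: LSTM over the whole sequence of previous activity features.
    seq_branch = lstm_last_hidden(prev_activity_feats, p["lstm"], HIDDEN)
    # Branch 3: fully connected layer over the last observed activity only.
    last_branch = relu(linear(prev_activity_feats[-1], p["last_fc"]))
    # Concatenate the three branches; both outputs share this fused vector.
    fused = obj_branch + seq_branch + last_branch
    label_probs = softmax(linear(fused, p["label_head"]))
    start_time = linear(fused, p["time_head"])[0]
    return label_probs, start_time

# Example call with random placeholder features (three previous activities).
object_feats = [rng.random() for _ in range(OBJ_DIM)]
prev_activity_feats = [[rng.random() for _ in range(ACT_DIM)] for _ in range(3)]
label_probs, start_time = predict(object_feats, prev_activity_feats)
```

Because the label head and the time head both read the same concatenated vector, a joint training objective would update all three branches from both tasks, which is the sense in which the two predictions are learned jointly. The loss and training procedure are not specified in the abstract and are left out here.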

Related Material


[bibtex]
@InProceedings{Mahmud_2017_ICCV,
author = {Mahmud, Tahmida and Hasan, Mahmudul and Roy-Chowdhury, Amit K.},
title = {Joint Prediction of Activity Labels and Starting Times in Untrimmed Videos},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}