Temporal Hallucinating for Action Recognition With Few Still Images

Yali Wang, Lei Zhou, Yu Qiao; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 5314-5322


Action recognition in still images has been recently promoted by deep learning. However, the success of these deep models heavily depends on huge amount of training images for various action categories, which may not be available in practice. Alternatively, humans can classify new action categories after seeing few images, since we may not only compare appearance similarities between images on hand, but also attempt to recall importance motion cues from relevant action videos in our memory. To mimic this capacity, we propose a novel Hybrid Video Memory (HVM) machine, which can hallucinate temporal features of still images from video memory, in order to boost action recognition with few still images. First, we design a temporal memory module consisting of temporal hallucinating and predicting. Temporal hallucinating can generate temporal features of still images in an unsupervised manner. Hence, it can be flexibly used in realistic scenarios, where image and video categories may not be consistent. Temporal predicting can effectively infer action categories for query image, by integrating temporal features of training images and videos within a domain-adaptation manner. Second, we design a spatial memory module for spatial predicting. As spatial and temporal features are complementary to represent different actions, we apply spatial-temporal prediction fusion to further boost performance. Finally, we design a video selection module to select strongly-relevant videos as memory. In this case, we can balance the number of images and videos to reduce prediction bias as well as preserve computation efficiency. To show the effectiveness, we conduct extensive experiments on three challenging data sets, where our HVM outperforms a number of recent approaches by temporal hallucinating from video memory.

Related Material

[pdf] [Supp]
author = {Wang, Yali and Zhou, Lei and Qiao, Yu},
title = {Temporal Hallucinating for Action Recognition With Few Still Images},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}