Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video

Dinesh Jayaraman, Kristen Grauman; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3852-3861

Abstract


How can unlabeled video augment visual learning? Existing methods perform "slow" feature analysis, encouraging temporal coherence: the image representations of temporally close frames should exhibit only small differences. While this standard approach captures the fact that high-level visual signals change slowly over time, it fails to capture *how* the visual content changes. We propose to generalize slow feature analysis to "steady" feature analysis. The key idea is to impose a prior that higher order derivatives in the learned feature space must be small. To this end, we train a convolutional neural network with a regularizer that minimizes a contrastive loss on tuples of sequential frames from unlabeled video. Focusing on the case of triplets of frames, the proposed method encourages feature changes over time to be smooth, i.e., similar to the most recent changes. Using five diverse image and video datasets, including unlabeled YouTube and KITTI videos, we demonstrate our method's impact on object recognition, scene classification, and action recognition tasks. We further show that our features learned from unlabeled video can even surpass a standard heavily supervised pretraining approach.
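To make the triplet-based "steadiness" regularizer concrete, below is a minimal sketch of one plausible contrastive formulation in PyTorch. The function name, margin form, and tensor shapes are illustrative assumptions, not the paper's exact loss: for a genuine sequential triplet, the two successive feature changes are pulled together (small second-order temporal difference), while mismatched triplets are pushed apart up to a margin.

```python
import torch
import torch.nn.functional as F

def steady_contrastive_loss(z_a, z_b, z_c, is_sequential, margin=1.0):
    """Sketch of a contrastive 'steadiness' loss on frame triplets.

    z_a, z_b, z_c:  (batch, dim) embeddings of three frames in temporal order,
                    e.g., CNN features of frames sampled at equal time gaps.
    is_sequential:  (batch,) float tensor; 1 for genuine sequential triplets,
                    0 for negatives (e.g., shuffled or non-contiguous frames).
    margin:         illustrative hinge margin for the negative term.
    """
    d1 = z_b - z_a                    # first feature change
    d2 = z_c - z_b                    # second feature change
    dist = (d2 - d1).norm(dim=1)      # second-order temporal difference

    # Positives: successive changes should match (small second derivative).
    pos = is_sequential * dist.pow(2)
    # Negatives: successive changes should differ by at least `margin`.
    neg = (1.0 - is_sequential) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()
```

In use, this term would be added as a regularizer to a supervised classification loss, with the triplets drawn from unlabeled video; the first-order (slow) analogue applies the same pull/push structure directly to pairs of frame embeddings.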

Related Material


[pdf] [video]
[bibtex]
@InProceedings{Jayaraman_2016_CVPR,
author = {Jayaraman, Dinesh and Grauman, Kristen},
title = {Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}