Video Scene Parsing With Predictive Feature Learning

Xiaojie Jin, Xin Li, Huaxin Xiao, Xiaohui Shen, Zhe Lin, Jimei Yang, Yunpeng Chen, Jian Dong, Luoqi Liu, Zequn Jie, Jiashi Feng, Shuicheng Yan; The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5580-5588


Video scene parsing is challenging for two reasons. First, it is non-trivial to learn meaningful video representations that produce temporally consistent labeling maps. Second, this learning becomes harder when labeled video training data is scarce. In this work, we propose a unified framework that addresses both problems; to our knowledge, it is the first model to employ predictive feature learning in video scene parsing. The predictive feature learning is carried out through two predictive tasks: frame prediction and predictive parsing. Experiments show that the predictive features learned by our model significantly enhance video parsing performance when combined with a standard image parsing network. Notably, the performance gain from predictive learning comes at almost no additional labeling cost, because the features are learned from a large amount of unlabeled video data in an unsupervised way. Extensive experiments on two challenging datasets, Cityscapes and CamVid, demonstrate the effectiveness of our model, showing remarkable improvement over well-established baselines.
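To illustrate why frame prediction needs no labels, the toy sketch below builds training pairs from an unlabeled clip by treating the next frame as the target for the preceding frames. It is only an illustration under stated assumptions, not the paper's implementation: the helper names (`make_prediction_pairs`, `copy_last_frame`, `mse`), the mean squared error loss, and the trivial copy-last-frame predictor are all hypothetical stand-ins for the learned network and its actual objective.

```python
from typing import List, Sequence, Tuple

# Toy grayscale frame: a list of rows, each a list of pixel intensities.
Frame = List[List[float]]


def make_prediction_pairs(
    frames: Sequence[Frame], context: int
) -> List[Tuple[List[Frame], Frame]]:
    """Slide a window over an unlabeled clip.

    Each pair is (the `context` preceding frames, the frame that follows).
    The target comes from the video itself, so no annotation is needed.
    """
    if context < 1:
        raise ValueError("context must be at least 1")
    pairs = []
    for t in range(context, len(frames)):
        pairs.append((list(frames[t - context:t]), frames[t]))
    return pairs


def mse(predicted: Frame, target: Frame) -> float:
    """Mean squared pixel error (an assumed loss, chosen for simplicity)."""
    total = 0.0
    count = 0
    for row_p, row_t in zip(predicted, target):
        for p, q in zip(row_p, row_t):
            total += (p - q) ** 2
            count += 1
    return total / count


def copy_last_frame(past: List[Frame]) -> Frame:
    """Hypothetical stand-in predictor: assume the scene does not change."""
    return past[-1]


# Five 1x2 frames whose brightness rises by 1 per frame.
clip = [[[float(t), float(t)]] for t in range(5)]
pairs = make_prediction_pairs(clip, context=2)
losses = [mse(copy_last_frame(past), target) for past, target in pairs]
```

In a real system, a trainable network would take the place of `copy_last_frame`, and the loss over such pairs would drive feature learning before any labeled parsing data is used.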

Related Material

author = {Jin, Xiaojie and Li, Xin and Xiao, Huaxin and Shen, Xiaohui and Lin, Zhe and Yang, Jimei and Chen, Yunpeng and Dong, Jian and Liu, Luoqi and Jie, Zequn and Feng, Jiashi and Yan, Shuicheng},
title = {Video Scene Parsing With Predictive Feature Learning},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}