Learning Video Features for Multi-Label Classification

Shivam Garg; Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018


This paper studies approaches to learning video representations. The work was carried out as part of the YouTube-8M Video Understanding Challenge. The main focus is to analyze various approaches to modeling temporal data and to evaluate their performance on this problem. A model is also proposed that reduces the size of the feature vector by 70% without compromising accuracy. The first approach uses recurrent neural network architectures to learn a single video-level feature from frame-level features, then uses this aggregated feature for multi-label classification. The second approach uses video-level features and deep neural networks to assign the labels.
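
The first approach described in the abstract can be illustrated with a minimal sketch. This is not the paper's model. It assumes a plain tanh recurrent cell as a stand-in for whichever recurrent architecture the paper uses, random untrained weights, and small illustrative dimensions. The function names (`aggregate_frames`, `predict_labels`) and all sizes are hypothetical. The sketch shows only the data flow: frame-level features are folded into one video-level vector, which then feeds an independent sigmoid per label.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def aggregate_frames(frames, W_in, W_rec, b):
    """Fold a sequence of frame features into one video-level vector.

    Uses a plain tanh recurrent cell (an assumption, not the paper's cell)
    and returns the final hidden state as the aggregated feature.
    """
    h = np.zeros(W_rec.shape[0])
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h + b)
    return h


def predict_labels(video_feature, W_out, b_out):
    """Multi-label head: one independent sigmoid score per label."""
    return sigmoid(W_out @ video_feature + b_out)


# Illustrative sizes only; none of these come from the paper.
num_frames, frame_dim, hidden_dim, num_labels = 30, 64, 16, 10

rng = np.random.default_rng(0)
frames = rng.normal(size=(num_frames, frame_dim))

# Random, untrained parameters, used here only to exercise the shapes.
W_in = 0.1 * rng.normal(size=(hidden_dim, frame_dim))
W_rec = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
W_out = 0.1 * rng.normal(size=(num_labels, hidden_dim))
b_out = np.zeros(num_labels)

video_feature = aggregate_frames(frames, W_in, W_rec, b)
probs = predict_labels(video_feature, W_out, b_out)

# Each label is thresholded on its own, so a video can receive several labels.
predicted = np.flatnonzero(probs > 0.5)
```

Because each label gets its own sigmoid rather than sharing a softmax, the scores do not have to sum to one, which is what lets a single video carry several labels at once.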

Related Material

author = {Garg, Shivam},
title = {Learning Video Features for Multi-Label Classification},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops},
month = {September},
year = {2018}