Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling

Cho, Choongyeun; Antin, Benjamin; Arora, Sanchit; Ashrafi, Shwan; Duan, Peilin; The Huynh, Dang; James, Lee; Tuan Nguyen, Hang; Solgi, Mojtaba; Van Than, Cuong

Choongyeun Cho, Benjamin Antin, Sanchit Arora, Shwan Ashrafi, Peilin Duan, Dang The Huynh, Lee James, Hang Tuan Nguyen, Mojtaba Solgi, Cuong Van Than; Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018

Abstract

This paper presents Axon AI’s solution to the 2nd YouTube8M Video Understanding Challenge, which achieved a final global average precision (GAP) of 88.733% on the private test set (ranked 3rd among 394 teams, not considering the model size constraint) and 87.287% with a model that meets the size requirement. Two sets of 7 individual models belonging to 3 different families were trained separately. The inference results of these models on the training data were then aggregated and used to train a compact model that meets the model size requirement. To further improve performance, we explored and employed data over/sub-sampling in feature space, an additional regularization term during training that exploits label relationships, and learned weights for ensembling the individual models.
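For readers unfamiliar with the GAP metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed, not the authors' or the challenge's evaluation code. It assumes the usual challenge convention of keeping the top 20 predictions per video, pooling them across all videos, and computing a single average precision over the pooled list. The function name `global_average_precision` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def global_average_precision(scores, labels, top_k=20):
    """Illustrative GAP: scores and labels have shape (num_videos, num_classes).

    Assumption: recall is normalized by the total number of positive labels.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    k = min(top_k, scores.shape[1])

    # Keep the k highest-scoring classes for each video.
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1).ravel()
    top_hits = np.take_along_axis(labels, top_idx, axis=1).ravel()

    # Pool predictions from all videos and sort by descending confidence.
    order = np.argsort(-top_scores, kind="stable")
    hits = top_hits[order].astype(float)

    total_positives = labels.sum()
    if total_positives == 0:
        return 0.0

    # Precision at each rank; recall only increases at ranks that are hits.
    precision_at_rank = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_rank * hits).sum() / total_positives)


if __name__ == "__main__":
    toy_scores = [[0.9, 0.1], [0.8, 0.2]]
    toy_labels = [[1, 0], [0, 1]]
    print(global_average_precision(toy_scores, toy_labels, top_k=2))
```

In the toy example, the second video ranks its wrong class first, so the pooled ranking places a miss ahead of one of the two true labels and the score falls below 1.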

Related Material

[bibtex]

@InProceedings{Cho_2018_ECCV_Workshops,
author = {Cho, Choongyeun and Antin, Benjamin and Arora, Sanchit and Ashrafi, Shwan and Duan, Peilin and The Huynh, Dang and James, Lee and Tuan Nguyen, Hang and Solgi, Mojtaba and Van Than, Cuong},
title = {Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops},
month = {September},
year = {2018}
}