MILA: Multi-Task Learning From Videos via Efficient Inter-Frame Attention

Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gérard Medioni; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 2219-2229

Abstract


Prior work in multi-task learning has mainly focused on predictions from a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the lightweight shallow network runs on non-keyframes at a high frame rate. We also propose an effective adversarial learning strategy to encourage the slow and fast networks to learn similar features. Our approach ensures low-latency multi-task learning while maintaining high-quality predictions. Experiments show competitive accuracy compared to the state of the art on two multi-task learning benchmarks while reducing the number of floating-point operations (FLOPs) by 70%. In addition, our attention-based feature propagation method (ILA) outperforms prior work in terms of task accuracy while reducing FLOPs by up to 90%.
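
To make the abstract's two central ideas concrete, below is a minimal PyTorch sketch of (a) an inter-frame attention module in which a cheap non-keyframe feature map queries cached features from the most recent keyframe, and (b) a "slow-fast" inference loop where a deep network runs only on sparsely sampled keyframes and a shallow network handles the rest. All module names, layer sizes, the keyframe interval, the two task heads, and the use of standard multi-head attention are illustrative assumptions; they are not the paper's exact architecture, and the adversarial feature-alignment strategy is omitted.

    # Sketch under the assumptions stated above; not the authors' implementation.
    import torch
    import torch.nn as nn


    class InterFrameAttention(nn.Module):
        """Attend from current (fast) features to cached keyframe (slow) features."""

        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, fast_feat: torch.Tensor, key_feat: torch.Tensor) -> torch.Tensor:
            # fast_feat, key_feat: (B, C, H, W) feature maps.
            B, C, H, W = fast_feat.shape
            q = fast_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) queries
            kv = key_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) keys/values
            out, _ = self.attn(q, kv, kv)              # propagate keyframe features
            return out.transpose(1, 2).reshape(B, C, H, W)


    class SlowFastVideoModel(nn.Module):
        def __init__(self, dim: int = 64, keyframe_interval: int = 5):
            super().__init__()
            self.interval = keyframe_interval
            # Deep ("slow") backbone for keyframes; shallow ("fast") one otherwise.
            self.slow = nn.Sequential(
                nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
                nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
                nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            )
            self.fast = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU())
            self.attn = InterFrameAttention(dim)
            # One head per task, e.g. semantic segmentation and depth (hypothetical).
            self.seg_head = nn.Conv2d(dim, 19, 1)
            self.depth_head = nn.Conv2d(dim, 1, 1)

        def forward(self, frames: torch.Tensor):
            # frames: (T, B, 3, H, W) video clip; frame 0 is always a keyframe.
            key_feat = None
            outputs = []
            for t, frame in enumerate(frames):
                if t % self.interval == 0:
                    feat = self.slow(frame)            # expensive keyframe pass
                    key_feat = feat.detach()           # cache for later frames
                else:
                    feat = self.fast(frame)            # cheap non-keyframe pass
                    feat = self.attn(feat, key_feat)   # borrow keyframe context
                outputs.append((self.seg_head(feat), self.depth_head(feat)))
            return outputs


    if __name__ == "__main__":
        model = SlowFastVideoModel()
        clip = torch.randn(8, 1, 3, 32, 32)  # 8 frames, batch of 1
        preds = model(clip)
        print(len(preds), preds[0][0].shape, preds[0][1].shape)

The FLOP savings in this scheme come from running the deep backbone on only one frame in every keyframe interval; the attention step is what lets the shallow per-frame network still produce keyframe-quality features on the remaining frames.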

Related Material


@InProceedings{Kim_2021_ICCV,
  author    = {Kim, Donghyun and Lan, Tian and Zou, Chuhang and Xu, Ning and Plummer, Bryan A. and Sclaroff, Stan and Eledath, Jayan and Medioni, G\'erard},
  title     = {MILA: Multi-Task Learning From Videos via Efficient Inter-Frame Attention},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2021},
  pages     = {2219-2229}
}