Sharing Decoders: Network Fission for Multi-Task Pixel Prediction

Steven Hickson, Karthik Raveendran, Irfan Essa; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 3771-3780

Abstract


We examine the benefits of splitting encoder-decoders for multi-task learning and showcase results on three tasks (semantics, surface normals, and depth) while adding very few floating point operations (FLOPS) per task. Current hard parameter sharing methods for multi-task pixel-wise labeling use one shared encoder with separate decoders for each task. We generalize this notion and term the splitting of encoder-decoder architectures at different points "fission". Our ablation studies on fission show that sharing most of the decoder layers in multi-task encoder-decoder networks improves results while adding far fewer parameters per task. Our proposed method trains faster, uses less memory, achieves better accuracy, and uses significantly fewer FLOPS than conventional multi-task methods, with additional tasks requiring only 0.017% more FLOPS than the single-task network. We show results with a real-time model on a Pixel phone and release the source code.
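The effect of the split point on model size can be illustrated with a small parameter-count sketch. It is not the authors' implementation: the function name `multitask_param_count`, the layer lists, and every number in them are hypothetical, chosen only to show how moving the fission point from the end of the encoder to late in the decoder changes the cost of each extra task. It counts parameters rather than FLOPS.

```python
from typing import List


def multitask_param_count(layer_params: List[int], split_index: int, num_tasks: int) -> int:
    """Total parameters when layers before split_index are shared by all tasks
    and layers from split_index onward are duplicated once per task."""
    if not 0 <= split_index <= len(layer_params):
        raise ValueError("split_index must lie within the layer list")
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    shared = sum(layer_params[:split_index])
    per_task = sum(layer_params[split_index:])
    return shared + num_tasks * per_task


# Hypothetical per-layer parameter counts (illustrative, not from the paper).
ENCODER = [1000, 2000, 4000]
DECODER = [4000, 2000, 1000, 50]
LAYERS = ENCODER + DECODER

# Single-task baseline: one copy of every layer.
single_task = multitask_param_count(LAYERS, split_index=0, num_tasks=1)

# Conventional hard parameter sharing: split right after the encoder,
# so each of the three tasks gets its own full decoder.
conventional = multitask_param_count(LAYERS, split_index=len(ENCODER), num_tasks=3)

# Late fission: share everything except the final decoder layer.
late_fission = multitask_param_count(LAYERS, split_index=len(LAYERS) - 1, num_tasks=3)

print(single_task, conventional, late_fission)
```

With these made-up sizes, splitting after the encoder roughly doubles the single-task parameter count for three tasks, while splitting before the last decoder layer adds only the small per-task heads.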

Related Material


@InProceedings{Hickson_2022_WACV,
    author    = {Hickson, Steven and Raveendran, Karthik and Essa, Irfan},
    title     = {Sharing Decoders: Network Fission for Multi-Task Pixel Prediction},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2022},
    pages     = {3771-3780}
}