Multi-Task Learning with Knowledge Distillation for Dense Prediction

Yangyang Xu, Yibo Yang, Lefei Zhang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 21550-21559

Abstract


While multi-task learning (MTL) has become an attractive topic, its training usually poses more difficulties than the single-task case. How to successfully apply knowledge distillation into MTL to improve training efficiency and model performance is still a challenging problem. In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction based on two simple design principles. First, for memory and training efficiency, we use a single strong multi-task model as a teacher during training instead of multiple teachers, as widely adopted in existing studies. Second, we employ a less sensitive Cauchy-Schwarz (CS) divergence instead of the Kullback-Leibler (KL) divergence and propose a CS distillation loss accordingly. With the less sensitive divergence, our knowledge distillation with an alternative match is applied for capturing inter-task and intra-task information between the teacher model and the student model of each task, thereby learning more "dark knowledge" for effective distillation. We conducted extensive experiments on dense prediction datasets, including NYUD-v2 and PASCAL-Context, for multiple vision tasks, such as semantic segmentation, human parts segmentation, depth estimation, surface normal estimation, and boundary detection. The results show that our proposed method decidedly improves model performance and the practical inference efficiency.

Related Material


[pdf]
[bibtex]
@InProceedings{Xu_2023_ICCV, author = {Xu, Yangyang and Yang, Yibo and Zhang, Lefei}, title = {Multi-Task Learning with Knowledge Distillation for Dense Prediction}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {21550-21559} }