Compact and Efficient Multitask Learning in Vision, Language and Speech

Al-Rawi, Mohammed; Valveny, Ernest

Mohammed Al-Rawi, Ernest Valveny; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2019

Abstract

Cross-domain multitask learning is a challenging area of computer vision and machine learning because of the intra-similarities among class distributions. Addressing it in a way that copes with the human cognition system, by considering inter- and intra-class categorization and recognition, complicates the problem even further. In this work, we propose an effective holistic and hierarchical learning approach that uses a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwritten word spotting in two different scripts (Arabic and English). The proposed model successfully learned different tasks across multiple domains.

Related Material

@InProceedings{Al-Rawi_2019_ICCV,
author = {Al-Rawi, Mohammed and Valveny, Ernest},
title = {Compact and Efficient Multitask Learning in Vision, Language and Speech},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}
}