High-Efficiency Device-Cloud Collaborative Transformer Model
Natural Language Processing (NLP) researchers have achieved significant success with unsupervised language pre-training techniques. However, modern self-attention models require far more computational and memory resources than conventional NLP models, which makes them costly to pre-train or even to fine-tune. This cost severely limits their success and adoption across many fields. To improve efficiency, we propose the Device-Cloud Collaborative Transformer, a framework for efficient language modeling that spans cloud and device and is designed to encourage learning representations that generalize better across many different tasks. Specifically, we design a Device-Cloud Collaborative Transformer architecture for large language models that benefits both cloud-side and device-side modeling. Experimental results demonstrate the effectiveness of the proposed method.