A Hardware Prototype Targeting Distributed Deep Learning for On-Device Inference

Allen-Jasmin Farcas, Guihong Li, Kartikeya Bhardwaj, Radu Marculescu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 398-399

Abstract


This paper presents a hardware prototype and framework for communication-aware model compression targeting distributed on-device inference. Our approach relies on Knowledge Distillation (KD) and achieves compression ratios of orders of magnitude on a large pre-trained teacher model. The distributed hardware prototype consists of multiple student models deployed on Raspberry Pi 3 nodes that run Wide ResNet and VGG models on the CIFAR-10 dataset for real-time image classification. Compared to the initial teacher model, we observe significant reductions in memory footprint (50x), energy consumption (14x), and latency (33x), as well as a 12x increase in performance, without any significant loss in accuracy. This is an important step towards deploying deep learning models for IoT applications.
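As a rough illustration of the Knowledge Distillation step the abstract describes, the sketch below computes the classic temperature-softened distillation loss between teacher and student logits. This is a minimal, generic KD formulation with hypothetical logits and temperature values; the paper's actual loss, hyperparameters, and communication-aware compression pipeline may differ.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-softened softmax: higher T flattens the distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between the softened teacher and student
    # distributions, scaled by T^2 so gradients keep a consistent
    # magnitude across temperatures (standard KD convention).
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl

# Hypothetical 3-class logits for a toy example (not from the paper).
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]
loss = kd_loss(student, teacher)
```

In a full pipeline, this distillation term is typically mixed with the standard cross-entropy loss on ground-truth labels when training the compact student models deployed on each node.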

Related Material


[bibtex]
@InProceedings{Farcas_2020_CVPR_Workshops,
author = {Farcas, Allen-Jasmin and Li, Guihong and Bhardwaj, Kartikeya and Marculescu, Radu},
title = {A Hardware Prototype Targeting Distributed Deep Learning for On-Device Inference},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020},
pages = {398-399}
}