TinyOps: ImageNet Scale Deep Learning on Microcontrollers

Sulaiman Sadiq, Jonathon Hare, Partha Maji, Simon Craske, Geoff V. Merrett; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 2702-2706


Deep Learning on microcontroller (MCU) based IoT devices is extremely challenging due to memory constraints. Prior approaches focus on using internal memory or external memories exclusively, which limits either accuracy or latency. We find that a hybrid method using internal and external MCU memories outperforms both approaches in accuracy and latency. We develop TinyOps, an inference engine which reduces the inference latency of models stored in slow external memory, using a partitioning and overlaying scheme driven by the available Direct Memory Access (DMA) peripheral to combine the advantages of external memory (size) and internal memory (speed). Experimental results show that architectures deployed with TinyOps significantly outperform models designed for internal memory, with up to 6% higher accuracy and, importantly, 1.3-2.2x faster inference latency, setting the state-of-the-art in TinyML ImageNet classification. Our work shows that the TinyOps design space is more efficient than the internal or external memory design spaces and should be explored further for TinyML applications.

Related Material

@InProceedings{Sadiq_2022_CVPR,
    author    = {Sadiq, Sulaiman and Hare, Jonathon and Maji, Partha and Craske, Simon and Merrett, Geoff V.},
    title     = {TinyOps: ImageNet Scale Deep Learning on Microcontrollers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {2702-2706}
}