QuantNAS: Quantization-aware Neural Architecture Search For Efficient Deployment On Mobile Device

Tianxiao Gao, Li Guo, Shanwei Zhao, Peihan Xu, Yukun Yang, Xionghao Liu, Shihao Wang, Shiai Zhu, Dajiang Zhou; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1704-1713

Abstract


Deep convolutional networks are increasingly applied in mobile AI scenarios. To achieve efficient deployment, researchers combine neural architecture search (NAS) and quantization to find the best quantized architecture. However, existing methods overlook the on-device implementation of quantization, so the search result is usually sub-optimal or yields limited latency reduction. To this end, we propose QuantNAS, a novel quantization-aware NAS based on a two-stage one-shot method. Different from previous methods, our method considers the on-device implementation of the quantized network and searches for the architecture from a fully quantized supernet. During training, we propose a batch-statistics-based strategy to alleviate the non-convergence problem. Besides, a scale predictor is proposed and jointly trained with the supernet; during search, the scale predictor can provide optimal quantization scales for different subnets without retraining. At different latency levels on a Kirin 9000 mobile CPU, the proposed method achieves 1.53%-1.68% Top-1 accuracy improvement on the ImageNet 1K dataset and 1.7% mAP improvement.
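The two core ingredients the abstract names, fake quantization inside the supernet and a scale predictor that maps a subnet encoding to quantization scales, can be illustrated with a minimal sketch. This is not the paper's implementation: the `ScalePredictor` weights, the one-hot subnet encoding, and all numeric settings are hypothetical stand-ins for the jointly trained predictor described above.

```python
import numpy as np

def fake_quantize(x, scale, bits=8):
    """Simulated uniform symmetric quantization: quantize then dequantize.

    This is the standard quantization-aware-training trick; the network
    trains on dequantized values while experiencing quantization error.
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

class ScalePredictor:
    """Hypothetical stand-in for the paper's scale predictor: a tiny
    linear model mapping a subnet encoding to a quantization scale,
    so each searched subnet gets a scale without retraining."""

    def __init__(self, n_choices, seed=0):
        rng = np.random.default_rng(seed)
        # Positive weights keep predicted scales positive.
        self.w = np.abs(rng.normal(0.05, 0.01, size=n_choices))

    def predict(self, subnet_encoding):
        return float(self.w @ subnet_encoding)

# Example: pick one operator choice (one-hot encoding, illustrative),
# predict its scale, and fake-quantize a batch of activations.
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=1000)

predictor = ScalePredictor(n_choices=4)
enc = np.array([0.0, 1.0, 0.0, 0.0])     # hypothetical subnet encoding
scale = predictor.predict(enc)

xq = fake_quantize(x, scale)
quant_mse = np.mean((x - xq) ** 2)        # small relative to signal power
```

In the actual method the predictor is trained jointly with the fully quantized supernet, so at search time each candidate subnet can be evaluated with appropriate scales directly.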

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Gao_2024_CVPR,
    author    = {Gao, Tianxiao and Guo, Li and Zhao, Shanwei and Xu, Peihan and Yang, Yukun and Liu, Xionghao and Wang, Shihao and Zhu, Shiai and Zhou, Dajiang},
    title     = {QuantNAS: Quantization-aware Neural Architecture Search For Efficient Deployment On Mobile Device},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1704-1713}
}