- [pdf] [arXiv]
MAPLE: Microprocessor a Priori for Latency Estimation
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption. As such, neural architecture search (NAS) algorithms take these two constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware dependent requiring the NAS algorithm to either measure or predict the architecture latency. Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process. Here we propose Microprocessor A Priori for Latency Estimation (MAPLE) that leverages hardware characteristics to predict deep neural network latency on previously unseen hardware devices. MAPLE takes advantage of a novel quantitative strategy to characterize the underlying microprocessor by measuring relevant hardware performance metrics, yielding a fine-grained and expressive hardware descriptor. The CPU-specific performance metrics are also able to characterize GPUs, resulting in a versatile descriptor that does not rely on the availability of hardware counters on GPUs or other deep learning accelerators. We provide experimental insight into this novel strategy. Through this hardware descriptor, MAPLE can generalize to new hardware via a few shot adaptation strategy, requiring as few as 3 samples from the target hardware to yield 6% improvement over state-of-the-art methods requiring as much as 10 samples. Experimental results showed that, increasing the few shot adaptation samples to 10 improves the accuracy significantly over the state-of-the-art methods by 12%. We also demonstrate MAPLE identification of Pareto-optimal DNN architectures exhibit superlative accuracy and efficiency. The proposed technique provides a versatile and practical latency prediction methodology for DNN run-time inference on multiple hardware devices while not imposing any significant overhead for sample collection.