A Uniform Latency Model for DNN Accelerators with Diverse Architectures and Dataflows

Design, Automation and Test in Europe Conference (DATE)

Abstract

In the early design phase of a Deep Neural Network (DNN) acceleration system, fast energy and latency estimation are important to evaluate the optimality of different design candidates on algorithm, hardware, and algorithm-to-hardware mapping, given the gigantic design space. This work proposes a uniform intra-layer analytical latency model for DNN accelerators that can be used to evaluate diverse architectures and dataflows. It employs a 3-step approach to systematically estimate the latency breakdown of different system components, capture the operation state of each memory component, and identify stall-induced performance bottlenecks. To achieve high accuracy, different memory attributes, operands’ memory sharing scenarios, as well as dataflow implications have been taken into account. Validation against an in-house taped-out accelerator across various DNN layers has shown an average latency model accuracy of 94.3%. To showcase the capability of the proposed model, we carry out 3 case studies to assess respectively the impact of mapping, workloads, and diverse hardware architectures on latency, driving design insights for algorithm-hardware-mapping co-optimization.

Featured Publications