Implementation of a Deep Learning Inference Accelerator on the FPGA.

University essay from Lunds universitet/Institutionen för elektro- och informationsteknik

Abstract: Today, Artificial Intelligence is one of the most important technologies, ubiquitous in our daily lives. Deep Neural Networks (DNN's) have come up as state of art for various machine intelligence applications such as object detection, image classification, face recognition and performs myriad of activities with exceptional prediction accuracy. AI in this contemporary world is moving towards embedded platforms for inference on the edge. This is essential to avoid latency, enhance data security and realize real-time performance. However, these DNN algorithms are computational and memory intensive. Consequently, exploiting immense energy, compute resources and memory-bandwidth making it difficult to be deployed in embedded devices. To solve this problem and realize an on-device AI acceleration, dedicated energy-efficient hardware accelerators are paramount. This thesis involves the implementation of such a dedicated deep learning accelerator on the FPGA. The NVIDIA's Deep Learning Accelerator (NVDLA), is encompassed in this research to explore SoC designs for integrated inference acceleration. NVDLA, an open-source architecture, standardizes deep learning inference acceleration on hardware. It optimizes inference acceleration all across the full stack from application through hardware to achieve energy efficiency synergy with the demanding throughput requirements. Therefore, the following thesis probes into the NVDLA framework to perceive the consistent workflow across the whole hardware-software programming hierarchies. Besides, the hardware design parameters, optimization features and system configurations of the NVDLA systems are analyzed for efficient implementations. Also, a comparative study of the diverse NVDLA SoC implementations (nv\_small and nv\_medium) with respect to performance metrics such as power, area, and throughput are discussed. Our approach engages prototyping of Nvidia’s Deep Learning Accelerator on a Zynq Ultrascale+ ZCU104 FPGA to examine its system functionality. The Hardware design of the system is carried out using Xilinx's Vivado Design Suite 2018.3 in Verilog. While the on-device software runs Linux kernel 4.14 on Zynq MPSoC. Thus, the software ecosystem is built with PetaLinux tools from Xilinx. The entire system architecture is validated using the pre-built regression tests that verify individual CNN layers. Besides these NVDLA hardware design also runs pre-compiled AlexNet as a benchmark for performance evaluation and comparison

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)