FPGA Hardware Acceleration of Inception Style Parameter Reduced Convolution Neural Networks

University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)

Author: Kalle Ngo; [2016]


Abstract: Some researchers have noted that the growth in the number of network parameters in many recently proposed state-of-the-art CNN topologies places unrealistic demands on hardware resources and limits the practical applications of neural networks. This is particularly apparent given that many of the projected applications (IoT, autonomous vehicles, etc.) rely on embedded systems with even greater restrictions on computation and memory bandwidth than the typical research-class computer cluster on which the CNN was designed. In 2014, the GoogLeNet CNN proposed a new level of organization (the “Inception Module”) that was demonstrated in competition to achieve similar or better performance while using an order of magnitude fewer network parameters than the competing topologies. This thesis explores the characteristics of the GoogLeNet inception modules and the implications they present for current CNN accelerator architectures. A custom FPGA accelerator is proposed to offset the inception module’s increased need to buffer large intermediate convolution arrays, through array partitioning and by cascading two convolution operations into a single pipeline pass. The architecture was implemented on a Xilinx Artix-7 FPGA, where it was able to continuously supply data to the 331 utilized DSP blocks (approximately half of the total available) while using only a quarter of the DDR bandwidth, achieving a peak throughput of 9.11 GFLOPS. The low utilization of the DDR bandwidth suggests that, with some optimization, the design can be scaled up to better utilize the available resources and increase throughput.
