DSP Design With Hardware Accelerator For Convolutional Neural Networks
Abstract: Convolutional Neural Networks impressed the world in 2012 by reaching state-of-the-art accuracy in the ImageNet Large Scale Visual Recognition Challenge. The era of machine learning has arrived, and with it countless applications ranging from autonomous driving to unstructured robotic manipulation. Computational complexity has grown exponentially in recent years, demanding new, highly efficient, low-power hardware architectures capable of executing these workloads. In this work, we optimized the hardware design at three levels: algorithmic, system, and accelerator. Designing a DSP with Tensilica and integrating Xenergic dual-port SRAMs, which give a convolution hardware accelerator direct memory access, leads to a four-orders-of-magnitude speed-up of the initially identified bottleneck and an estimated threefold overall speed-up for classifying a single handwritten-digit image compared to the pure software implementation. Higher speed-ups are expected for deeper convolutional architectures and larger image dimensions, because the convolution hardware accelerator scales linearly in time, whereas conventional software-based approaches scale non-linearly.
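To illustrate the software-side scaling the abstract refers to, the following is a minimal sketch (not taken from the thesis) of a naive 2D convolution as a CNN layer computes it. The per-output-pixel cost is the full kernel area, so total work grows with both image size and kernel size, which is the kind of inner loop a dedicated accelerator offloads:

```python
def conv2d_naive(image, kernel):
    """Naive 'valid' 2D convolution (cross-correlation, as in CNN layers).

    Cost is O(H * W * K * K): the inner K*K accumulation runs once per
    output pixel, which is the software bottleneck an accelerator targets.
    """
    H, W = len(image), len(image[0])
    K = len(kernel)                        # assume a square K x K kernel
    out_h, out_w = H - K + 1, W - K + 1    # 'valid' mode: no padding
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for u in range(K):             # K*K multiply-accumulates
                for v in range(K):         # per output pixel
                    acc += image[i + u][j + v] * kernel[u][v]
            out[i][j] = acc
    return out


# Example: 3x3 image, 2x2 identity-diagonal kernel.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ker = [[1, 0],
       [0, 1]]
print(conv2d_naive(img, ker))  # [[6.0, 8.0], [12.0, 14.0]]
```

A hardware accelerator with direct memory access streams these multiply-accumulates in parallel, which is why its runtime can scale roughly linearly with image size rather than with the full loop-nest cost.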