Deep Learning Model Deployment for Spaceborne Reconfigurable Hardware : A flexible acceleration approach

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Space debris and space situational awareness (SSA) have become growing concerns for national security and the sustainability of space operations, where timely detection and tracking of space objects is critical in preventing collision events. Traditional computer-vision algorithms have been used extensively to solve detection and tracking problems in flight, but recently deep learning approaches have seen widespread adoption in non-space related applications for their high accuracy. The performanceper-watt and flexibility of reconfigurable Field-Programmable Gate Arrays (FPGAs) make them a good candidate for deep learning model deployment in space, supporting in-flight updates and maintenance. However, the FPGA design costs of custom accelerators for complex algorithms remains high. The research focus of the thesis relies on novel high-level synthesis (HLS) workflows that allow the developer to raise the level of abstraction and lower design costs for deep learning accelerators, particularly for space-representative applications. To this end, four different hardware accelerators of convolutional neural network models for spacebased debris detection are implemented (ResNet, SqueezeNet, DenseNet, TinyCNN), using the open-source HLS tool NNgen. The obtained hardware accelerators are deployed to a reconfigurable module of the Zynq Ultrascale+ MPSoC programmable logic, and compared in terms of inference performance, resource utilization and latency. The tests on the target hardware show a detection accuracy over 95% for ResNet, DenseNet and SqueezeNet, and a localization intersection-over-union over 0.5 for the deep models, and over 0.7 for TinyCNN, for space debris objects at a range between 1km and 100km for a diameter of 1cm, or between 100km and 1000km for a diameter of 10cm. The obtained speed-ups with respect to software-only implementations lay between 3x and 32x for the different hardware accelerators.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)