Using Reinforcement Learning to Correct Soft Errors of Deep Neural Networks

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Deep Neural Networks (DNNs) are becoming increasingly important in many aspects of human life, particularly in safety-critical areas such as autonomous driving and aerospace systems. However, soft errors such as bit-flips can significantly degrade the performance of these systems, with serious consequences. To ensure the reliability of DNNs, it is essential to guarantee their performance. Many solutions have been proposed to enhance the trustworthiness of DNNs, including traditional methods such as error-correcting codes (ECC), which can detect and mitigate soft errors but come at a high cost in redundancy. This thesis proposes a new method of correcting soft errors in DNNs using Deep Reinforcement Learning (DRL) and Transfer Learning (TL). A DRL agent learns to identify the critical weights of a DNN layer by layer. To reduce training time, TL is used to apply this knowledge when training on other layers. The primary objective of the method is to ensure acceptable DNN performance by mitigating the impact of errors while keeping redundancy low. As a case study, we tested the proposed method on a multilayer perceptron (MLP) and ResNet-18. Our results show that the method reduces redundancy by around 25% compared with the ECC baseline while achieving the same level of performance. With the same redundancy, our approach can improve system performance to up to twice that of conventional methods. With TL, the training time of the MLP is shortened to around 81.11%, and that of ResNet-18 is shortened to around 57.75%.
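
To make the notion of a bit-flip concrete, the following minimal sketch shows how flipping a single bit changes one stored weight. It is an illustration only, not the thesis's fault model: it assumes weights are stored as IEEE 754 float32 values, and the helper name `flip_bit` is hypothetical.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `value` (hypothetical helper).

    Assumes IEEE 754 single precision: bits 0-22 are the mantissa,
    bits 23-30 the exponent, and bit 31 the sign.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # the simulated soft error
    # Reinterpret the corrupted integer as a float32 again.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", raw))
    return corrupted


weight = 0.5
mantissa_flip = flip_bit(weight, 0)   # lowest mantissa bit: tiny change
exponent_flip = flip_bit(weight, 30)  # highest exponent bit: huge change
sign_flip = flip_bit(weight, 31)      # sign bit: weight is negated

print(mantissa_flip, exponent_flip, sign_flip)
```

Under this assumption, a flip in a low mantissa bit barely changes the weight, while a flip in the top exponent bit turns 0.5 into 2^127. This difference in impact is one reason why protecting only selected bits or weights can cost less redundancy than protecting everything uniformly.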
