Spiking Reinforcement Learning for Robust Robot Control Under Varying Operating Conditions

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Philipp Mondorf [2022]


Abstract: Over the last few years, deep reinforcement learning (RL) has gained increasing popularity for its successful application to a variety of complex control and decision-making tasks. As the demand for deep RL algorithms deployed in challenging real-world environments grows, their robustness to uncertainty, disturbances, and perturbations of the environment becomes increasingly important. However, most traditional deep RL methods are not inherently robust; many state-of-the-art deep RL algorithms have been observed to struggle with uncertainties and unforeseen changes in the environment. Spiking neural networks (SNNs) represent a new generation of neural networks that bridge the gap between neuroscience and machine learning. Unlike conventional artificial neural networks (ANNs), SNNs employ biological neuron models as computational units that communicate via sparse, event-driven sequences of electrical impulses. When deployed on dedicated neuromorphic hardware, spiking neural networks exhibit favorable properties such as high energy efficiency and low latency. Recent research further indicates that SNNs are inherently robust to noise and small input errors. This makes them promising candidates for application within the field of reinforcement learning. In this thesis, we propose the fully spiking deep deterministic policy gradient (FS-DDPG) algorithm, a spiking actor-critic network for continuous control that can handle high-dimensional state and action spaces. Unlike conversion-based methods, the FS-DDPG algorithm is trained directly via error backpropagation based on surrogate gradients. We show that the FS-DDPG algorithm can successfully learn control policies for different locomotion problems, each involving the control of complex multi-joint dynamics.
Furthermore, we evaluate the algorithm's robustness to sensor noise and perturbations of the environment, and compare it to the DDPG algorithm, a non-spiking actor-critic network with a comparable architecture. While the DDPG algorithm outperforms the spiking actor-critic network on the control tasks, we find the FS-DDPG algorithm to be more robust to sensor noise and measurement errors. Moreover, both algorithms respond similarly to perturbations of the environment. Our results align with previous work and indicate that spiking neural networks exhibit desirable robustness properties with respect to sensor noise and measurement errors. Alongside low latency and high energy efficiency on neuromorphic hardware, this may be a substantial advantage of SNNs, in particular within reinforcement learning.
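The surrogate-gradient idea that the abstract names as the basis for direct SNN training can be sketched in plain Python. The fast-sigmoid surrogate and the leaky integrate-and-fire (LIF) parameters below are illustrative assumptions for exposition, not the exact choices made in the thesis:

```python
def heaviside_spike(v, threshold=1.0):
    """Forward pass: a neuron emits a binary spike when its membrane
    potential crosses the threshold. This step function is
    non-differentiable, which is what blocks naive backpropagation."""
    return 1.0 if v >= threshold else 0.0

def surrogate_grad(v, threshold=1.0, slope=10.0):
    """Backward pass: replace the step's zero/undefined derivative with
    the derivative of a fast sigmoid, slope / (1 + slope*|v - th|)^2,
    so error backpropagation can assign credit through the spike.
    (Fast sigmoid is one common surrogate; others exist.)"""
    x = slope * (v - threshold)
    return slope / (1.0 + abs(x)) ** 2

def lif_step(v, input_current, decay=0.9, threshold=1.0):
    """One discrete-time step of a leaky integrate-and-fire neuron:
    leak the membrane potential, integrate the input, spike if the
    threshold is reached, then reset by subtraction ("soft reset")."""
    v = decay * v + input_current
    spike = heaviside_spike(v, threshold)
    v -= spike * threshold
    return v, spike

# Drive one neuron with a constant sub-threshold current: the potential
# accumulates over time steps until a single spike is emitted.
v, spikes = 0.0, []
for _ in range(4):
    v, s = lif_step(v, 0.4)
    spikes.append(s)
```

In a full spiking actor-critic, layers of such neurons replace the usual activation functions, the forward pass uses the hard spike, and the backward pass substitutes the surrogate derivative at each spiking nonlinearity.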
