Benchmarking Deep Reinforcement Learning on Continuous Control Tasks: A Comparison of Neural Network Architectures and Environment Designs

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Deep Reinforcement Learning (RL) has received much attention in recent years. This thesis investigates how reward functions, environment termination conditions, Neural Network (NN) architectures, and the choice of deep RL algorithm affect performance on continuous control tasks. To this end, the Furuta pendulum swing-up task is adopted as the primary benchmark, since it offers low input and state dimensionality without being trivial. Focusing on model-free algorithms, the results indicate that DDPG, an actor-critic algorithm, performs significantly better than the other algorithms. They also suggest that larger NN architectures may benefit performance in some instances. Comparing reward functions, Potential-Based Reward Shaping (PBRS) applied to a sparse reward signal shows promising results compared to a reward function from previous work, and combining PBRS with large negative rewards for terminations due to unwanted behavior seems to improve performance for some algorithms. However, although designs such as PBRS can improve performance, they are shown not to be necessary for adequate performance, and the same applies to environment terminations upon unwanted behavior. Applying a DDPG agent trained in a simulator to a physical Furuta pendulum results in performance that, for certain training seeds, closely resembles what is observed in the simulator. The results and test suite of this thesis are available on GitHub and should hopefully help inspire future research in environment design and NN architectures for deep RL. Specifically, future work may investigate whether extensive parameter tuning alters the results.
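
For readers unfamiliar with PBRS, the following is a minimal sketch of the general idea: a shaping term of the form gamma * Phi(s') - Phi(s) is added to a sparse reward. The potential function, the sparse reward, the tolerance, and the discount factor used here are illustrative assumptions and are not taken from the thesis; the state is reduced to a single pendulum angle for brevity.

```python
import math

GAMMA = 0.99  # discount factor (assumed value, not from the thesis)


def potential(theta: float) -> float:
    """Hypothetical potential: largest when the pendulum is upright (theta = 0)."""
    return math.cos(theta)


def sparse_reward(theta: float, tolerance: float = 0.1) -> float:
    """Hypothetical sparse signal: 1 when within `tolerance` rad of upright, else 0."""
    wrapped = math.atan2(math.sin(theta), math.cos(theta))  # wrap to [-pi, pi]
    return 1.0 if abs(wrapped) < tolerance else 0.0


def shaped_reward(theta: float, next_theta: float, gamma: float = GAMMA) -> float:
    """Sparse reward for the next state plus the potential-based shaping term."""
    shaping = gamma * potential(next_theta) - potential(theta)
    return sparse_reward(next_theta) + shaping


if __name__ == "__main__":
    # Moving from hanging down (pi) towards horizontal (pi/2) earns a positive bonus.
    print(shaped_reward(math.pi, math.pi / 2))
    # Moving the other way is penalised.
    print(shaped_reward(math.pi / 2, math.pi))
```

Because the shaping term depends only on the difference in potential between consecutive states, it gives the agent a dense learning signal while leaving the sparse reward itself unchanged.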
