Generalizing Deep Deterministic Policy Gradient

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Gustaf Jacobzon; Martin Larsson; [2018]


Abstract: We extend Deep Deterministic Policy Gradient (DDPG), a state-of-the-art algorithm for continuous control, in order to achieve a high generalization capability. To improve the agent's generalization we introduce dropout, one of the most successful regularization techniques for generalization in machine learning, into the algorithm. We use the recently published exploration technique parameter space noise to achieve higher stability and a lower likelihood of converging to a poor local minimum. We also replace the Rectified Linear Unit (ReLU) nonlinearity with the Exponential Linear Unit (ELU) for greater stability and faster learning. Our results show that an agent trained with dropout has generalization capabilities that far exceed those of one trained with L2 regularization, when evaluated in the racing simulator TORCS. Furthermore, we found ELU to produce a more stable and faster learning process than ReLU when evaluated in the physics simulator MuJoCo.
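The abstract contrasts ReLU with ELU and relies on dropout for regularization. As a minimal illustrative sketch (function names and the inverted-dropout formulation are our own, not taken from the thesis), the three pieces can be written as:

```python
import math
import random

def relu(x):
    # Rectified Linear Unit: zero for negative inputs, identity otherwise.
    return max(0.0, x)

def elu(x, alpha=1.0):
    # Exponential Linear Unit: identity for positive inputs, but smoothly
    # saturates toward -alpha for negative inputs, which keeps mean
    # activations closer to zero than ReLU does.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def dropout(values, p=0.5, rng=random):
    # Inverted dropout: zero each unit with probability p during training and
    # scale the survivors by 1/(1-p) so the expected activation is unchanged.
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

Unlike ReLU, ELU passes gradient information for negative inputs, which is one common explanation for the faster, more stable learning reported in the thesis.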
