Guided policy search for a lightweight industrial robot arm

University essay from Luleå tekniska universitet/Rymdteknik

Author: Jack White; [2018]


Abstract: General autonomy is at the forefront of robotic research and practice. Earlier research has enabled robots to learn movement and manipulation within the context of a specific instance of a task, and to learn from large quantities of empirical data and known dynamics. Reinforcement learning (RL) tackles generalisation, whereby a robot may be relied upon to perform its task with acceptable speed and fidelity in multiple---even arbitrary---task configurations. Recent research has advanced approximate policy search methods of RL, in which a function approximator is used to represent an optimal policy while avoiding calculation across the large dimensions of the state and action spaces of real robots. This thesis details the implementation and testing, on a lightweight industrial robot arm, of guided policy search (GPS), an RL algorithm that seeks to avoid the typical need in machine learning for large numbers of empirical behavioural samples, while maximising learning speed. GPS comprises a local optimal policy generator, here based on a linear-quadratic regulator (LQR), and an approximate general policy representation, here a feedforward neural network. A controller is written to interface an existing back-end implementation of GPS with the robot itself. Experimental results show that the GPS agent is able to perform basic reaching tasks across its configuration space with approximately 15 minutes of training, but that the local policies generated fail to be fully optimised within that timescale and that post-training operation suffers from oscillatory actions under perturbed initial joint positions. Further work is discussed and recommended for better training of GPS agents and for making locally optimal policies more robust to disturbance during operation.
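The GPS structure described in the abstract---locally optimal controllers supervising a single general policy---can be sketched in miniature. The toy below is an illustrative assumption, not the thesis's actual robot model or code: a finite-horizon LQR (the local policy generator) controls a linear double-integrator "joint", and a simple linear regression stands in for the feedforward neural network as the general policy fitted to the guiding trajectories.

```python
import numpy as np

# Illustrative GPS-style sketch (assumed toy model, not the thesis's setup):
# 1) build a local time-varying LQR controller, 2) roll out guiding
# trajectories from several start states, 3) fit a global policy to them.

# Linear double-integrator "joint": state x = [position, velocity]
dt = 0.05
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([10.0, 1.0])   # state cost: drive position and velocity to 0
R = np.array([[0.1]])      # action cost
T = 50                     # horizon (steps)

# 1) Local policy: backward Riccati recursion gives time-varying gains K_t.
P = Q.copy()
gains = []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)   # valid simplification for optimal K
    gains.append(K)
gains = gains[::-1]  # gains[t] now applies at time step t

# 2) Guiding trajectories from several initial configurations.
states, actions = [], []
for x0 in ([1.0, 0.0], [-0.5, 0.2], [0.8, -0.3]):
    x = np.array(x0).reshape(2, 1)
    for t in range(T):
        u = -gains[t] @ x
        states.append(x.ravel())
        actions.append(u.ravel())
        x = A @ x + B @ u

# 3) Global policy: least-squares fit u = W^T x (the thesis uses a
# feedforward neural network here; a linear map keeps the sketch short).
X = np.array(states)            # (150, 2)
U = np.array(actions)           # (150, 1)
W, *_ = np.linalg.lstsq(X, U, rcond=None)

# The fitted policy should roughly reproduce the local controllers' actions.
mse = float(np.mean((X @ W - U) ** 2))
print(mse)
```

The supervised-regression step is the key design choice in GPS: the expensive trajectory optimisation is confined to the local controllers, and the general policy only ever learns by imitating them, which is why GPS needs comparatively few empirical samples.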
