Labyrinth Navigation Using Reinforcement Learning with a High Fidelity Simulation Environment

University essay from Linköpings universitet/Reglerteknik

Authors: Olle Eriksson; Axel Malmberg; [2022]


Abstract: This is a master's thesis on the subject of navigation and control using reinforcement learning, more specifically discrete Q-learning. The Q-learning algorithm is used to develop a steering policy by training inside a simulation environment. The problem is to navigate a steel ball through a maze made of walls and holes. This is the third thesis to address this problem, which allows for performance comparisons with more classical control algorithms, the most successful of which is a gain-scheduled LQR used to follow a splined path. The reinforcement-learning-derived steering policy achieved at best a 68 % success rate when navigating the ball from start to finish. Key features that had a large impact on policy performance when implemented in the simulation environment were the response time of the physical servos and the uncertainty added to the modelled forces. Compared to the LQR, which achieved a 46 % success rate, the reinforcement-learning-derived policy performs well. However, with high fluctuation in performance from policy to policy, the control method is not a consistent solution to the problem. Future work is needed to perfect the algorithm and the resulting policy. A few interesting issues to investigate are other formulations of disturbance implementation and training online on the physical system. Training online could allow for fine-tuning of the simulation-derived policy and for learning how to compensate for disturbances that are difficult to model, such as bumps and warping in the labyrinth surface.
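
As a point of reference for the method named in the abstract, the sketch below shows a generic tabular (discrete) Q-learning training loop in Python. It is not the thesis's implementation: the state and action discretization sizes, the hyperparameters, and the stand-in step() function that replaces the labyrinth simulator are all assumptions made for illustration.

    import numpy as np

    # Discretization sizes are assumptions; the thesis does not state them here.
    N_STATES = 500   # e.g. binned (ball position, velocity) states
    N_ACTIONS = 9    # e.g. binned plate-tilt commands

    ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # typical tabular defaults (assumed)

    rng = np.random.default_rng(0)
    Q = np.zeros((N_STATES, N_ACTIONS))

    def epsilon_greedy(s):
        """Pick a random action with probability EPSILON, else the greedy one."""
        if rng.random() < EPSILON:
            return int(rng.integers(N_ACTIONS))
        return int(Q[s].argmax())

    def step(s, a):
        """Stand-in for the labyrinth simulator: returns (s_next, reward, done).
        A real environment would simulate ball dynamics, walls, holes, servo
        response times, and the modelled force uncertainties the abstract mentions."""
        s_next = int(rng.integers(N_STATES))
        done = s_next == N_STATES - 1        # reaching the goal state
        reward = 1.0 if done else -0.01      # small step cost, goal bonus
        return s_next, reward, done

    for episode in range(1000):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(s)
            s_next, r, done = step(s, a)
            # Tabular Q-learning update:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r if done else r + GAMMA * Q[s_next].max()
            Q[s, a] += ALPHA * (target - Q[s, a])
            s = s_next

The core of the method is the single update line; the exploration schedule, discretization, and reward shaping are where a real labyrinth steering policy would differ from this toy.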
