The effects of multistep learning in the hard-exploration problem

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Jacob Friman; [2022]


Abstract: Reinforcement learning is a machine learning field that has received revitalised interest in recent years due to many success stories and advancements in deep reinforcement learning. A key part of reinforcement learning is the need to explore the environment so that the agent can properly learn the best policy. This can prove difficult when reward is rarely found, as in hard-exploration scenarios, and robustly solving these scenarios is a key problem for generalising reinforcement learning. Multi-step learning is a tool often used in reinforcement learning to boost performance and has its roots in the early development of the field. This work investigates whether multi-step learning, via n-step returns, has a heightened effect in scenarios where reward is sparse, making the environment hard to explore. The motivation is the property of multi-step learning of using more rewards and states per update, potentially distributing reward better when it is found. This was investigated by performing several experiments in a custom-made environment where reward sparsity, n-step and exploration method were varied independently of each other. The results showed that the n-step had a considerable effect on performance in all cases. There was an optimal n higher than 1, and performance diverged when n was lowest. With low n-steps the agents displayed temporarily degrading performance, while with higher n-steps performance improved consistently throughout training. Since the effect of n-step learning was universal across all scenarios and profoundly affected performance, the conclusion is that multi-step learning does not have an elevated effect in low-reward scenarios, and thus does not need special consideration in hard-exploration scenarios any more than in environments with higher reward densities. A further conclusion is that n-step is a very sensitive parameter which must be considered in all scenarios.
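For context, the n-step target that the thesis varies can be sketched as follows. This is a minimal, generic illustration of the standard n-step return, not the thesis's actual implementation; the function name and signature are assumptions for the example.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Compute the n-step return
        G_t = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * V(s_{t+n}),
    where `rewards` holds the n observed rewards r_t .. r_{t+n-1} and
    `bootstrap_value` is the value estimate V(s_{t+n}) (use 0.0 if the
    episode terminated within the n steps)."""
    g = bootstrap_value
    # Accumulate backwards so each reward is discounted the right number of times.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With n = 1 this reduces to the one-step TD target r_t + gamma * V(s_{t+1});
# with larger n, a sparse reward found at step t+n-1 reaches the update for
# state s_t directly, which is the property the thesis examines.
print(n_step_return([1.0, 0.0, 0.0], 4.0, 0.5))  # 3-step target
```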
