Improving Generalization in Reinforcement Learning using Skill-based Rewards

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Francesco Vito Lorenzo (2020)


Abstract: Reinforcement Learning is a promising approach to developing intelligent agents that can help game developers test new content. However, applying it to a game with stochastic transitions, like Candy Crush Friends Saga (CCFS), presents several challenges. Previous work has shown that an agent trained only to reach the objective of a level is unable to generalize to new levels. Inspired by the way humans approach the game, we develop a two-step solution to address this lack of generalization. First, we let multiple agents learn different skills that can be reused in high-level tasks, training them with rewards that are not directly tied to the objective of a level. Then, we design two hybrid architectures, called High-Speed Hierarchy (HSH) and Average Bagging (AB), which combine the skills and choose the action to take in the environment by considering multiple factors at the same time. Our results on CCFS show that learning skills with the proposed reward functions is effective and leads to higher proficiency than state-of-the-art baselines. Moreover, AB achieves a win rate on unseen levels twice as high as that of an agent trained only to reach the objective of a level, and even surpasses human performance on one level. Overall, our solution is a step toward an automated agent that can be used in production, and we believe that with some extensions it can yield even better results.
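The Average Bagging idea summarized above, combining several skill agents by aggregating their per-action preferences, could be sketched as follows. This is a minimal illustrative assumption, not the thesis's actual implementation: the function name, the use of averaged action scores, and the example data are all hypothetical.

```python
# Hypothetical sketch of an "Average Bagging"-style ensemble:
# each skill agent scores every candidate action, the scores are
# averaged across agents, and the highest-scoring action is chosen.
import numpy as np


def average_bagging(skill_scores: list[np.ndarray]) -> int:
    """Average per-action scores from several skill agents and pick the argmax."""
    stacked = np.stack(skill_scores)    # shape: (n_skills, n_actions)
    mean_scores = stacked.mean(axis=0)  # weight all skills equally
    return int(np.argmax(mean_scores))


# Example: three skill agents scoring four candidate moves
scores = [np.array([0.1, 0.7, 0.2, 0.0]),
          np.array([0.3, 0.4, 0.2, 0.1]),
          np.array([0.2, 0.5, 0.1, 0.2])]
best = average_bagging(scores)  # action 1 has the highest mean score
```

Averaging before the argmax lets a skill that strongly prefers one move outvote skills that are indifferent, which is one plausible way to consider multiple factors at once.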
