S-MARL: An Algorithm for Single-To-Multi-Agent Reinforcement Learning : Case Study: Formula 1 Race Strategies

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: A Multi-Agent System is a group of autonomous, intelligent, interacting agents sharing an environment that they observe through sensors, and upon which they act with actuators. The behaviors of these agents can be either defined upfront by programmers or learned by trial-and-error resorting to Reinforcement Learning. In this last context, the approaches proposed by literature can be categorized either as Single-Agent or Multi-Agent. The former approaches experience more stable training at the cost of defining upfront the policies of all the agents that are not learning, with the risk of limiting the performances of the learned policy. The latter approaches do not have such a limitation but experience higher training instability. Therefore, we propose a new approach based on the transition from Single-Agent to Multi-Agent Reinforcement Learning that exploits the benefits of both approaches: higher stability at the beginning of the training to learn the environment’s dynamics, and unconstrained agents in the latest phases. To conduct this study, we chose Formula 1 as the Multi-Agent System, a complex environment with more than two interacting agents. In doing so, we designed a realistic racing simulation environment, framed as a Markov Decision Process, able to reproduce the core dynamics of races. After that, we trained three agents based on Semi-Gradient Q-Learning with different frameworks: pure Single-Agent, pure Multi-Agent, and Single-to-Multi-Agent. The results established that, given the same initial conditions and training episodes, our approach outperforms both the Single-Agent and Multi-Agent frameworks, obtaining higher scores in the proposed benchmarks.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)