Improving a Reinforcement Learning Algorithm for Resource Scheduling

University essay from Lund University, Department of Automatic Control (Institutionen för reglerteknik)

Abstract: This thesis further investigates the viability of using reinforcement learning, specifically Q-learning, to schedule shared resources on the Ericsson Many-Core Architecture (EMCA). The approach was first explored by Patrik Trulsson in his master's thesis Dynamic Scheduling of Shared Resources using Reinforcement Learning (2021). The shared resources complete the jobs assigned to them; each job has a deadline and a latency. The Q-learning-based scheduler should minimize latency in the system and, most importantly, avoid missing deadlines. In this work, the Q-learning algorithm was tested on the simulation model of the EMCA that Trulsson built, and its performance was compared to that of a baseline scheduler and a random scheduler. Several parts of the Q-learning algorithm were evaluated and modified: the action and state spaces were made smaller, the state space was made more applicable to the real system, and the reward function and other Q-learning parameters were altered for better performance. These changes improved the algorithm's performance. Initially it performed slightly better than the baseline on only one of the two configurations it was evaluated on, but after the changes it performed significantly better on both. It also handles the introduction of noise to the simulation without a significant decrease in performance. While some aspects still warrant further investigation, the modified algorithm consistently outperforms the baseline scheduler provided by Ericsson and, due to the changes made, is better suited for a real implementation.
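
For readers unfamiliar with the technique, below is a minimal sketch of the tabular Q-learning loop the thesis builds on. The state and action representations, the reward shape, and the hyperparameter values are illustrative assumptions only and are not taken from the thesis.

    import random
    from collections import defaultdict

    # Assumed hyperparameters; the thesis tunes its own values.
    ALPHA = 0.1    # learning rate
    GAMMA = 0.9    # discount factor
    EPSILON = 0.1  # exploration rate for epsilon-greedy selection

    Q = defaultdict(float)  # Q[(state, action)] -> value estimate

    def choose_action(state, actions):
        # Epsilon-greedy: mostly exploit the current Q estimates,
        # occasionally explore a random action.
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state, actions):
        # Standard Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])

In a scheduling setting such as the one described, the reward would typically penalize latency and penalize missed deadlines heavily, so that the learned policy prioritizes meeting deadlines first.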
