MSc presentation by E. Wilson Andersson & J. Håkansson: Improving a Reinforcement Learning Algorithm for Resource Scheduling
Place: Seminar Room KC 3N27
Contact: karl-erik [dot] arzen [at] control [dot] lth [dot] se
Elin Wilson Andersson and Johan Håkansson are defending their Master's thesis at the Dept. of Automatic Control.
Where: Seminar room KC 3N27
When: June 1, 10:30-11:30
Author: Elin Wilson Andersson & Johan Håkansson
Title: Improving a Reinforcement Learning Algorithm for Resource Scheduling
Advisors: Karl-Erik Årzén, Dept. of Automatic Control; William Tidelund, Ericsson
Examiner: Bo Bernhardsson, Dept. of Automatic Control
This thesis further investigates the viability of using reinforcement learning, specifically Q-learning, to schedule shared resources on the Ericsson Many-Core Architecture (EMCA), an approach first explored by Patrik Trulsson in his Master's thesis Dynamic Scheduling of Shared Resources using Reinforcement Learning (2021). The shared resources complete jobs assigned to them; each job has a deadline and a latency. The Q-learning-based scheduler should minimize latency in the system and, most importantly, avoid missing deadlines. It was tested on a simulation model of the EMCA built by Trulsson, and its performance was compared to a baseline scheduler and a random scheduler. Several parts of the Q-learning algorithm were evaluated and modified: the action and state spaces were made smaller, the state space was made more applicable to the real system, and the reward function and other Q-learning parameters were tuned for better performance. Together, these changes improved the algorithm's performance. Initially it performed slightly better than the baseline on only one of the two configurations it was evaluated on, but after the modifications it performed significantly better on both. It also handles the introduction of noise into the simulation without a significant decrease in performance. While some aspects still require further investigation, the algorithm consistently outperforms the baseline and is overall better suited for a real implementation.
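The core ingredients named in the abstract, tabular Q-learning with an epsilon-greedy policy and a reward that penalizes latency and deadline misses, can be sketched on a toy two-resource job queue. This is a minimal illustration only, not the thesis's EMCA model: the state discretization, job sizes, deadline threshold, reward constants, and hyperparameters (alpha, gamma, epsilon) are all invented here for demonstration.

```python
import random

def bucket(queue_len, max_bucket=3):
    """Discretize a queue length into a small number of states."""
    return min(queue_len, max_bucket)

def train(episodes=500, steps=20, seed=0):
    """Learn a Q-table that assigns each arriving job to one of two resources.

    Toy dynamics (assumptions, not the thesis model): each job needs two
    units of service, each resource serves one unit per step, and a backlog
    above three units stands in for a missed deadline.
    """
    rng = random.Random(seed)
    actions = [0, 1]                  # which resource receives the job
    q = {}                            # Q-table: (state, action) -> value
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    for _ in range(episodes):
        queues = [0, 0]               # outstanding work units per resource
        for _ in range(steps):
            state = (bucket(queues[0]), bucket(queues[1]))
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q.get((state, x), 0.0))
            queues[a] += 2            # assign the job (two units of work)
            # reward: penalize latency (backlog), plus a large penalty
            # standing in for a missed deadline when the backlog grows
            reward = -queues[a] - (10 if queues[a] > 3 else 0)
            queues = [max(0, n - 1) for n in queues]   # one unit served per step
            nxt = (bucket(queues[0]), bucket(queues[1]))
            # standard Q-learning update toward reward + discounted best next value
            best_next = max(q.get((nxt, x), 0.0) for x in actions)
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
    return q
```

After training, the learned values favor sending work to the less loaded resource, which is the load-balancing behavior a scheduler of this kind should converge to.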