
Digit@LTH

Faculty of Engineering, LTH


Digit@LTH: Events

MSc. presentation by E. Wilson Andersson & J. Håkansson: Improving a Reinforcement Learning Algorithm for Resource Scheduling

Seminar

From: 2022-06-01 10:30 to 11:30
Place: Seminar Room KC 3N27
Contact: karl-erik [dot] arzen [at] control [dot] lth [dot] se


Elin Wilson Andersson and Johan Håkansson are defending their Master's thesis at the Dept. of Automatic Control.

Where: Seminar room KC 3N27

When: June 1, 10:30-11:30

Authors: Elin Wilson Andersson & Johan Håkansson

Title: Improving a Reinforcement Learning Algorithm for Resource Scheduling

Advisors: Karl-Erik Årzén, Dept. of Automatic Control; William Tidelund, Ericsson

Examiner: Bo Bernhardsson, Dept. of Automatic Control


Abstract:

This thesis further investigates the viability of using reinforcement learning, specifically Q-learning, to schedule shared resources on the Ericsson Many-Core Architecture (EMCA). This was first explored by Patrik Trulsson in his Master's thesis Dynamic Scheduling of Shared Resources using Reinforcement Learning (2021). The shared resources complete jobs assigned to them, and each job has a deadline as well as a latency. The Q-learning-based scheduler should minimize the latency in the system and, most importantly, avoid missing deadlines. It was tested on a simulation model of the EMCA built by Trulsson, and its performance was compared to a baseline scheduler and a random scheduler. Several parts of the Q-learning algorithm were evaluated and modified: the action and state spaces were made smaller, and the state space was made more applicable to the real system. The reward function, as well as other parameters of the Q-learning, was tuned for better performance. Together these changes increased the performance of the Q-learning algorithm. Initially it performed slightly better than the baseline on only one of the two configurations it was evaluated on, but in the end it performed significantly better on both. It also handles the introduction of noise to the simulation without a significant decrease in performance. While some aspects still require further investigation, the algorithm consistently outperforms the baseline and is overall better suited for a real implementation.
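For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of the tabular Q-learning update and epsilon-greedy action selection that such a scheduler typically builds on. The states, actions, learning rate, and reward here are generic illustrative stand-ins, not the actual definitions used in the thesis or on the EMCA.

```python
import random

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    `q` is a dict mapping (state, action) pairs to value estimates;
    unseen pairs default to 0.0. `alpha` and `gamma` are illustrative.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def epsilon_greedy(q, state, actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

In a scheduling setting of the kind the abstract describes, the reward would typically penalize missed deadlines heavily and accumulated latency more mildly, which is what drives the agent toward deadline-safe, low-latency schedules.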