CS MSc Thesis Zoom Presentation 11 February 2022
Place: In Zoom: https://lu-se.zoom.us/j/66611233792
Contact: birger [dot] swahn [at] cs [dot] lth [dot] se
One Computer Science MSc thesis to be presented on 11 February
On Friday, 11 February, there will be a master's thesis presentation in Computer Science at Lund University, Faculty of Engineering.
The presentation will take place in Zoom: https://lu-se.zoom.us/j/66611233792
Note to potential opponents: Register as an opponent to the presentation of your choice by sending an email to the examiner for that presentation (firstname.lastname@example.org). Do not forget to specify which presentation you are registering for! The number of opponents may be limited (often to two), so if you register too late you may have to choose another presentation. Registrations are individual, just as the oppositions are! Further instructions can be found on this page.
Presenters: Jacob Curman, Alv Romell
Title: Multilingual Large Scale Text Classification for Troubleshooting Management
Examiner: Pierre Nugues
Supervisors: Markus Borg (LTH), Olof Steinert (Scania)
This master's thesis explores the possibility of using pre-trained transformer-based language models to predict the malfunctioning part on trucks from human-generated text descriptions in workshop orders. It tackles a large-scale text classification problem with heavy class imbalance and multiple languages, using data from a Swedish truck manufacturer. A data analysis was performed to understand the domain and to generate hypotheses about methods for increasing predictive performance, with a special focus on underrepresented classes and languages. Experiments were set up to test the pre-trained models' predictive performance in both a monolingual and a multilingual setting, using techniques to reduce data complexity, sampling techniques, and data augmentation through unidirectional translation. The findings show that basic methods of upsampling infrequent classes or languages improve performance on the underrepresented segments, and that monolingual models trained on translated data can perform as well as multilingual models trained on data in its original language.
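The abstract credits "basic methods of upsampling" for the gains on underrepresented classes and languages, but it does not say which procedure was used. As an illustration only, here is a minimal sketch of one common choice, random oversampling with replacement, which duplicates records from smaller groups until every group is as large as the largest one. The function name `random_oversample`, the record fields `text`, `part` and `lang`, and the sample workshop orders are all hypothetical and not taken from the thesis.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, key, seed=0):
    """Randomly oversample so every group reaches the size of the largest group.

    samples: list of records (here hypothetical dicts with "text", "part", "lang").
    key: function mapping a record to its group, e.g. its class label or language.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in samples:
        groups[key(s)].append(s)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        # Draw extra copies with replacement to fill the gap up to the target size.
        out.extend(rng.choices(g, k=target - len(g)))
    rng.shuffle(out)
    return out


# Hypothetical workshop-order records, for illustration only.
orders = [
    {"text": "brake pedal feels soft", "part": "brake", "lang": "en"},
    {"text": "bromsen tar dåligt", "part": "brake", "lang": "sv"},
    {"text": "brake noise at low speed", "part": "brake", "lang": "en"},
    {"text": "AdBlue warning light on", "part": "aftertreatment", "lang": "en"},
]

# Upsample the infrequent class ...
balanced = random_oversample(orders, key=lambda r: r["part"])
print(Counter(r["part"] for r in balanced))
# ... or, with a different key, the infrequent language.
by_lang = random_oversample(orders, key=lambda r: r["lang"])
print(Counter(r["lang"] for r in by_lang))
```

Because the same function takes a grouping key, it can balance either classes or languages, which mirrors the two kinds of underrepresented segments the abstract mentions. Duplicating records adds no new information and can encourage overfitting on the copied examples, which is one reason translation-based augmentation is sometimes tried as an alternative.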
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220211_13CurmanRomell.pdf