CS MSc Thesis Presentation Day June 3 2022
Place: See information for each presentation
Contact: birger [dot] swahn [at] cs [dot] lth [dot] se
Thirteen MSc theses to be presented on Friday June 3, 2022
Friday June 3 is a day for coordinated master's thesis presentations in Computer Science at Lund University, Faculty of Engineering. Thirteen MSc theses will be presented.
You will find information about how to follow along under each presentation. There will be presentations in three different rooms: E:4130 (Lucas), E:2405 (Glasburen) and E:2116. A preliminary schedule follows.
Note to potential opponents: Register as an opponent to the presentation of your choice by sending an email to the examiner for that presentation (email@example.com). Do not forget to specify the presentation you register for! Note that the number of opponents may be limited (often to two), so you might be forced to choose another presentation if you register too late. Registrations are individual, just as the oppositions are! More instructions are found on this page.
Presenters: Tobias Carlsson, André Svensson
Title: Surgical Instrument Detection using Deep Learning
Examiner: Elin Anna Topp
Supervisors: Maj Stenmark (LTH), Phan Kiet Tran (Barnsjukhus Lund)
The purpose of this study was to apply object detection within the field of open-heart surgery. One goal was to retrain state-of-the-art deep neural networks to detect surgical instruments. The networks used in the evaluation were YOLOv4, YOLOv5, Scaled-YOLOv4, RetinaNet, EfficientDet, SSD and Faster R-CNN. We aimed to investigate the possibility of counting instrument changes during a surgery, which could help the surgeons to evaluate their performance. We also investigated the possibility of using the networks' incorrect predictions to generate higher-quality data. All networks were compared with mean average precision (mAP) as the main metric, and YOLOv5 achieved the highest mAP score. Using YOLOv5, we were able to predict instrument switches with promising results. Using the network itself to find its weaknesses, and in that way retrieve data that was difficult for it to handle, gave good results.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_08CarlssonSvensson.pdf
Presenters: August Lindberg Brännström, Alexandra Antgren
Title: Classifying Downtime Occurrences for Connected Factories Using Machine Learning
Examiner: Jacek Malec
Supervisors: Markus Borg (LTH), Sjoerd Dost (Northvolt)
Machine downtime is an important subject in manufacturing because of its connection to production rate and business profit. The causes of machine downtime are diverse, and understanding the cause is critical to obtaining actionable information, identifying areas of improvement and setting specific targets. This master's thesis explores the possibility of using machine learning to classify downtime occurrences for machines in connected factories. In this study we use data from a Swedish lithium-ion battery producer. We collected downtime data from one machine in one facility and combined it with data on active alarms from the same machine. The data was analysed and cleaned, and features were selected for modeling. We implemented three naive baselines, two simple supervised learning models (naive Bayes and decision tree) and two ensemble models (random forest and XGBoost). For correctly classifying a downtime event into one of 17 categories, the random forest model performed best, with an accuracy of 0.414. The results show that the set of alarms active during a downtime contains information that can be connected to the reason for the machine being down. The findings indicate that machine learning can be used to determine the cause of downtime events, but that more data is needed to reach a higher accuracy.
Zoom link to presentation: https://lu-se.zoom.us/j/64471675682?pwd=cFFyQVBHYWprdnRQUVlrTHdaTitzdz09
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_11LindbergBrännströmAntgren.pdf
Presenters: Oscar Andersson, William Isaksson
Title: Investigating and Mitigating Effects of Quantization on Algorithmic Bias
Examiner: Jacek Malec
Supervisors: Flavius Gruian (LTH), Axel Berg (ARM), Felix Johnny Thomasmathibalan (ARM)
Quantizing neural networks is necessary for efficient inference on resource-constrained devices. In general, quantization slightly reduces the overall performance of a network. However, some sub-groups of a dataset might be impacted disproportionately. In this thesis, we investigate how quantization impacts algorithmic bias. We find that class-level bias is amplified, while on the attribute level, for the attributes gender and age, the bias is largely unaffected. We then show that model architecture and hyperparameters play a vital role in how a network is affected by quantization. Lastly, we propose two methods to mitigate the impact of quantization on the bias of a model.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_11AnderssonIsaksson.pdf
13:15-14:00 (N.B. Change of time)
Presenter: Olof Bengtsson
Title: Topic Classification for Swedish Podcasts Using Transformers
Examiner: Jacek Malec
Supervisors: Pierre Nugues (LTH), Kerstin Johnsson (Softhouse Consulting Öresund)
Many people listen to podcasts on a weekly basis, and podcasts have grown considerably in popularity in recent years. However, compared to traditional broadcast media such as television and radio, there are very few tools available for analysing podcasts. In this master's thesis I employed different machine learning models to classify the topics covered in podcast episodes, which would facilitate the analysis. The dataset I used consisted of texts collected from Swedish online forums and spanned 65 different topics. The final evaluation was then done on podcast transcriptions. After evaluating the models, the best one was found to reach an accuracy of 0.55 on an excerpt of the podcast dataset and an F1 score of 0.72 on the forum dataset, outperforming the tf-idf baseline model, which reached an F1 score of 0.66.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_13Bengtsson.pdf
Presenters: Oscar Fridh, Szymon Stypa
Title: Classification of Pull Requests using Transformers
Examiner: Jacek Malec
Supervisors: Pierre Nugues (LTH), Oskar Handmark (Backtick Technologies AB), Birger Kleve (Backtick Technologies AB)
The transformer architecture has given rise to excellent models for natural language understanding since its introduction, and applications for the architecture have started to emerge in new domains. This thesis explores the use of BERT-based models for pull request classification, and compares how models pre-trained on code and on natural language react to different components of a pull request. We extended an existing dataset of 38 500 pull requests with additional features. We also collected and manually annotated a new test set of 500 pull requests. Using these datasets, we fine-tuned multiple transformer bases on different combinations of features and hyperparameters. We first show that the transformer models can reach higher F1-scores than the previous FastText classifier from DeepRelease when using the same input features. The results improved further when extending the inputs with additional data, allowing our best ensemble classifier to achieve a macro-average F1-score of 0.63. Surprisingly, we find that the models pre-trained on code perform similarly to, or only slightly better than, those trained on natural language when classifying code diffs.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_15FridhStypa.pdf
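For readers unfamiliar with the macro-average F1-score mentioned in the pull request classification abstract, the following is a minimal sketch of how such a score is commonly computed: the F1-score is calculated per class and then averaged with equal weight per class. The labels in the usage example are hypothetical and are not taken from the thesis, and this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical pull request labels, for illustration only.
y_true = ["feature", "fix", "fix", "docs"]
y_pred = ["feature", "fix", "docs", "docs"]
score = macro_f1(y_true, y_pred)
```

Because every class counts equally, a macro average penalizes poor performance on rare classes more than plain accuracy does.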
Presenters: Fabian Lindfors, Jacob Gunnarsson
Title: Utilizing highly synchronized clocks in distributed databases
Examiner: Emma Söderberg
Supervisor: Flavius Gruian (LTH)
Relying on clocks in distributed systems has long been seen as a convenient way of ordering events, but also as a challenge because of the inevitable clock skew. In recent years, the availability of highly synchronized clocks has improved, which enables entirely new innovations and system designs. In this thesis, we investigate how the distributed database CockroachDB can be adapted to utilize highly synchronized clocks. We implement our findings and evaluate the performance impact of our modifications using a custom benchmark designed to emulate a workload with long reads under read-write contention. Our results show significant performance improvements for our workload, with median latency being reduced by up to 47% for reads and 43% for writes. We also observe that our changes should not have a negative performance impact on other workloads, and finally conclude that they would make valuable additions to CockroachDB.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_09GunnarssonLindfors.pdf
Presenters: Ellen Åström, Tove Thunborg
Title: Single Image Dome Reflection Removal Using Neural Networks
Examiner: Michael Doggett
Supervisors: Pierre Nugues (LTH), Victor Lantz (Axis)
Surveillance cameras are an important part of protecting people in their everyday life. Some of these cameras carry a protective dome, which sometimes creates unwanted image artifacts in the form of circular lens reflections. One could solve this problem mechanically by developing less reflective domes, but this has proven to be quite hard. Another, perhaps more reliable, solution would be to develop a neural network that can filter out the reflections. Many reflection removal networks already exist. However, none of them have been trained on dome reflections. In this thesis, we investigate the dome reflection removal performance of four existing reflection removal networks. We fine-tune the networks using our own synthesized data set, and evaluate the results both quantitatively and qualitatively. The results show that the Enhanced Reflection Removal Network performs best. Moreover, this fine-tuned network shows a significant improvement in dome reflection removal ability compared to the initial pre-trained network.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_11ÅströmThunborg.pdf
Presenter: Kåre von Geijer
Title: Highly scalable queues and stacks with elastic relaxation
Examiner: Flavius Gruian
Supervisors: Jonas Skeppstedt (LTH), Philippas Tsigas (Chalmers)
Traditional concurrent data structures like queues or stacks have an inherent bottleneck because all operations have to access the ends of the lists, leading to high contention. By relaxing the semantics of a data structure, we no longer have to force operations to take effect in the same order as they were invoked. This can often lead to reduced contention and increased throughput, at the cost of accuracy. This paper builds on an earlier paper which introduced a lock-free framework for such semantically relaxed data structures, where the relaxation could be decoupled in two orthogonal dimensions. Our contribution is to make these two relaxation measures adjustable at run time for the queue and stack from that framework. The two data structures use different ideas, but both build on creating a kind of auxiliary node. The designs are both analyzed theoretically and evaluated empirically to show that they work well.
Zoom link to presentation: https://lu-se.zoom.us/j/65699417822?pwd=Z3J5RTBUZ1IzNGRXUGxLYnpWWGlhZz09
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_13vonGeijer.pdf
Presenter: Charlie Mrad
Title: Image Upscaling for Ray Traced Foveated Rendering
Examiner: Flavius Gruian
Supervisor: Michael Doggett (LTH)
Foveated rendering is a potential optimization that can have a big impact on render times for computer graphics, and in recent years image upscaling and AI-driven supersampling have become more popular. We therefore investigate the relationship between these two areas and how well they work together by testing Nvidia's DLSS with kernel log-polar-based foveated rendering. In the process we find that the two technologies are not initially compatible and produce subpar images unless other methods are applied to mitigate the issue. For this purpose we use a method that makes use of TAA to stabilize the resulting image, and find that we can get upwards of a 1.75x increase in performance for a small reduction in image quality, all the while maintaining a temporally stable image.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_15Mrad.pdf
Presenters: Noah Mayerhofer, Sandra Nyström
Title: Designing a Machine Learning Application to Obtain Customer Insights in the Banking Domain
Examiner: Emelie Engström
Supervisor: Elizabeth Bjarnason (LTH)
Any business needs to understand its customers to improve its product or service. Banks are information-intensive and thus have great potential to find new customer insights with technological tools. This thesis aimed to investigate how to improve customer understanding using user data. We investigated this by creating an ML-based application using specific transaction data from a case company, following the phases of the CRISP-DM methodology. We compared two machine learning models, which both performed better than the benchmark at predicting the next 26 weeks of seasonal patterns. We identified the challenges we encountered and discussed how they impacted us. When the generated information was demonstrated, the users did receive new customer insights. However, further work is needed to improve the application as well as to test the process for other banks.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_09MayerhoferNyström.pdf
Presenter: Johan Bengtsson
Title: Quality Measurement of Generative Dialogue Models for Language Practice
Examiner: Emelie Engström
Supervisors: Markus Borg (LTH), Alexander Hagelborn (NordAxon AB)
NordAxon currently develops Aida, a Generative Dialogue Model (GDM) based upon Natural Language Processing, to help learners of Swedish practise the language. During development, several models are trained, and the engineers then need to perform model selection, a non-trivial task. The aim was to address how to assist ML engineers with the model selection. First, information was gathered through interviews with and questionnaires to Swedish for Immigrants teachers, and through a literature review. Then, based upon these sources, the most important quality metrics were chosen and provided the basis for a test framework. It was found that the GDM should be coherent, non-toxic, and adjust the language level to the user. A test framework was developed, which generates conversations that are then analyzed. After the analysis, the test results are visualized in a Grafana dashboard. The results indicate that meaningful differences between GDMs could be detected, which may indeed aid in the process of model selection.
Zoom link to presentation: https://lu-se.zoom.us/j/63799922758
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_11Bengtsson.pdf
Presenters: David Johansson, Jonathan Paul
Title: Categorization of Cypher Queries to Improve Benchmark Coverage for Graph Databases
Examiner: Björn Regnell
Supervisors: Per Runeson (LTH), Simon Priisalu (Neo4j), Jens Wollert Ehlers (Neo4j)
Benchmarks are often used to find regressions and so avoid performance dropping over time. To make benchmarks relevant for a product, the benchmarks should mirror the users' needs and uses of functionality. To achieve this, user data can be used as a foundation when creating new benchmarks, thus improving the coverage. This thesis was carried out at Neo4j, which develops the most frequently used graph database. Using data from their database as a service (AuraDB), we focused on finding a way to improve the coverage of their benchmark suite. Using the Design Science Paradigm, we formulated problem constructs through interviews with developers at Neo4j. From these problem constructs, a solution was designed, and multiple technological rules were created. Through a validation process using interviews, we assessed the validity of our thesis. The solution identifies weaknesses in the workloads by categorizing Cypher queries using the Abstract Syntax Tree (AST) generated by the Cypher parser. The AST was then used to compare queries run by users of the AuraDB platform with the queries run by the benchmarking suite. The categorization helps find end-user use cases that are currently not covered by the benchmarks. We identified categories for accurate categorization of Cypher queries based on the Cypher syntax and applied this to both the AuraDB logs and the benchmarks. Using the categorization created from the AST, we initially identified the coverage of the benchmark suite to be 45.70% of the total number of user queries run on the AuraDB platform. With the tool developed in the thesis, new benchmarks were created to increase the coverage to 60.18%, with more benchmarks to be developed. From our validation we saw that the tool developed in this thesis met the requirements based on the technological rules. This was validated through interviews with developers with insight into how the tool will be used. Due to the lack of database statistics in the AuraDB logs, our solution design does not solve all problem constructs.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_13JohanssonPaul.pdf
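To illustrate the kind of coverage figure quoted in the Cypher categorization abstract, here is a minimal sketch that assumes each query has already been mapped to a category label. The AST-based categorization itself is not shown, and the function name and category labels are hypothetical rather than taken from the thesis or from Neo4j tooling. Coverage is taken here to be the share of user queries whose category also occurs among the benchmark queries.

```python
def benchmark_coverage(user_categories, benchmark_categories):
    """Share of user queries whose category also appears in the benchmarks."""
    if not user_categories:
        return 0.0
    covered = set(benchmark_categories)
    hits = sum(1 for category in user_categories if category in covered)
    return hits / len(user_categories)


# Hypothetical category labels, one per logged user query.
user = (
    ["MATCH-RETURN"] * 3
    + ["MATCH-WHERE-RETURN"] * 2
    + ["CREATE"]
)
# Hypothetical categories exercised by the benchmark suite.
bench = ["MATCH-RETURN", "CREATE"]

coverage = benchmark_coverage(user, bench)
```

Under this definition, adding a benchmark for a frequently used but uncovered category raises coverage more than adding one for a rare category.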
Presenters: Max Gustafson, Liam Fahlstad
Title: Personalizing the Order of Search Results Using Machine Learning
Examiner: Flavius Gruian
Supervisors: Pierre Nugues (LTH), Emelie Lundh (INGKA)
With the growth of the internet, search engines such as Google have been developed to help users navigate the web. Users carry over the high standard set by such search engines to enterprise internal search, which unfortunately is often inferior. One aspect that is utilized extensively by commercial search engines, but rarely by large companies, is personalization. Together with IKEA, we study the effect of personalization in enterprise search by implementing a reranker. Such a reranker reorders the result list returned by the search engine to place documents relevant to the user towards the top. Reranking is a machine learning problem, which we solve using the learning-to-rank algorithm LambdaMART. The three implemented models are XGBoost, LightGBM and a neural network, which were evaluated on two datasets: one from IKEA and one public dataset from a Kaggle competition. Reranking with our models placed relevant documents higher on the resulting list.
Link to popular science summary: https://fileadmin.cs.lth.se/cs/Education/Examensarbete/Popsci/220603_14GustafsonFahlstad.pdf