CS MSc Thesis Presentation 6 February 2023


From: 2023-02-06 13:15 to 14:00
Place: E:4130 (Lucas)
Contact: birger [dot] swahn [at] cs [dot] lth [dot] se
One Computer Science MSc thesis to be presented on 6 February

On Monday, 6 February, there will be a master's thesis presentation in Computer Science at Lund University, Faculty of Engineering.

The presentation will take place in room E:4130 (Lucas).

13:15-14:00 in E:4130 (Lucas)

Presenter: Silke Kylberg
Title: Optimizing End-to-End Neural Speaker Diarization for Swedish Customer Service Conversations
Examiner: Pierre Nugues
Supervisors: Dennis Medved (LTH), Ludwig Engström (Voxo)

Speaker diarization is a method used to answer the question "who spoke when" in an audio recording. Applications range from movies to telephone calls, and in combination with a speech recognition system, speaker diarization can enrich speech-to-text transcriptions with speaker labels. However, speaker diarization often requires a large amount of training data. In this thesis, we investigated how to train the EEND-vector Clustering model on different types of datasets to achieve good diarization performance on Swedish customer service calls. The model was trained on English and Swedish non-domain-specific simulations, Swedish domain-specific simulations, and real Swedish telephone conversations annotated using voice activity detection (VAD). Real Swedish telephone conversations, with VAD as the annotation method, were then used to fine-tune the pre-trained models and to evaluate model performance. The best performance, a diarization error rate (DER) of 12.87%, was reached by fine-tuning the model pre-trained on Swedish and English non-domain-specific simulations.
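For readers unfamiliar with the DER figure quoted in the abstract, the following is a minimal sketch of how a diarization error rate is commonly computed: missed speech, false alarm, and speaker confusion, summed and divided by the total reference speech. It is an illustration only and not the scoring setup used in the thesis, which the abstract does not describe. The function name `diarization_error_rate`, the frame-level representation (one set of speaker labels per frame), and the brute-force search over speaker mappings are all assumptions made for this example; real evaluations usually score time segments and may apply a tolerance collar around speaker boundaries.

```python
from itertools import permutations


def diarization_error_rate(reference, hypothesis):
    """Frame-level DER (illustrative sketch, hypothetical helper).

    Each argument is a list with one set of speaker labels per frame;
    an empty set means silence. Hypothesis labels are arbitrary, so the
    best one-to-one mapping onto reference labels is searched by brute force.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    total_ref = sum(len(r) for r in reference)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    frames = list(zip(reference, hypothesis))

    # Missed speech and false alarm depend only on speaker counts per frame.
    missed = sum(max(0, len(r) - len(h)) for r, h in frames)
    false_alarm = sum(max(0, len(h) - len(r)) for r, h in frames)

    ref_speakers = sorted(set().union(*reference))
    hyp_speakers = sorted(set().union(*hypothesis))

    # Pad with None so a hypothesis speaker may also stay unmapped.
    candidates = ref_speakers + [None] * len(hyp_speakers)

    best_correct = 0
    for perm in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(len({mapping[s] for s in h} & r) for r, h in frames)
        best_correct = max(best_correct, correct)

    # Speaker confusion: speech attributed to the wrong speaker
    # under the best mapping.
    confusion = sum(min(len(r), len(h)) for r, h in frames) - best_correct

    return (missed + false_alarm + confusion) / total_ref


if __name__ == "__main__":
    # Toy example: 10 frames, two reference speakers, two hypothesis clusters.
    ref = [{"A"}] * 4 + [{"B"}] * 4 + [set()] * 2
    hyp = [{"x"}] * 3 + [{"y"}] * 6 + [set()]
    print(diarization_error_rate(ref, hyp))  # 0.25
```

In the toy example, one frame of speaker A is attributed to the cluster mapped to B (confusion) and one silent frame is labelled as speech (false alarm), giving 2 errors over 8 reference speech frames. The permutation search is only practical for a handful of speakers; established scoring tools use an assignment algorithm instead.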

