Minimum Word Error Rate Training for Speech Separation

Seo, Jungwon

dc.contributor.advisor	Wiktorski, Tomasz
dc.contributor.author	Seo, Jungwon
dc.date.accessioned	2019-10-07T07:09:00Z
dc.date.available	2019-10-07T07:09:00Z
dc.date.issued	2019-06-15
dc.identifier.uri	http://hdl.handle.net/11250/2620490
dc.description	Master's thesis in Computer science	nb_NO
dc.description.abstract	The cocktail party problem, also known as the single-channel multi-talker problem, is a significant obstacle to improving the performance of automatic speech recognition (ASR) systems. Most existing speech separation models optimize only signal-level performance, i.e., source-to-distortion ratio (SDR), through their cost/loss function, and not transcription-level performance. However, a transcription-level measure such as word error rate (WER) is the ultimate measure of ASR performance. We therefore propose a new loss function that directly accounts for both signal-level and transcription-level performance by integrating the speech separation and speech recognition systems. We also propose a generalized integration architecture that can be applied to any combination of speech recognition and speech separation systems, regardless of their system environment. In this thesis, we first review the relevant techniques, from basic signal processing to deep learning, and describe the goals and challenges of the speech separation problem in detail. We then analyze several well-known speech separation models based on deep learning. Next, we introduce the new loss function together with our detailed system architecture, including the step-by-step process from pre-processing to evaluation. Our training approach improves the performance of an existing model: it raises average SDR from 0.10 dB to 4.09 dB and reduces average WER from 92.7% to 55.7% on the LibriSpeech dataset.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	University of Stavanger, Norway	nb_NO
dc.relation.ispartofseries	Masteroppgave/UIS-TN-IDE/2019;
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.subject	information technology	nb_NO
dc.subject	computer engineering	nb_NO
dc.subject	computer technology	nb_NO
dc.title	Minimum Word Error Rate Training for Speech Separation	nb_NO
dc.type	Master thesis	nb_NO
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	nb_NO

Associated file(s)

Filename: Seo_Jungwon.pdf
Size: 7.407Mb
Format: PDF

This item appears in the following collection(s)

Student theses (TN-IDE) [866]
Student theses in information technology, computer engineering / cybernetics, signal processing

Unless otherwise stated, this item is licensed under Attribution 4.0 International