Show simple item record

dc.contributor.advisor: Wiktorski, Tomasz
dc.contributor.author: Seo, Jungwon
dc.date.accessioned: 2019-10-07T07:09:00Z
dc.date.available: 2019-10-07T07:09:00Z
dc.date.issued: 2019-06-15
dc.identifier.uri: http://hdl.handle.net/11250/2620490
dc.description: Master's thesis in Computer science [nb_NO]
dc.description.abstract: The cocktail party problem, also known as the single-channel multi-talker problem, is a significant obstacle to improving the performance of automatic speech recognition (ASR) systems. Most existing speech separation models optimize only signal-level performance, i.e., source-to-distortion ratio (SDR), through their cost/loss function, and not transcription-level performance. However, a transcription-level measure such as word error rate (WER) is the ultimate measure of ASR performance. We therefore propose a new loss function that directly accounts for both signal-level and transcription-level performance by integrating the speech separation and speech recognition systems. We also propose a generalized integration architecture that can be applied to any combination of speech recognition and speech separation systems, regardless of their system environment. In this thesis, we first review the relevant techniques, from basic signal processing to deep learning, and describe the goals and challenges of the speech separation problem. We then analyze several well-known deep-learning-based speech separation models. Next, we introduce the new loss function together with our detailed system architecture, including the step-by-step process from pre-processing to evaluation. Our training approach improves the performance of an existing model: on the LibriSpeech dataset, average SDR rises from 0.10 dB to 4.09 dB and average WER falls from 92.7% to 55.7%. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: University of Stavanger, Norway [nb_NO]
dc.relation.ispartofseries: Masteroppgave/UIS-TN-IDE/2019;
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no [*]
dc.subject: information technology [nb_NO]
dc.subject: computer engineering [nb_NO]
dc.subject: computer technology [nb_NO]
dc.title: Minimum Word Error Rate Training for Speech Separation [nb_NO]
dc.type: Master thesis [nb_NO]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550 [nb_NO]
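
The abstract names two metrics, SDR at the signal level and WER at the transcription level. As a reading aid, the sketch below shows one common, simplified way to compute each. It is not taken from the thesis: the function names word_error_rate and sdr_db are hypothetical, WER is computed as word-level edit distance divided by the reference length, and SDR is the plain energy ratio between the reference signal and the residual error, which is simpler than the BSS Eval decomposition often used in the speech separation literature.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def sdr_db(reference, estimate) -> float:
    """Simplified SDR in dB: reference energy over residual-error energy."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_energy / error_energy)
```

Under these definitions a lower WER and a higher SDR both indicate better output, which matches the direction of the improvements reported in the abstract.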


Except where otherwise noted, this item is licensed under Attribution 4.0 International