Minimum Word Error Rate Training for Speech Separation

Seo, Jungwon

Master's thesis

Seo_Jungwon.pdf (7.407Mb)

Permanent link

http://hdl.handle.net/11250/2620490

Publication date

2019-06-15

Collections

Student theses (TN-IDE)

Abstract

The cocktail party problem, also known as the single-channel multi-talker problem, is a major obstacle to improving the performance of automatic speech recognition (ASR) systems. Most existing speech separation models optimize only signal-level performance, i.e., the source-to-distortion ratio (SDR), through their cost/loss function, rather than transcription-level performance. However, a transcription-level measure such as the word error rate (WER) is the ultimate measure of ASR performance. We therefore propose a new loss function that directly accounts for both signal-level and transcription-level performance by integrating the speech separation and speech recognition systems. Moreover, we propose a generalized integration architecture that can be applied to any combination of speech separation and speech recognition systems, regardless of their system environment.
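The abstract combines a signal-level metric (SDR) and a transcription-level metric (WER) in a single objective. The sketch below is hypothetical and is not the thesis's actual loss. It uses the simplified SDR definition 10·log10(‖s‖²/‖s − ŝ‖²) rather than the full BSS Eval version, computes WER as word-level edit distance divided by the number of reference words, and, because WER itself is not differentiable, substitutes the common minimum-WER surrogate of expected WER over an N-best list. The function names `simple_sdr`, `word_error_rate`, `expected_wer` and `hypothetical_joint_loss`, and the `weight` parameter, are illustrative assumptions.

```python
import math
from typing import Sequence


def simple_sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-12) -> float:
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    BSS Eval's SDR additionally projects the estimate onto the reference;
    that step is omitted here for brevity.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in reference)
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # delete a reference word
                          curr[j - 1] + 1,      # insert a hypothesis word
                          prev[j - 1] + cost)   # substitute or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def expected_wer(reference: str, hypotheses: Sequence[str], log_scores: Sequence[float]) -> float:
    """Expected WER over an N-best list, weighting each hypothesis by the
    softmax of its log score (a common minimum-WER training surrogate)."""
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]  # shift by max for numerical stability
    total = sum(weights)
    return sum(w / total * word_error_rate(reference, h) for w, h in zip(weights, hypotheses))


def hypothetical_joint_loss(ref_signal, est_signal, ref_text, hypotheses, log_scores, weight=1.0):
    """Hypothetical joint objective (lower is better): negative SDR rewards
    signal fidelity, weighted expected WER penalizes transcription errors."""
    return -simple_sdr(ref_signal, est_signal) + weight * expected_wer(ref_text, hypotheses, log_scores)


# Usage: a half-amplitude estimate (SDR of about 6.02 dB) and a two-item N-best list.
loss = hypothetical_joint_loss([1.0, 1.0], [0.5, 0.5], "a b", ["a b", "a c"], [0.0, 0.0])
print(f"{loss:.4f}")
```

This pure-Python version only evaluates the objective. In an actual training loop the same terms would be computed on framework tensors, so that gradients could reach the separation model through the SDR term and, assuming the recognizer is differentiable with respect to its input, through the hypothesis scores.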

In this thesis, we first review techniques ranging from fundamental signal processing to deep learning, and describe the specific goals and challenges of the speech separation problem. We then analyze several well-known speech separation models based on deep learning. Finally, we introduce the new loss function together with our detailed system architecture, including each step of the process from pre-processing to evaluation.

Using our training approach, we improve the performance of an existing model: on the LibriSpeech dataset, our solution raises the average SDR from 0.10 dB to 4.09 dB and lowers the average WER from 92.7% to 55.7%.
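As a quick check on the reported figures, the arithmetic below derives the absolute SDR gain and the absolute and relative WER reductions from the four numbers in the abstract. The relative WER reduction is computed here and is not stated in the source.

```python
# Figures reported in the abstract (LibriSpeech).
sdr_before_db, sdr_after_db = 0.10, 4.09
wer_before, wer_after = 92.7, 55.7  # percent

sdr_gain_db = sdr_after_db - sdr_before_db          # absolute SDR improvement in dB
wer_abs_reduction = wer_before - wer_after          # percentage points
wer_rel_reduction = wer_abs_reduction / wer_before  # fraction of the original WER

print(f"SDR gain: {sdr_gain_db:.2f} dB")
print(f"WER reduction: {wer_abs_reduction:.1f} points ({wer_rel_reduction:.1%} relative)")
```

Reporting the relative reduction alongside the absolute one makes clear that roughly two fifths of the original word errors are removed.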

Description

Master's thesis in Computer science

Publisher

University of Stavanger, Norway

Series

Masteroppgave/UIS-TN-IDE/2019;

Unless otherwise noted, this item is licensed under Attribution 4.0 International (CC BY 4.0).