Speaker interpolation based data augmentation for conversational speech recognition

Status

finished

Type

Master Thesis

Announcement date

08 Jan 2021

Student

Lisa Kerle

Mentors

Research Areas

Speech Communication

Speech synthesis based on deep neural networks (DNNs) has improved significantly over the last decade. Adaptive approaches allow a speaker's voice to be synthesised from adaptation data using a large background model. The goal of this thesis is to use an adaptive DNN-based speech synthesis system to train background and adapted voices from an Austrian German corpus recorded for conversational speech recognition and from several Austrian German speech synthesis corpora. The adapted voices are then used to generate interpolated samples of the conversational corpus, in which each speaker produced individual utterances. In this way, the corpus can be augmented with speakers whose characteristics are not present in the training data. Different interpolation methods shall be used in combination with dynamic programming. The augmented corpus shall then be used to train a speech recogniser, which shall be evaluated in terms of word error rate (WER).
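To illustrate the evaluation metric named above, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recogniser output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for this example; they are not taken from the thesis or from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance (illustrative sketch)."""
    # Assumption: plain whitespace tokenisation, no text normalisation.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("das ist ein test", "das ist test"))
```

Because insertions are counted as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference. A real evaluation would typically also normalise case and punctuation before scoring.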

Your Requirements

Motivation and interest in the topic
Good knowledge of Python and scripting languages (Bash)
Speech communication background
Recommended: Automatic Speech Recognition (VO, SS 2020)
Recommended: Speech Synthesis (VU, WS 2020)