Speaker interpolation based data augmentation for conversational speech recognition
- Status
- In work
- Type
- Master Thesis
- Announcement date
- 08 Jan 2021
- Student
- Lisa Kerle
- Mentors
- Research Areas
Speech synthesis based on deep neural networks (DNNs) has improved significantly over the last decade. Adaptive approaches make it possible to synthesize new speakers from adaptation data by adapting a large background model. The goal of this thesis is to use an adaptive DNN-based speech synthesis system to train a background model and adapted voices from an Austrian German corpus recorded for conversational speech recognition and from several Austrian German speech synthesis corpora. The adapted voices are then used to generate interpolated samples of the individual utterances spoken by the speakers of the conversational corpus. In this way, the corpus can be augmented with speakers whose characteristics are not present in the training data. Different interpolation methods, combined with dynamic programming, shall be investigated. The augmented corpus shall then be used to train a speech recogniser, which shall be evaluated in terms of word error rate (WER).
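The announcement does not fix the interpolation method. The sketch below assumes one plausible reading of "interpolation with dynamic programming": two speakers' acoustic parameter sequences for the same utterance are first aligned with dynamic time warping (DTW) and then blended frame by frame with a weight `alpha`. It also shows how word error rate is computed, which is likewise a dynamic-programming problem (word-level Levenshtein distance divided by the number of reference words). The frame representation, the function names and the toy data are hypothetical; an actual pipeline might instead interpolate speaker embeddings or model parameters inside the synthesis system.

```python
from typing import List, Sequence, Tuple

# Hypothetical frame type: one vector of acoustic parameters per frame.
Frame = List[float]


def dtw_path(a: Sequence[Frame], b: Sequence[Frame]) -> List[Tuple[int, int]]:
    """Align two frame sequences with dynamic time warping (Euclidean frame cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the warping path (ties prefer the diagonal).
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def interpolate(a: Sequence[Frame], b: Sequence[Frame], alpha: float) -> List[Frame]:
    """Blend two aligned utterances: alpha=0 keeps speaker A, alpha=1 gives speaker B."""
    return [
        [(1 - alpha) * x + alpha * y for x, y in zip(a[i], b[j])]
        for i, j in dtw_path(a, b)
    ]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Toy two-dimensional "parameter" sequences of different lengths.
    speaker_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    speaker_b = [[0.0, 2.0], [2.0, 4.0]]
    print(interpolate(speaker_a, speaker_b, 0.5))  # [[0.0, 1.0], [0.5, 1.5], [2.0, 3.0]]
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Note that the interpolated sequence follows the DTW path, so its length can differ from both inputs; varying `alpha` between 0 and 1 yields a family of intermediate voices from one pair of speakers.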
Your Requirements
- Motivation and interest in the topic
- Good knowledge of Python and of scripting languages such as Bash
- Speech communication background
- Recommended: Automatic Speech Recognition (VO, SS 2020)
- Recommended: Speech Synthesis (VU, WS 2020)