Signal Processing and Speech Communication Laboratory

Speaker interpolation based data augmentation for conversational speech recognition

Status
In work
Type
Master Thesis
Announcement date
08 Jan 2021
Student
Lisa Kerle
Mentors
Research Areas

Speech synthesis based on deep neural networks (DNNs) has improved significantly over the last decade. Adaptive approaches allow new speakers to be synthesised from adaptation data using a large background model. The goal of this thesis is to use an adaptive DNN-based speech synthesis system to train background and adapted voices from an Austrian German corpus recorded for conversational speech recognition, together with several Austrian German corpora for speech synthesis. The adapted voices are then used to generate interpolated versions of the utterances that individual speakers produced in the conversational corpus. In this way the corpus can be augmented with speakers whose characteristics are not present in the training data. Different interpolation methods, combined with dynamic programming, shall be explored. The augmented corpus shall then be used to train a speech recogniser, which shall be evaluated in terms of word error rate (WER).
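One common way to realise speaker interpolation in adaptive DNN-based synthesis is to interpolate linearly between the learned speaker representations (e.g. embedding vectors) of two adapted voices. The sketch below illustrates this idea under that assumption; the function and variable names are illustrative and not taken from the thesis.

```python
import numpy as np

def interpolate_speakers(emb_a, emb_b, alpha):
    """Linearly interpolate between two speaker embeddings.

    alpha = 0.0 reproduces speaker A, alpha = 1.0 reproduces
    speaker B; intermediate values define a synthetic speaker
    with mixed characteristics, which could be fed to the
    adapted synthesis model to generate augmented utterances.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    return (1.0 - alpha) * emb_a + alpha * emb_b

# Hypothetical embeddings of two adapted voices
spk_a = np.array([0.2, -1.0, 0.5])
spk_b = np.array([1.0, 0.0, -0.5])

# Generate three intermediate speakers for data augmentation
new_speakers = [interpolate_speakers(spk_a, spk_b, a)
                for a in (0.25, 0.5, 0.75)]
```

Each interpolated vector would then condition the synthesis model to produce training audio from a speaker not present in the original corpus.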

Your Requirements

  • Motivation and interest in the topic
  • Good knowledge of Python and scripting languages (Bash)
  • Speech communication background
  • Recommended: Automatic Speech Recognition (VO, SS 2020)
  • Recommended: Speech Synthesis (VU, WS 2020)