Text-to-Speech Engine with Austrian German Corpus

Project Type: Master/Diploma Thesis
Student: Kranzler Christian

 

Several methods are established in speech synthesis, for example formant synthesis, diphone synthesis, and HMM-based synthesis. Unit selection synthesis is the most common method today, and speech corpora for it already exist for several languages and varieties, for instance English or German as spoken in Germany. This thesis develops a unit selection speech corpus for the Austrian variety of German by reusing the resources available for German German and adapting them. The work covers all intermediate steps, such as selecting a speaker whose speech is prosodically consistent and choosing a representative phone set that covers all relevant aspects of the language variety. The most important step in creating a unit selection speech corpus turns out to be the correct transcription of the recorded material, because a wrong allocation of units leads to mispronunciations that human listeners do not accept. Deriving an Austrian German corpus from a German German one therefore requires adaptation on several levels: the lexicon level, the phone level, and the speech data level. A compromise has to be found between reusing the given resources and producing an exact phonetic transcription, which is very time-consuming and has to be corrected manually. In the end, the adaptations lead to correct pronunciation in 93% of the cases, a good result given the relatively small input of 298 recorded sentences.
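The lexicon-level adaptation mentioned above can be pictured as a small set of Austrian German entries layered over the existing German German lexicon. The following is only a minimal sketch of that idea, not the thesis implementation: the names `GERMAN_LEXICON`, `AUSTRIAN_OVERRIDES`, and `lookup` are hypothetical, and the two entries in a SAMPA-like notation are illustrative placeholders rather than data from the thesis.

```python
# Minimal sketch of lexicon-level adaptation (assumed design, not the thesis code).
# Entries use a SAMPA-like notation and are illustrative placeholders only.

# Hypothetical base lexicon reused from the German German resources.
GERMAN_LEXICON = {
    "chemie": ["C", "e", "m", "i:"],
    "haus": ["h", "aU", "s"],
}

# Hypothetical Austrian German entries that replace base pronunciations.
AUSTRIAN_OVERRIDES = {
    "chemie": ["k", "e", "m", "i:"],
}


def lookup(word, overrides=AUSTRIAN_OVERRIDES, base=GERMAN_LEXICON):
    """Return (phones, source) for a word.

    The Austrian override is preferred; the reused German German entry
    is the fallback; unknown words are reported as missing so they can
    be transcribed manually.
    """
    key = word.lower()
    if key in overrides:
        return overrides[key], "austrian"
    if key in base:
        return base[key], "german"
    return None, "missing"


if __name__ == "__main__":
    for w in ["Chemie", "Haus", "Sackerl"]:
        print(w, lookup(w))
```

Layering overrides on top of the reused lexicon reflects the compromise described in the abstract: only the words whose pronunciation differs need manual work, while everything else is taken over unchanged.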