Improving automatic speech recognition for pluricentric languages exemplified on varieties of German

PhD Student 
Michael Baum
Research Area

 

A method is presented to improve speech recognition for pluricentric languages. The effects of adapting both the acoustic data and the phonetic transcriptions to several subregions of the German-speaking area are investigated and discussed. All experiments were carried out for German as spoken in Germany and Austria, using large telephone databases (SpeechDat). In the first part, triphone-based acoustic models (AMOs) were trained for several regions and their word error rates (WERs) were compared. The WERs range from 9.89% to 21.78%, which demonstrates the importance of adapting to regional varieties. In the pronunciation modeling part, narrow phonetic transcriptions of a subset of the Austrian database were produced in order to derive pronunciation rules for Austrian German and to generate phonetic lexica for Austrian German, which are the first of their kind. These lexica were used with both triphone-based and monophone-based AMOs for German and Austrian speakers. For the monophone-based AMOs, a significant improvement with the adapted lexica can be reported (17.74% vs. 26.02%). Finally, an algorithm that discriminates between Austrian and German speakers on the basis of prosodic features was used to make the recognizer more robust for user groups with an unknown distribution of language varieties.
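As an illustration of the metric used above, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. It is not the evaluation code of the thesis. The function names `word_error_rate` and `relative_reduction` are hypothetical, and reading the pair 17.74% vs. 26.02% as WERs with and without the adapted lexica is an assumption, since the abstract does not label those figures explicitly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, adapted: float) -> float:
    """Relative error reduction of `adapted` with respect to `baseline`."""
    return (baseline - adapted) / baseline


# One deleted word out of four reference words.
print(word_error_rate("das ist ein test", "das ist test"))

# Assumption: both figures are WERs in percent (baseline vs. adapted lexica).
print(round(relative_reduction(26.02, 17.74), 4))
```

Under that assumption, the gap between 26.02% and 17.74% corresponds to a relative reduction of roughly 32%.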

 

This thesis is supervised by Gernot Kubin.