Data Driven Pronunciation Modeling for Large Vocabulary Spontaneous Speech Recognition

Project Type: Master/Diploma Thesis
Student: Resch Barbara
Mentor: Gernot Kubin


This work presents the implementation of a data-driven system for modeling pronunciation variation in spontaneous speech. Transcriptions of about 4000 spontaneous utterances from 20 speakers were used to derive formal descriptions of pronunciation variation. In the main approach of the thesis, Classification and Regression Trees (CART) were trained on the transcriptions to capture context-dependent variation, and the trees were then used to build a new pronunciation dictionary. In a second approach, variant pronunciations were taken directly from the transcriptions and added to the dictionary. The new dictionaries were tested on an independent set of spontaneous speech containing 100 utterances from 10 different speakers. The test corpus was analyzed with respect to the individual speakers and their speaking characteristics. Overall, the improvements from both approaches were not significant. The dictionaries built with CART led to a slight but clear improvement for a subgroup of test speakers who spoke more fluently than the others. The experiments presented are based on the spontaneous dictation part of the Wall Street Journal corpus WSJ1.
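
The idea of turning context-dependent variation into dictionary variants can be illustrated with a minimal Python sketch. It is not the thesis implementation: a plain frequency table over (left phone, phone, right phone) contexts stands in for the CART models, and the aligned data, the phone symbols, the function names `context_counts` and `variants`, and the `min_prob` threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical aligned data: (canonical phones, observed phones).
# "-" marks a canonical phone that was deleted in the transcription.
ALIGNED = [
    (["ae", "n", "d"], ["ae", "n", "-"]),
    (["ae", "n", "d"], ["ae", "n", "-"]),
    (["ae", "n", "d"], ["ae", "n", "d"]),
    (["s", "eh", "n", "d"], ["s", "eh", "n", "-"]),
]


def context_counts(aligned):
    """Count observed realizations per (left, phone, right) context."""
    counts = defaultdict(Counter)
    for canonical, surface in aligned:
        padded = ["#"] + canonical + ["#"]  # "#" marks a word boundary
        for i, realized in enumerate(surface):
            key = (padded[i], padded[i + 1], padded[i + 2])
            counts[key][realized] += 1
    return counts


def variants(canonical, counts, min_prob=0.2):
    """Return {pronunciation: probability} for a canonical pronunciation."""
    padded = ["#"] + canonical + ["#"]
    options = []
    for i, phone in enumerate(canonical):
        seen = counts.get((padded[i], phone, padded[i + 2]))
        kept = []
        if seen:
            total = sum(seen.values())
            kept = [(p, c / total) for p, c in seen.items()
                    if c / total >= min_prob]
        # Unseen context or nothing above threshold: keep the canonical phone.
        options.append(kept or [(phone, 1.0)])
    result = {}
    for combo in product(*options):
        phones = tuple(p for p, _ in combo if p != "-")
        prob = 1.0
        for _, q in combo:
            prob *= q
        result[phones] = result.get(phones, 0.0) + prob
    return result


if __name__ == "__main__":
    table = context_counts(ALIGNED)
    for phones, prob in variants(["hh", "ae", "n", "d"], table).items():
        print(" ".join(phones), round(prob, 2))
```

In this toy data the final "d" after "n" is usually deleted, so the sketch proposes a variant without "d" even for a word that never occurred in the transcriptions. A tree-based model would generalize over phone classes and wider contexts rather than requiring an exact context match as this table does.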