Data-Based Automatic Phonetic Transcription

Project Type: Master/Diploma Thesis
Student: Leitner Christina


Phonetic transcriptions are an important resource in research areas such as speech recognition and linguistics. Creating them by hand is a laborious process, so it seems reasonable to develop an application that automatically produces phonetic transcriptions for given audio data. To build this automatic phonetic transcriber, a new method similar to data-based speech synthesis is applied. In data-based speech synthesis, a word is synthesized by recombining audio samples from a database according to their phonetic transcription. The phonetic transcriber reverses this process: to obtain the transcription of a word, its recording is compared with the audio samples in the database, and the transcriptions of the most similar samples are concatenated into a new transcription. The data is taken from ADABA (Austrian Phonetic Database), which contains recordings of six speakers representing the major varieties of German. The transcriptions in ADABA contain more phonetic symbols than common broad transcriptions of German (89 symbols instead of the 45 of standard SAMPA-German). To allow for such detailed transcriptions, the application is currently restricted to a single speaker. The audio data was segmented into triphones, a triphone candidate selection procedure was developed, and a dynamic time warping (DTW) algorithm was implemented for pattern comparison. The transcription framework was optimized by testing several settings. The results of tests with the final implementation were evaluated against the reference transcriptions in the ADABA database. The achieved phone recognition rate of 91.94% is comparable to the rates reported for other phonetic transcription tools tested under similar conditions.
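The reversed-synthesis idea (compare a recording with database samples, then concatenate the transcriptions of the closest matches) can be illustrated with a minimal sketch. It assumes that each input segment has already been cut out and paired with a list of candidate database units, that every frame is a plain feature vector (for example MFCCs), and that classic unconstrained DTW with Euclidean frame distances is used. `Unit`, `transcribe`, and this data layout are hypothetical and introduced only for illustration; the thesis's actual features, candidate selection, and DTW variant are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector per analysis frame (assumed, e.g. MFCCs)


@dataclass
class Unit:
    """Hypothetical database unit: a recorded segment plus its transcription."""
    features: List[Frame]
    phones: List[str]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], reference: List[Frame]) -> float:
    """Classic DTW with steps (1,0), (0,1), (1,1); returns the best path cost."""
    n, m = len(query), len(reference)
    # cost[i][j] = minimal accumulated cost of aligning query[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in query only
                                 cost[i][j - 1],      # advance in reference only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def transcribe(segments: List[List[Frame]], candidates: List[List[Unit]]) -> List[str]:
    """For each input segment, choose the candidate with the smallest DTW
    distance and concatenate the chosen units' transcriptions."""
    if len(segments) != len(candidates):
        raise ValueError("need one candidate list per input segment")
    result: List[str] = []
    for segment, units in zip(segments, candidates):
        best = min(units, key=lambda u: dtw_distance(segment, u.features))
        result.extend(best.phones)
    return result


if __name__ == "__main__":
    # Toy one-dimensional "features"; both segments share the same two candidates.
    a_unit = Unit(features=[[0.0], [1.0], [2.0]], phones=["a"])
    i_unit = Unit(features=[[5.0], [5.0], [6.0]], phones=["i"])
    segments = [[[0.0], [0.9], [1.1], [2.0]], [[5.1], [6.0]]]
    candidates = [[a_unit, i_unit], [a_unit, i_unit]]
    print(transcribe(segments, candidates))  # ['a', 'i']
```

The unnormalized DTW cost grows with sequence length, so it tends to favour shorter references. Dividing the cost by `n + m` or restricting the warping path to a band around the diagonal are common variants that one might compare when tuning such a framework.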