Student Project Assistant: Automatic Prosodic Annotation of Conversational German

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for the physically disabled, medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered “ungrammatical” and that contain disfluencies such as “...oh, well, I think ahm exactly …”. Moreover, in spontaneous conversation, a word like “yesterday” may sound like “yeshay”, and the German word “haben” (“to have”) may sound like “ham”. The pronunciation of words depends on well-known factors, for instance the regional background of the speakers and the formality of the situation. Highly influential but less well-studied factors are those reflecting the prosodic characteristics of the word in the utterance. These prosodic characteristics describe the rhythm and melody of a sentence and, for instance, whether a word is accented or not.

As the manual annotation and analysis of such prosodic characteristics is extremely time-consuming, an automatic prosodic annotation tool is to be built and integrated into our Kaldi-based speech recognition engine. The candidate should have a background in speech processing and/or machine learning and excellent programming skills (e.g., Python, R, or similar). Furthermore, it is possible to combine this practical work with writing a Master's/Diploma thesis in the field of speech processing, speech technology, or machine learning. The position is paid for approximately 6 months at 10-15 h/week. Please send your application letter and your CV to

1 January 2019 - 30 June 2019
Dr. Philip Garner, Idiap, Switzerland