Signal Processing and Speech Communication Laboratory

Automatic Prosodic Annotation of Conversational German

Master Thesis
Announcement date
08 Oct 2019

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for physically disabled users, and medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered ‘ungrammatical’ as well as disfluencies such as “…oh, well, I think ahm exactly …”. Moreover, in spontaneous conversation, a word like “yesterday” may sound like “yeshay”, and the German word “haben” (“to have”) may sound like “ham”. The pronunciation of words depends on well-known factors, such as the regional background of the speakers and the formality of the situation. Highly influential but less well-studied factors are those reflecting the prosodic characteristics of the word in the utterance. These prosodic characteristics describe the rhythm and melody of a sentence and, for instance, whether a word is accented or not.

As the manual annotation and analysis of such prosodic characteristics is extremely time-consuming, the aim of this master thesis is to build an automatic prosodic annotation tool. The tool shall be trained on a large set of acoustic features and a small number of manually created annotations. For this purpose, different sets of acoustic features and different classification methods (e.g., Random Forests, ANNs, DNNs) shall be compared. The tool shall be programmed and documented in such a way that, in the course of the project, it can be incorporated into the speech recognizer currently being developed at our department.
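
The comparison of feature sets and classification methods described above could be organized roughly as in the following sketch. Everything in it is an assumption made for illustration: the feature names (`f0_range`, `duration`, `noise`), the synthetic data, and the binary accented/unaccented labels are invented, and a simple nearest-centroid classifier stands in for the Random Forests, ANNs, or DNNs named in the announcement. Any classifier offering the same `fit`/`predict` interface could be plugged in instead.

```python
import random
import statistics
from collections import Counter

# Hypothetical feature sets; a real tool would extract features from audio.
FEATURE_SETS = {
    "f0_only": ["f0_range"],
    "f0+duration": ["f0_range", "duration"],
    "noise_only": ["noise"],
}


def make_synthetic_tokens(n=200, seed=0):
    """Toy word tokens with invented statistics: 1 = accented, 0 = unaccented."""
    rng = random.Random(seed)
    tokens, labels = [], []
    for i in range(n):
        accented = i % 2  # alternate labels so both classes are equally frequent
        tokens.append({
            "f0_range": rng.gauss(6.0 if accented else 2.0, 1.0),
            "duration": rng.gauss(0.30 if accented else 0.20, 0.05),
            "noise": rng.gauss(0.0, 1.0),  # uninformative control feature
        })
        labels.append(accented)
    return tokens, labels


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Stand-in classifier: assigns the label of the closest class centroid."""

    def fit(self, X, y):
        dims = range(len(X[0]))
        # Standardize each feature using training statistics only.
        self.mean = [statistics.mean([row[d] for row in X]) for d in dims]
        self.std = [statistics.pstdev([row[d] for row in X]) or 1.0 for d in dims]
        Z = [self._scale(row) for row in X]
        self.centroids = {}
        for label in set(y):
            rows = [z for z, lab in zip(Z, y) if lab == label]
            self.centroids[label] = [statistics.mean([r[d] for r in rows]) for d in dims]
        return self

    def _scale(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.mean, self.std)]

    def _distance(self, z, label):
        return sum((a - b) ** 2 for a, b in zip(z, self.centroids[label]))

    def predict(self, X):
        scaled = [self._scale(row) for row in X]
        return [min(self.centroids, key=lambda lab: self._distance(z, lab)) for z in scaled]


def cross_validate(tokens, labels, feature_names, make_classifier, k=5, seed=0):
    """Mean accuracy over k folds for one classifier and one feature set."""
    X = [[tok[name] for name in feature_names] for tok in tokens]
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    accuracies = []
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        clf = make_classifier().fit([X[i] for i in train], [labels[i] for i in train])
        predictions = clf.predict([X[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        accuracies.append(correct / len(test))
    return statistics.mean(accuracies)


def compare(tokens, labels):
    """Evaluate every (classifier, feature set) combination."""
    classifiers = {"majority_baseline": MajorityBaseline,
                   "nearest_centroid": NearestCentroid}
    return {(clf_name, set_name): cross_validate(tokens, labels, feats, cls)
            for clf_name, cls in classifiers.items()
            for set_name, feats in FEATURE_SETS.items()}


if __name__ == "__main__":
    tokens, labels = make_synthetic_tokens()
    for (clf_name, set_name), acc in sorted(compare(tokens, labels).items()):
        print(f"{clf_name:18s} {set_name:12s} accuracy={acc:.3f}")
```

Including a majority-class baseline and an uninformative control feature is a design choice of this sketch: it makes it easier to see whether a given feature set carries any information about accentuation at all before more expensive classifiers are trained.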

Requirements: The candidate should have a background in automatic speech recognition (e.g., completed Speech Communication 2), be interested in machine learning and speech processing, and have excellent programming skills (e.g., Python, C++, and/or R). Teams are very welcome!