Prosody-Based Speaking Style Detection

Project Type: Master/Diploma Thesis
Project Status: Open

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for the physically disabled, or medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered 'ungrammatical' and that include disfluencies such as “...oh, well, I think ahm exactly ...”. Moreover, in spontaneous conversation, a word like “yesterday” may sound like “yeshay”, and the German word “haben” (“to have”) may sound like “ham”. The pronunciation of words depends on the regional background of the speaker and also on the formality of the situation. A speaker may thus say the same sentences differently in different situations, i.e., in different speaking styles: fast and sloppy when talking to a friend, fast (out of nervousness) but clearly articulated when talking to a professor, or slow and hyperarticulated when speaking to a robot.

The aim of this thesis is to build a classifier that can distinguish different speaking styles on the basis of prosodic features. For this purpose, different sets of acoustic features and different classification methods shall be tested (e.g., Random Forests, ANNs). The resulting tool shall be programmed and documented in such a way that, in the course of the project, it can be incorporated into the Speech Recognizer currently developed at our department.
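To make the task concrete, the following is a minimal sketch of the pipeline described above: extract a few prosodic features from an audio signal and assign it to a speaking style. All specifics here are illustrative assumptions, not part of the project: the audio is synthetic (amplitude-modulated tones standing in for "fast/sloppy" vs. "slow/hyperarticulated" utterances), the features are deliberately crude (frame energy and zero-crossing rate instead of proper F0, duration, and intensity contours), and a nearest-centroid rule stands in for the Random Forests or ANNs the thesis would actually evaluate.

```python
import numpy as np

def prosodic_features(signal, sr=16000, frame_len=400, hop=160):
    """Crude prosodic feature vector: mean/std of frame RMS energy and of
    zero-crossing rate (a rough correlate of pitch/voicing). A real system
    would use F0 contours, durations, pauses, energy trajectories, etc."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    energy = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2 for f in frames])
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])

def make_utterance(f0, syllable_rate, sr=16000, seconds=1.0, seed=0):
    """Synthetic stand-in for an utterance: a tone at f0 Hz whose amplitude
    is modulated at `syllable_rate` Hz to mimic syllable rhythm, plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * seconds)) / sr
    envelope = 0.5 * (1 + np.sin(2 * np.pi * syllable_rate * t))
    return envelope * np.sin(2 * np.pi * f0 * t) \
        + 0.01 * rng.standard_normal(t.size)

# Two hypothetical styles: "fast" (higher pitch, fast rhythm) vs. "slow".
train = {"fast": [make_utterance(220, 8, seed=s) for s in range(5)],
         "slow": [make_utterance(180, 2, seed=s) for s in range(5)]}

# Nearest-centroid classifier: one mean feature vector per style.
centroids = {style: np.mean([prosodic_features(u) for u in utts], axis=0)
             for style, utts in train.items()}

def classify(signal):
    feats = prosodic_features(signal)
    return min(centroids, key=lambda s: np.linalg.norm(feats - centroids[s]))

print(classify(make_utterance(220, 8, seed=99)))  # a held-out "fast" sample
```

In the actual thesis, `prosodic_features` would be replaced by a richer, empirically tested feature set, and `classify` by a trained Random Forest or ANN; the overall shape of the pipeline (feature extraction followed by classification) would stay the same.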

The candidate should have a background in speech processing and machine learning, as well as excellent programming skills (e.g., in C++, Python, or R). Teams are very welcome!