Signal Processing and Speech Communication Laboratory

Classifying the meaning of breathings in conversational speech

In work
Master Thesis
Announcement date: 08 Oct 2019
André Menrath

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for physically disabled people, medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered ‘ungrammatical’ as well as disfluencies such as “…oh, well, I think ahm exactly …”. Moreover, in spontaneous conversation people breathe not only silently, for physiological reasons, but also clearly audibly, with a communicative function. In the following sentence, the speaker takes a long breath before speaking:

“… do you really think this is a good idea?” communicative function: doubting

whereas in the following one, the speaker takes a short, strong breath before speaking:

“… ok, so let’s start, we don’t have much time left!” communicative function: starting an action
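The two examples differ mainly in how long and how strong the audible breath is. As a minimal sketch, assuming each breath has already been cut out of the recording as a mono sample sequence scaled to [-1, 1], such properties could be measured with a few simple descriptors. The function name `breath_features` and the three descriptors (duration, RMS level, zero-crossing rate) are hypothetical, illustrative choices, not the feature sets used in this thesis.

```python
import math
from typing import Dict, Sequence


def breath_features(samples: Sequence[float], sample_rate: int) -> Dict[str, float]:
    """Compute three simple acoustic descriptors of one breath segment.

    Hypothetical, illustrative feature set. Assumes `samples` is a mono
    signal scaled to [-1.0, 1.0] that has already been cut to the breath
    boundaries (e.g. from manual annotations).
    """
    n = len(samples)
    if n == 0 or sample_rate <= 0:
        raise ValueError("need a non-empty segment and a positive sample rate")

    # Duration in seconds: the cue that separates long from short breaths.
    duration_s = n / sample_rate

    # RMS amplitude expressed in dB relative to full scale (1.0): a rough
    # proxy for how strong the breath is; clamped to avoid log10(0).
    rms = math.sqrt(sum(x * x for x in samples) / n)
    rms_db = 20.0 * math.log10(max(rms, 1e-12))

    # Zero-crossing rate in crossings per second: typically higher for
    # noise-like sounds (breath noise is largely turbulent) than for voiced speech.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    zcr_hz = crossings / duration_s

    return {"duration_s": duration_s, "rms_db": rms_db, "zcr_hz": zcr_hz}
```

Plain Python keeps the sketch dependency-free; a real feature extractor would more likely work frame by frame and add spectral descriptors.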

Traditional social robots plan their next turn in a conversation only after the human has finished the sentence, because the full semantics are needed to derive its meaning. Classifying breaths can improve this planning process and make the interaction more natural. The aim of this thesis is to build a tool that classifies breaths from an Austrian German database of natural conversations. For this purpose, different sets of acoustic features shall be compared using a given machine learning technique (e.g., Random Forests, SVMs).
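The comparison of feature sets could be organised roughly as follows: every candidate subset is scored with the same leave-one-out protocol, so that differences in accuracy reflect the features rather than the evaluation. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the Random Forests or SVMs mentioned above; in practice one would likely swap in, e.g., scikit-learn's `RandomForestClassifier` or `SVC` together with `cross_val_score`. The function names, feature keys and subset names are hypothetical placeholders, not the thesis's actual setup.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Row = Dict[str, float]


def _zscore(row: Row, keys: Sequence[str], mu: Row, sd: Row) -> List[float]:
    # Standardise each selected feature with statistics from the training fold.
    return [(row[k] - mu[k]) / sd[k] for k in keys]


def loo_accuracy(rows: List[Row], labels: Sequence[str], keys: Sequence[str]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `keys`.

    Nearest centroid is a simple stand-in for Random Forests or SVMs.
    """
    correct = 0
    for i, held_out in enumerate(rows):
        train = [j for j in range(len(rows)) if j != i]
        mu = {k: mean(rows[j][k] for j in train) for k in keys}
        # Fall back to 1.0 for constant features to avoid division by zero.
        sd = {k: pstdev([rows[j][k] for j in train]) or 1.0 for k in keys}

        # One centroid per class (e.g. communicative function) in z-score space.
        centroids = {}
        for label in sorted({labels[j] for j in train}):
            vecs = [_zscore(rows[j], keys, mu, sd) for j in train if labels[j] == label]
            centroids[label] = [mean(col) for col in zip(*vecs)]

        # Predict the class whose centroid is closest (Euclidean distance).
        x = _zscore(held_out, keys, mu, sd)
        predicted = min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
        correct += predicted == labels[i]
    return correct / len(rows)


def compare_feature_sets(
    rows: List[Row], labels: Sequence[str], feature_sets: Dict[str, Sequence[str]]
) -> Dict[str, float]:
    """Score every named feature subset with the same evaluation protocol."""
    return {name: loo_accuracy(rows, labels, keys) for name, keys in feature_sets.items()}
```

Leave-one-out keeps as much data as possible for training when only a few annotated breaths per class are available. With a larger database, k-fold cross-validation grouped by speaker, so that no speaker appears in both training and test folds, would usually give a more honest estimate.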

Requirements: The candidate should be interested in speech processing and have excellent programming skills (e.g., Python, C++, and/or R). Teams are very welcome!