Classifying the meaning of breathings in conversational speech
- Status: In work
- Type: Master Thesis
- Announcement date: 08 Oct 2019
- Student: André Menrath
- Mentors:
- Research Areas:
Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for people with physical disabilities, medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered ‘ungrammatical’ and that contain disfluencies such as “…oh, well, I think ahm exactly …”. Moreover, in spontaneous conversation, people breathe not only silently, with a purely physical function, but also clearly audibly, with a communicative function. In the following sentence, the speaker produces a long breath before speaking:
“…. do you really think this is a good idea?” communicative function: doubting
whereas in the following sentence, the speaker produces a short, strong breath before speaking:
“…. ok, so let’s start, we don’t have much time left!” communicative function: starting an action
Traditional social robots plan their next turn in a conversation only after the end of the human’s sentence, as the full semantics are needed to derive its meaning. Breathing classification can improve the planning process and make the interaction more natural, since an audible breath before an utterance already hints at its communicative function. The aim of this thesis is to build a tool that classifies breaths from an Austrian German database of natural conversations. For this purpose, different sets of acoustic features shall be compared using a given machine learning technique (e.g., Random Forests, SVMs).
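The comparison of feature sets could be organised roughly as in the following minimal sketch. Everything in it is an assumption made for illustration: the feature names (duration_s, intensity_db), the label names, and the synthetic data are hypothetical, and a simple nearest-centroid classifier stands in for a Random Forest or SVM so that the example needs only the Python standard library. It shows the evaluation loop only: each candidate feature set is scored with k-fold cross-validation using the same classifier.

```python
import random
from statistics import mean, pstdev

# Hypothetical feature sets; the announcement does not name any features.
FEATURE_SETS = {
    "duration_only": ["duration_s"],
    "duration_intensity": ["duration_s", "intensity_db"],
}


def make_synthetic_breaths(n_per_class=40, seed=0):
    """Toy data (assumed): long, softer breaths vs. short, strong breaths."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(({"duration_s": rng.gauss(1.2, 0.3),
                      "intensity_db": rng.gauss(50.0, 4.0)}, "doubting"))
        data.append(({"duration_s": rng.gauss(0.5, 0.3),
                      "intensity_db": rng.gauss(62.0, 4.0)}, "starting_action"))
    rng.shuffle(data)
    return data


def fit_centroids(train, features):
    """Learn z-score scaling and one centroid per label from training data."""
    scale = {}
    for f in features:
        values = [x[f] for x, _ in train]
        scale[f] = (mean(values), pstdev(values) or 1.0)
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [
            mean((x[f] - scale[f][0]) / scale[f][1] for x in rows)
            for f in features
        ]
    return scale, centroids


def predict(x, features, scale, centroids):
    """Return the label whose centroid is closest in scaled feature space."""
    z = [(x[f] - scale[f][0]) / scale[f][1] for f in features]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])),
    )


def cross_val_accuracy(data, features, k=5):
    """Mean accuracy over k folds; scaling is fitted on training folds only."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scale, centroids = fit_centroids(train, features)
        correct = sum(predict(x, features, scale, centroids) == y
                      for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)


def compare_feature_sets(data, feature_sets=FEATURE_SETS):
    """Score every feature set with the same classifier and the same folds."""
    return {name: cross_val_accuracy(data, feats)
            for name, feats in feature_sets.items()}


if __name__ == "__main__":
    for name, acc in compare_feature_sets(make_synthetic_breaths()).items():
        print(f"{name}: {acc:.2f}")
```

In an actual study, the toy generator would be replaced by features extracted from annotated breath segments, and the stand-in classifier by the chosen technique. The structure of the comparison (same folds, same classifier, varying feature set) would stay the same.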
Requirements: The candidate should be interested in speech processing and have excellent programming skills (e.g., Python, C++ and/or R). Teams are very welcome!