Automatic detection of laughter in conversational speech

Project Type: Student Project, Master/Diploma Thesis
Project Status: Open

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for physically disabled users, and medical dictation systems). Compared to prepared speech, conversational speech contains utterances that might be considered 'ungrammatical' as well as disfluencies such as “...oh, well, I think ahm exactly …”. Moreover, in spontaneous conversation, people laugh a lot, and they often speak and laugh at the same time. In addition, it is very likely that the conversation partner laughs at the same time. Recognizing the words spoken during laughter is especially challenging for speech recognition.

The aim of this thesis is to build a tool that detects laughter in natural conversations. For this purpose, different sets of acoustic features shall be compared using a given machine learning technique (e.g., Random Forests, SVMs). The tool will have to be documented in such a way that it can be incorporated, in the course of the project, into the speech recognizer currently being developed at our department.
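
To illustrate what comparing feature sets on a fixed classifier could look like, here is a minimal sketch. Everything in it is an assumption made for illustration and not part of the project specification: the synthetic "frames" stand in for real conversational audio, the two features (log energy and zero-crossing rate) are arbitrary examples, and a simple nearest-centroid classifier is swapped in for the Random Forest or SVM that the thesis would actually use, so that the example runs with the Python standard library alone.

```python
import math
import random


def make_frame(rng, is_laughter, n=160):
    """Synthetic stand-in for one audio frame (NOT real speech data)."""
    gain = rng.uniform(0.3, 1.0)
    if is_laughter:
        # Hypothetical "laughter": a broadband noise burst.
        return [gain * rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical "speech": a low-frequency sinusoid plus slight noise.
    freq = rng.uniform(0.02, 0.05)  # cycles per sample
    return [gain * math.sin(2 * math.pi * freq * t) + 0.05 * rng.gauss(0.0, 1.0)
            for t in range(n)]


def log_energy(frame):
    """Log of the mean squared amplitude."""
    return math.log(sum(x * x for x in frame) / len(frame) + 1e-12)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


# Illustrative feature names; a real study would use richer acoustic features.
FEATURES = {"log_energy": log_energy, "zcr": zero_crossing_rate}


def extract(frame, names):
    """Build the feature vector for one frame from the named features."""
    return [FEATURES[name](frame) for name in names]


def nearest_centroid_predict(train_x, train_y, x):
    """Return the label whose class mean is closest to x (no feature scaling)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [v for v, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(xs, ys):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


def compare_feature_sets(feature_sets, n_per_class=30, seed=0):
    """Evaluate each feature set on the same data with the same classifier."""
    rng = random.Random(seed)
    labels = [True] * n_per_class + [False] * n_per_class
    frames = [make_frame(rng, label) for label in labels]
    return {
        "+".join(names): loo_accuracy([extract(f, names) for f in frames], labels)
        for names in feature_sets
    }


if __name__ == "__main__":
    sets = [("log_energy",), ("zcr",), ("log_energy", "zcr")]
    for name, acc in compare_feature_sets(sets).items():
        print(f"{name}: {acc:.2f}")
```

The point of the sketch is the protocol rather than the numbers: the data, the classifier, and the evaluation scheme are held fixed, and only the feature set varies, so that differences in accuracy can be attributed to the features.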

Requirements: The candidate should have a background in Automatic Speech Recognition (e.g., completed Speech Communication 2), be interested in speech processing, and have excellent programming skills (e.g., Python, C++ and/or R). Teams are very welcome!