Signal Processing and Speech Communication Laboratory

Machine Learning for Automatic F0 Contour Discrimination and Labelling

Status
Open
Type
Master Thesis
Announcement date
02 Aug 2021
Mentors
Research Areas

Spoken language, consists of more than just the uttering of words and phrases. Besides the lexical content of a phrase, its prosodic appearance does play an important role in human-human conversation. Depending on how a sentence is said, the meaning can be changed. An example where this becomes obvious is irony or sarcasm: A short phrase like “Really?” can show either honest excitement or complete boredom – depending on its underlying intonation. This example already suggests how prosody has a high impact on how we perceive what is said. One of the most important prosodic parameters that strongly contributes to how humans perceive intonation (or prosodic melody of words) is pitch, the perceived fundamental frequency of the human voice ($F_0$). The physically measurable $F_0$ contour does show various shapes, depending on the intonation of an uttered phrase. For instance, there might be flat contours, falls, rises, dips, valleys etc. – all of different extent since prosodic marking is a gradual process.

In qualitative research, it is common to manually search for distinguishable $F_0$ contours in order to categorise them. Obviously, this is an effortful and time-consuming process. The aim of this project is to explore different machine learning approaches for automatic labelling of $F_0$ contours. Powerful but resource-intensive tools, such as neural networks are to be compared with approaches that require less computational power but are more limited.