A Probabilistic Model-Based Approach for Multipitch Tracking of Speech

Result of the Month


The fundamental frequency is an important characteristic of speech signals. Most of the energy of voiced speech is carried by the harmonics, which are located at integer multiples of the fundamental frequency.
The task of multipitch tracking is to extract the fundamental frequency of each speaker from a mixture of simultaneous speakers. In this work, we investigate a model-based approach in which speaker-specific characteristics are learned beforehand. The availability of speaker-dependent (SD) models additionally makes it possible to assign each pitch estimate to its corresponding speaker.
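
To illustrate the statement that the harmonics of voiced speech lie at integer multiples of the fundamental frequency, here is a minimal sketch using an idealised synthetic signal. It is not the probabilistic model described in this work. The function names, the 1/k amplitude decay, the sampling rate, and the 200 Hz fundamental are all assumptions chosen for illustration.

```python
import numpy as np


def harmonic_signal(f0, n_harmonics, fs, duration):
    """Sum of sinusoids at integer multiples of f0 (hypothetical helper).

    The 1/k amplitude decay is an illustrative assumption, not a speech model.
    """
    t = np.arange(int(fs * duration)) / fs
    return sum(np.sin(2 * np.pi * k * f0 * t) / k
               for k in range(1, n_harmonics + 1))


def strongest_frequencies(x, fs, n_peaks):
    """Return the n_peaks frequencies with the largest spectral magnitude."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    top = np.argsort(spectrum)[-n_peaks:]
    return np.sort(freqs[top])


fs = 16000   # sampling rate in Hz (assumed)
f0 = 200.0   # fundamental frequency in Hz (assumed)
x = harmonic_signal(f0, n_harmonics=5, fs=fs, duration=1.0)
peaks = strongest_frequencies(x, fs, n_peaks=5)
print(peaks / f0)  # ratios of the spectral peaks to the fundamental
```

In a mixture of two speakers, two such harmonic series overlap in the same spectrum, which is why a simple peak search like this one is not sufficient for multipitch tracking.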

Contact: Michael Wohlmayr

The above figure shows an example of a speech mixture of two female speakers. Panel (a): Spectrogram of the speech mixture, together with reference pitch trajectories extracted from the single-speaker recordings (black and blue lines). Note that the pitch trajectories of both speakers lie in the same frequency range and cross each other. In this situation, assigning pitch estimates to their corresponding speakers based on time-continuity constraints alone is hard or even impossible; speaker-specific spectral characteristics must be considered as well. Panel (b): Estimated pitch trajectories using speaker-dependent (SD) models. The color of each estimated pitch point indicates its speaker assignment (red x: speaker 1, blue o: speaker 2). The reference trajectories are additionally shown as black lines. A comparison with panel (a) reveals that the speaker assignment is correct most of the time. Panel (c): Estimated pitch trajectories using speaker-independent (SI) models. A comparison with panels (a) and (b) shows that the speaker assignment is inferior to that obtained with SD models.

Read more in our publication.

1. February 2012 - 29. February 2012