Probabilistic Model-Based Multiple Pitch Tracking of Speech

PhD Student 

The fundamental frequency (pitch) is an important characteristic of speech signals. In voiced speech, most of the energy is carried by the harmonics, which are located at integer multiples of the fundamental frequency. Pitch estimation and tracking has been an important and active research area in speech and audio processing over the last decades. Several well-performing methods exist for a single speaker in well-controlled environments (i.e., no background noise, etc.). When several speakers talk simultaneously, however, estimating multiple pitch values from a monaural recording is a considerably more challenging scenario.
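Because the harmonics sit at integer multiples of the fundamental frequency, a simple single-speaker estimator can score each candidate F0 by summing the spectral power at its first few harmonics. The sketch below illustrates this harmonic-summation idea on one synthetic frame. It is not the method investigated in this thesis, and the function names, the 80 to 400 Hz search range, the 1 Hz grid and the choice of five harmonics are assumptions made for illustration.

```python
import cmath
import math

def hann(n_len):
    """Hann window of length n_len (reduces leakage between neighbouring harmonics)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def power_at(xw, freq_hz, fs):
    """Power of the windowed frame xw at an arbitrary frequency (direct DTFT evaluation)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / fs) for n, s in enumerate(xw))
    return abs(acc) ** 2

def harmonic_sum_f0(x, fs, f0_min=80, f0_max=400, n_harmonics=5):
    """Return the F0 candidate (1 Hz grid) that maximises the summed power at its harmonics.

    Search range, grid and number of harmonics are illustrative assumptions.
    """
    xw = [s * w for s, w in zip(x, hann(len(x)))]
    best_f0, best_score = None, -1.0
    for f0 in range(f0_min, f0_max + 1):
        # Harmonic summation: add the power at k * f0 for k = 1..n_harmonics.
        score = sum(power_at(xw, k * f0, fs) for k in range(1, n_harmonics + 1))
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0

# Synthetic voiced frame: 50 ms at 8 kHz, F0 = 150 Hz, five harmonics with 1/k amplitudes.
fs = 8000
true_f0 = 150
frame = [sum(math.sin(2.0 * math.pi * k * true_f0 * n / fs) / k for k in range(1, 6))
         for n in range(400)]
estimate = harmonic_sum_f0(frame, fs)
print(estimate)  # should be at or very near 150
```

With two simultaneous speakers, the summed harmonic power contains peaks for both fundamental frequencies, and this single-speaker sketch gives no indication of which peak belongs to which speaker.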

Several approaches to multipitch tracking have been proposed in the literature. However, they do not address the problem of assigning the estimated pitch values to their corresponding speakers. This assignment is an important cue for monaural speech separation, where individual spectro-temporal units of the speech mixture need to be linked to the correct speaker ('binary mask'). In this thesis, a statistical approach to multipitch tracking is investigated. The assignment problem is tackled by incorporating probabilistic speaker models, which in turn have to be estimated from the available speech mixture.
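To make the assignment problem concrete, the following sketch assigns the pitch values found in one frame to speakers by choosing the one-to-one mapping with the highest joint likelihood under per-speaker models. The speaker models here are hypothetical one-dimensional Gaussians over F0 with made-up parameters, a deliberate simplification of the probabilistic speaker models estimated in this thesis; a real tracker would also exploit temporal continuity, which this frame-wise sketch ignores.

```python
import math
from itertools import permutations

def log_gauss(x, mean, std):
    """Log-density of a one-dimensional Gaussian N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)

def assign_pitches(pitches, speaker_models):
    """Assign the F0 estimates of one frame to speakers.

    pitches: F0 estimates in Hz, one per speaker (order arbitrary).
    speaker_models: hypothetical (mean_hz, std_hz) pairs, one per speaker.
    Returns a list whose entry i is the pitch assigned to speaker i.
    """
    best_perm, best_ll = None, -math.inf
    # Try every one-to-one mapping of pitch values to speakers.
    for perm in permutations(range(len(pitches))):
        # Joint log-likelihood: speaker s explains pitch value pitches[perm[s]].
        ll = sum(log_gauss(pitches[p], *speaker_models[s]) for s, p in enumerate(perm))
        if ll > best_ll:
            best_perm, best_ll = perm, ll
    return [pitches[p] for p in best_perm]

# Hypothetical speaker models: a lower-pitched speaker 1 and a higher-pitched speaker 2.
models = [(120.0, 15.0), (210.0, 25.0)]
frames = [[205.0, 118.0], [125.0, 220.0], [198.0, 131.0]]
assigned = [assign_pitches(f, models) for f in frames]
for row in assigned:
    print(row)
# [118.0, 205.0]
# [125.0, 220.0]
# [131.0, 198.0]
```

Enumerating all permutations is cheap for two or three speakers but grows factorially with the number of speakers.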

Top panel: log-spectrogram of a mixture of two speakers, together with reference pitch trajectories extracted from the individual single-speaker recordings. Bottom panel: estimated pitch trajectories together with the reference. Estimation is performed with speaker-dependent models. Red and blue points are assigned to speaker 1 and speaker 2, respectively.
This thesis is supervised by Franz Pernkopf.