Probabilistic Model-Based Multiple Pitch Tracking of Speech
The fundamental frequency is an important characteristic of speech signals. Most of the energy in voiced speech is carried by the harmonics, which are located at integer multiples of the fundamental frequency. Pitch estimation and tracking has been an active research area in speech and audio processing for decades. Several well-performing methods exist for the case of a single speaker in well-controlled environments (e.g. no background noise). When several speakers talk simultaneously, however, estimating multiple pitch values from monaural recordings is considerably more challenging.
In the literature, several approaches have been proposed for multipitch tracking. However, these approaches do not consider the problem of assigning the estimated pitch values to their corresponding speakers. This additional information is an important cue for monaural speech separation, where individual spectro-temporal units of the speech mixture need to be linked to the correct speaker (‘binary mask’). In this thesis, a statistical approach to multipitch tracking is investigated. The assignment problem is tackled by incorporating probabilistic speaker models, which in turn are to be estimated from the available speech mixture.