Unsupervised Speaker Segmentation in One-Channel Speech Data

Project Type: Master/Diploma Thesis
Student: Böhm Christoph

 

Speaker segmentation of audio data is a common task in speech processing. It usually serves as a preparatory step for further algorithms, such as speaker diarization, which need single-speaker segments in order to group them by speaker. Speaker segmentation is nevertheless not a trivial problem, especially if no prior information about the speakers is available, and existing segmentation algorithms do not yet produce reliable results. Most approaches are based almost exclusively on cepstral-domain features. One of them is the so-called DISTBIC algorithm, a two-step segmentation process that applies a metric-based distance measurement (DIST) followed by the commonly used Bayesian information criterion (BIC) to detect speaker turns. Since it provides comparatively good segmentation results, the DISTBIC algorithm serves as the reference for this diploma thesis. Unlike DISTBIC, the presented approach is not based on cepstral-domain features but on a distance measurement in the frequency domain that considers the speaker-dependent parts of the spectrum. To enhance detection performance, the spectrum is adapted using normalisation techniques. The significant improvement of the proposed algorithm's separation results over those of the DISTBIC algorithm is demonstrated on the TIMIT database and on indexed Westdeutscher Rundfunk broadcast data.
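
For readers unfamiliar with the BIC step mentioned above, the following is a minimal sketch of the widely used delta-BIC test for a single candidate speaker turn. It is not the thesis's algorithm and not a full DISTBIC implementation. It assumes that each segment is modelled by one full-covariance Gaussian over its feature frames, and the names delta_bic, _logdet_cov and penalty_weight are hypothetical, chosen for illustration only.

```python
import numpy as np


def _logdet_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    # bias=True gives the ML estimate (divide by N, not N - 1).
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` (shape N x d) at row index `split`.

    Compares one Gaussian for the whole window against one Gaussian per
    side of the candidate boundary. A positive value favours the
    two-segment model, i.e. suggests a speaker turn at `split`.
    """
    n, d = frames.shape
    n1, n2 = split, n - split
    # Penalty for the extra mean vector and covariance matrix (assumption:
    # full covariance, so d + d(d+1)/2 additional free parameters).
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(frames)
            - 0.5 * n1 * _logdet_cov(frames[:split])
            - 0.5 * n2 * _logdet_cov(frames[split:])
            - penalty_weight * penalty)
```

In a two-step scheme of the kind described above, such a test would typically be evaluated only at the candidate boundaries proposed by the preceding distance measurement, and penalty_weight would be tuned on held-out data.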