Wavelet Based Speaker Change Detection in Single Channel Speech Data

Project Type: Master/Diploma Thesis
Student: Wiesenegger Michael


 Speaker segmentation is the task of finding speaker changes in an audio stream, which is important for speech processing applications such as audio diarization, audio indexing, and speaker recognition. Finding speaker turns in single-channel audio is difficult, especially when no a priori knowledge of the speakers is available. This thesis deals with detecting such speaker turns. Many current algorithms extract features in the cepstral domain, e.g., Mel Frequency Cepstral Coefficients, to find speaker turns in an audio stream. So does the reference algorithm for this thesis, the so-called DISTBIC algorithm, which uses a two-pass approach to detect speaker turns. The first pass computes a Kullback-Leibler distance between the extracted features, and in the second pass the detected speaker changes are validated with the Bayesian Information Criterion. In this thesis, we do not use cepstral-domain features. Instead, we use discrete wavelet features because of the good time and frequency resolution of the wavelet transform, and we restrict ourselves to a one-pass approach. We develop three different sets of wavelet features. Two of them combine wavelet features with statistical methods for dimensionality reduction, namely principal component analysis and linear discriminant analysis. For the third wavelet feature set, the subband energy of the wavelet coefficients is computed. The proposed approaches are compared with DISTBIC using clean and noisy data from the TIMIT database. Especially under strong noise, i.e., at -10 dB SNR, our wavelet-based approaches are very robust, whereas DISTBIC fails.
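
To illustrate what a subband-energy feature of wavelet coefficients can look like, the following is a minimal sketch and not the implementation used in the thesis. The abstract names neither the wavelet, the decomposition depth, nor the frame length, so the Haar wavelet, the `levels` parameter, the frame of 256 samples, and the log compression are all assumptions made for illustration. The function names `haar_dwt_step` and `subband_energies` are hypothetical.

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail).

    The Haar wavelet is an assumption chosen for simplicity.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % 2:  # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def subband_energies(frame, levels=3):
    """Energy of each detail subband, followed by the final approximation subband.

    Returns an array of length levels + 1, ordered from the finest detail
    subband (highest frequencies) to the coarsest approximation.
    """
    energies = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    return np.array(energies)

# Usage: one feature vector for a single (hypothetical) analysis frame.
frame = np.sin(2 * np.pi * 0.05 * np.arange(256))
features = np.log(subband_energies(frame, levels=4) + 1e-12)  # log compression is an assumption
print(features.shape)  # (5,)
```

Because the Haar transform used here is orthonormal, the subband energies of a frame whose length is divisible by 2 to the power of `levels` sum to the energy of the frame itself, which gives a simple sanity check. A change detector would then compare such feature vectors across neighbouring windows of the signal.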