Single Channel Source Separation using Dictionary Design Methods for Sparse Coders

Project Type: Master/Diploma Thesis
Student: Peharz Robert


Single channel source separation aims to recover one or several source signals from a single mixture recording. Since at least two interfering sources are present, the problem is always under-determined. The human auditory system uses various heuristics to segment the time-frequency plane of a perceived auditory scene and regroups the resulting parts according to likely objects. In the refiltering framework, so-called spectrogram masks indicate which parts of the mixture spectrogram belong to a specific source. The source waveforms are resynthesized by modulating the original mixture phase onto the masking signals and applying the inverse Fourier transform. The challenging part is to estimate suitable masking signals for each source. The factorial max vector quantization (max-VQ) system models the source spectrograms as the output of independent vector quantizers and estimates the most probable states for each source given the mixture data. The corresponding code words approximate the source spectrograms and can be used to estimate the masking signals. The K-SVD algorithm was proposed for the design of overcomplete dictionaries for sparse coders. It can also be seen as a generalization of k-means, the standard training algorithm for vector quantizers. In this thesis we aim to extend the factorial max-VQ system by replacing k-means with a more flexible and more expressive training method. We propose a new algorithm, which we call NMF with L0 constraints, that combines K-SVD with nonnegative matrix factorization (NMF). We develop a probabilistic framework for single channel source separation and compare our system to factorial max-VQ in systematic experiments. Finally, we apply our algorithm to real-world mixture data recorded from various TV broadcasts.
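
The refiltering step described above can be illustrated with a minimal sketch. It is not the thesis implementation. The sampling rate, the two sinusoidal test sources, the helper names `stft` and `istft`, and the parameters `n_fft` and `hop` are all assumptions chosen for illustration. The binary mask is an "oracle" mask computed from the known sources, which is not available in practice; estimating such masks from the mixture alone is the actual problem addressed in the thesis.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform with a Hann window (frames x bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by weighted overlap-add with the same Hann window."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Guard against division by zero at the very first and last samples.
    return out / np.maximum(norm, 1e-8)

# Two hypothetical sources and their single-channel mixture.
fs = 8000                              # assumed sampling rate in Hz
t = np.arange(fs) / fs                 # one second of signal
s1 = np.sin(2 * np.pi * 440 * t)
s2 = np.sin(2 * np.pi * 2000 * t)
x = s1 + s2

S1, S2, X = stft(s1), stft(s2), stft(x)

# Oracle binary mask: 1 where source 1 dominates the time-frequency cell.
mask1 = (np.abs(S1) > np.abs(S2)).astype(float)

# Refiltering: masked mixture magnitude combined with the mixture phase,
# followed by the inverse transform.
est1 = istft(mask1 * np.abs(X) * np.exp(1j * np.angle(X)))

# Relative error on the interior, away from the window edge effects.
interior = slice(256, len(est1) - 256)
rel_err = (np.linalg.norm(est1[interior] - s1[interior])
           / np.linalg.norm(s1[interior]))
print(f"relative reconstruction error of source 1: {rel_err:.2e}")
```

The two test sources barely overlap in frequency, so the binary mask recovers the first source almost exactly. For real mixtures with overlapping spectra the mask is only an approximation, and its quality depends on how well the source spectrograms are modeled.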