Project Type: Master/Diploma Thesis
Student: Peharz Robert
Mentors: Franz Pernkopf, Michael Stark


Single-channel source separation aims to recover one or several source signals from a single mixture recording. Since at least two interfering sources are involved, the problem is always underdetermined. The human auditory system uses various heuristics to segment the time-frequency plane of a perceived auditory scene and regroups the resulting parts into likely auditory objects. In the refiltering framework, so-called spectrogram masks indicate which parts of the mixture spectrogram belong to a specific source. The source waveforms are resynthesized by applying each mask to the mixture spectrogram, reusing the original mixture phase, and taking the inverse Fourier transform. The challenging part is to estimate suitable masking signals for each source.

The factorial max vector quantization (max-VQ) system models the source spectrograms by the outputs of independent vector quantizers and estimates the most probable state of each quantizer given the mixture data. The corresponding code words approximate the source spectrograms and can be used to estimate the masking signals. The K-SVD algorithm was proposed for designing overcomplete dictionaries for sparse coding. At the same time, it can be seen as a generalization of k-means, the standard training algorithm for vector quantizers.

In this thesis we extend the factorial max-VQ system by replacing k-means with a more flexible and more expressive training method. We propose a new algorithm, called NMF with L0 constraints, which combines K-SVD with non-negative matrix factorization (NMF). We develop a probabilistic framework for single-channel source separation and compare our system with factorial max-VQ in systematic experiments. Finally, we apply our algorithm to real-world mixture data recorded from various TV broadcasts.
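The refiltering step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes a Hann-windowed STFT with 50% overlap and, by default, binary masks that assign each time-frequency bin to the source with the largest estimated magnitude (a soft, ratio-mask option is included). The helper names `stft`, `istft`, `masks_from_estimates`, and `refilter` are hypothetical. Because the masks are real and non-negative, multiplying them onto the mixture STFT leaves the mixture phase unchanged.

```python
import numpy as np

# Hypothetical helpers for illustration; frame size and hop are assumptions.
def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT, shape (n_fft // 2 + 1, n_frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * win for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def istft(X, n_fft=512, hop=256):
    """Inverse of stft() by weighted overlap-add."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=0)
    length = n_fft + (X.shape[1] - 1) * hop
    y, norm = np.zeros(length), np.zeros(length)
    for t in range(X.shape[1]):
        y[t * hop:t * hop + n_fft] += frames[:, t] * win
        norm[t * hop:t * hop + n_fft] += win ** 2
    # Divide by the summed squared window; samples no frame covers stay zero.
    return np.divide(y, norm, out=np.zeros_like(y), where=norm > 1e-10)

def masks_from_estimates(source_mags, binary=True):
    """Masks from estimated source magnitude spectrograms (one per source)."""
    S = np.stack(source_mags)                      # (n_sources, F, T)
    if binary:
        winner = np.argmax(S, axis=0)              # dominant source per bin
        return [(winner == k).astype(float) for k in range(len(source_mags))]
    return list(S / (S.sum(axis=0) + 1e-12))       # soft, Wiener-like ratio masks

def refilter(X, masks):
    """Apply real masks to the mixture STFT X (mixture phase is kept) and resynthesize."""
    return [istft(m * X) for m in masks]
```

With binary masks every bin goes to exactly one source, so the resynthesized estimates sum back to the mixture away from the signal edges; this is a quick sanity check for any masking pipeline.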
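The max-VQ state estimation can be illustrated for two sources and a single mixture frame. This sketch assumes log-magnitude spectra, where the mixture is approximated by the elementwise maximum of the two source code words, and an isotropic Gaussian error, so that the most probable state pair is the one with the smallest squared error. The function name `maxvq_infer` is hypothetical, and the exhaustive search over all code-word pairs is chosen for clarity only: its cost grows with the product of the codebook sizes.

```python
import numpy as np

# Hypothetical illustration of factorial max-VQ inference for one frame.
def maxvq_infer(x, codebook_a, codebook_b):
    """Exhaustive max-VQ state search for two sources.

    x: (F,) log-magnitude mixture frame.
    codebook_a: (Ka, F) and codebook_b: (Kb, F) log-magnitude code words.
    Returns (i, j, mask_a): the pair whose elementwise maximum best fits x
    in the squared-error sense, and a binary mask of bins where source A dominates.
    """
    # Broadcast to (Ka, Kb, F): elementwise max of every code-word pair.
    approx = np.maximum(codebook_a[:, None, :], codebook_b[None, :, :])
    err = np.sum((approx - x) ** 2, axis=-1)              # (Ka, Kb)
    i, j = np.unravel_index(np.argmin(err), err.shape)
    mask_a = codebook_a[i] >= codebook_b[j]               # source A wins these bins
    return int(i), int(j), mask_a
```

The returned mask for source A (and its complement for source B) can serve directly as the binary spectrogram mask used in refiltering.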
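The combination of K-SVD and NMF can be made concrete with a simplified NumPy sketch. It is not the algorithm developed in the thesis; it only illustrates the general idea under our own assumptions. Like K-SVD, it alternates a sparse coding step (here: greedily selecting at most L atoms per column by positive correlation with the residual and fitting non-negative weights) with an update of the factors (here: the standard Lee-Seung multiplicative updates for the Euclidean cost). Multiplicative updates cannot turn a zero coefficient into a non-zero one, so the L0 constraint on every column of H survives the update. The names `sparse_nonneg_code` and `nmf_l0` and the greedy selection rule are hypothetical.

```python
import numpy as np

# Hypothetical, simplified sketch in the spirit of NMF with L0 constraints;
# not the thesis's exact algorithm.
def sparse_nonneg_code(W, v, L, n_iter=50, eps=1e-12):
    """Code v with at most L non-negative weights over the columns (atoms) of W."""
    h = np.zeros(W.shape[1])
    support = []
    for _ in range(L):
        corr = W.T @ (v - W @ h)          # correlation of each atom with the residual
        corr[support] = -np.inf           # never pick an atom twice
        k = int(np.argmax(corr))
        if corr[k] <= 0:
            break                         # no atom can reduce the error any more
        support.append(k)
        h[k] = corr[k] / (W[:, k] @ W[:, k] + eps)   # best weight for the new atom, others fixed
        Ws, hs = W[:, support], h[support]
        for _ in range(n_iter):           # multiplicative updates keep hs >= 0
            hs *= (Ws.T @ v) / (Ws.T @ Ws @ hs + eps)
        h[support] = hs
    return h

def nmf_l0(V, K, L, n_outer=20, n_inner=10, seed=0, eps=1e-12):
    """Factor V ~ W @ H with W, H >= 0 and at most L non-zeros per column of H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], K)) + eps
    for _ in range(n_outer):
        # Unit-norm atoms make the correlation-based selection scale-fair.
        W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), eps)
        # Sparse coding step (the analogue of the k-means assignment step).
        H = np.stack([sparse_nonneg_code(W, V[:, t], L) for t in range(V.shape[1])], axis=1)
        # Refinement: Lee-Seung updates for ||V - WH||_F^2; zeros in H stay zero.
        for _ in range(n_inner):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

With L = 1, each column is represented by a single scaled atom, which is close to a gain-shape vector quantizer; larger L trades this VQ-like behavior for a more expressive sparse, non-negative model.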