SPSC Lab

In 2000, the Signal Processing and Speech Communication Laboratory (SPSC Lab) of Graz University of Technology (TU Graz) was founded as a research and education center in nonlinear signal processing and computational intelligence, algorithm engineering, as well as circuits & systems modeling and design. It covers applications in wireless communications, speech/audio communication, and telecommunications.

If you want to learn more about Signal Processing, click: "What is Signal Processing?"

The Research of SPSC Lab addresses fundamental and applied research problems in five scientific areas:

 

Result of the Month May 2017

Using speech masks for multi-channel speech enhancement has gained attention in recent years, as it combines the benefits of digital signal processing (beamforming) and machine learning (learning the speech mask from data). We demonstrate how a speech mask can be used to construct the Minimum Variance Distortionless Response (MVDR), Generalized Sidelobe Canceler (GSC), and Generalized Eigenvalue (GEV) beamformers, as well as an MSE-optimal postfilter. We propose a neural network architecture that learns the speech mask from the spatial information hidden in the multi-channel input data, using the dominant eigenvector of the Power Spectral Density (PSD) matrix of the noisy speech signal as the feature vector. We train our network on CHiME-4 audio data, which contains a single speaker surrounded by ambient noise. Depending on the speaker's location and the geometry of the microphone array, the eigenvectors form local clusters, whereas for ambient noise they are randomly distributed. The neural network learns this clustering from the training data. In a second step, we use the cosine similarity between neighboring eigenvectors as the feature vector, which makes our approach less dependent on the array geometry and the speaker's position. We compare our results against the most prominent model-based and data-driven approaches, using PESQ and PEASS/OPS scores. Our system yields superior results, both in terms of perceptual speech quality and speech mask prediction error.
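To make the processing chain more concrete, the following is a minimal NumPy sketch of three building blocks mentioned above: mask-weighted PSD estimation, an MVDR beamformer derived from the speech and noise PSD matrices, and the eigenvector and cosine-similarity features. It is not the implementation used for the reported results. The function names, the array layout `(frequency, time, microphone)`, the diagonal loading constant, and the trace-normalized MVDR formulation with a reference microphone are assumptions made for illustration, and the mask in the demo is a synthetic oracle mask rather than a neural network output.

```python
import numpy as np

def masked_psd(Y, mask):
    """Mask-weighted PSD estimate.
    Y: complex STFT, shape (F, T, M); mask: values in [0, 1], shape (F, T).
    Returns Hermitian matrices of shape (F, M, M)."""
    # Weighted sum over time of the outer products y y^H
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    den = np.maximum(mask.sum(axis=1), 1e-10)
    return num / den[:, None, None]

def mvdr_weights(psd_speech, psd_noise, ref=0):
    """MVDR weights per frequency bin (assumed formulation):
    w = (Phi_NN^-1 Phi_SS) u_ref / trace(Phi_NN^-1 Phi_SS)."""
    M = psd_noise.shape[-1]
    # Small diagonal loading keeps the noise PSD well conditioned
    load = 1e-6 * np.trace(psd_noise, axis1=-2, axis2=-1).real / M
    psd_noise = psd_noise + load[:, None, None] * np.eye(M)
    num = np.linalg.solve(psd_noise, psd_speech)   # Phi_NN^-1 Phi_SS
    tr = np.trace(num, axis1=-2, axis2=-1)
    return num[..., ref] / tr[:, None]             # shape (F, M)

def dominant_eigvec(psd):
    """Eigenvector of the largest eigenvalue of each Hermitian matrix."""
    _, vecs = np.linalg.eigh(psd)   # eigenvalues are returned in ascending order
    return vecs[..., -1]

def cosine_similarity(a, b):
    """Phase-invariant cosine similarity |a^H b| / (|a| |b|) along the last axis."""
    num = np.abs(np.sum(a.conj() * b, axis=-1))
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / np.maximum(den, 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, T, M = 4, 200, 6
    # Hypothetical steering vectors and synthetic signals (not CHiME-4 data)
    d = np.exp(1j * rng.uniform(0, 2 * np.pi, (F, M)))
    s = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
    n = 0.5 * (rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M)))
    active = rng.random((F, T)) < 0.5              # oracle speech activity
    Y = active[..., None] * s[..., None] * d[:, None, :] + n

    mask = active.astype(float)
    psd_s = masked_psd(Y, mask)        # speech-dominated bins (still contains noise)
    psd_n = masked_psd(Y, 1.0 - mask)  # noise-dominated bins
    w = mvdr_weights(psd_s, psd_n)
    enhanced = np.einsum("fm,ftm->ft", w.conj(), Y)   # beamformer output w^H y

    # Feature sketch: similarity of the dominant eigenvectors of adjacent bins
    v = dominant_eigvec(psd_s)
    print(enhanced.shape, cosine_similarity(v[:-1], v[1:]).shape)
```

In this sketch the cosine similarity is computed between eigenvectors of adjacent frequency bins; which neighbors (time frames, frequency bins, or both) are compared in the actual system is not specified here, so that choice is an assumption as well.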