Overcoming Covariance Matrix Phase Sensitivity in Single-Channel Speech Enhancement with Correlated Spectral Components

Result of the Month


Single-channel speech enhancement refers to the reduction of noise components in a single-channel signal that contains both speech and noise. A wide range of single-channel speech enhancement algorithms is formulated in the short-time Fourier transform (STFT) domain. Traditional approaches assume statistical independence between signal components in different time-frequency regions, which results in estimators that are functions of diagonal covariance matrices only. More recent approaches drop this assumption and explicitly model dependencies between STFT bins. In this case, full covariance matrices of speech and noise are required to obtain optimal estimates of the clean speech spectrum, and their off-diagonal entries are, in general, complex-valued. We show that the performance of estimators resulting from such models is highly sensitive to how accurately the phases of these off-diagonal entries are estimated. Since it is non-trivial to estimate these covariance phases from noisy speech data, we propose a linear multidimensional short-time spectral amplitude estimator that circumvents the need to estimate them. We evaluate the speech enhancement performance of this approach and compare it to relevant benchmarks that also take dependencies between STFT bins into account.
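The phase sensitivity described above can be illustrated with a toy model. The sketch below is not the authors' estimator. It uses a generic linear MMSE (multidimensional Wiener) filter W = Φs(Φs + Φn)^-1 on two correlated STFT bins, where the speech covariance has the complex off-diagonal entry ρ·exp(jθ) and the noise is white. The filter is built from an assumed phase θ + Δ, and its exact mean squared error is then evaluated under the true covariance. The two-bin setup, the numbers, and all helper names (`speech_cov`, `wiener`, `mse`, ...) are hypothetical choices made for illustration.

```python
import cmath
import math

# 2x2 complex matrices as nested lists (pure Python, no NumPy needed).
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def herm(a):
    """Conjugate (Hermitian) transpose."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def inv2(a):
    """Closed-form inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]

def speech_cov(rho, theta):
    """Hypothetical covariance of two correlated STFT bins with unit power
    and complex cross-correlation rho * exp(j*theta)."""
    c = rho * cmath.exp(1j * theta)
    return [[1 + 0j, c], [c.conjugate(), 1 + 0j]]

def wiener(phi_s_hat, phi_n):
    """Linear MMSE filter W = Phi_s (Phi_s + Phi_n)^-1, built from the
    *assumed* speech covariance phi_s_hat."""
    return matmul(phi_s_hat, inv2(add(phi_s_hat, phi_n)))

def mse(w, phi_s, phi_n):
    """Exact E||s - W y||^2 for y = s + n with uncorrelated s and n that
    follow the *true* covariances phi_s and phi_n."""
    a = sub(IDENTITY, w)
    err_cov = add(matmul(matmul(a, phi_s), herm(a)),
                  matmul(matmul(w, phi_n), herm(w)))
    return (err_cov[0][0] + err_cov[1][1]).real

# Toy setting (assumed values): strongly correlated bins, white noise at 0 dB.
rho, theta, noise_power = 0.9, 0.7, 1.0
phi_s = speech_cov(rho, theta)
phi_n = [[noise_power + 0j, 0j], [0j, noise_power + 0j]]

# Diagonal model: bins treated as independent (off-diagonal entries set to zero).
diag_mse = mse(wiener(speech_cov(0.0, 0.0), phi_n), phi_s, phi_n)
print(f"diagonal model:      MSE = {diag_mse:.3f}")

# Full model whose off-diagonal phase is wrong by err_deg degrees.
for err_deg in (0, 30, 60, 90, 180):
    w = wiener(speech_cov(rho, theta + math.radians(err_deg)), phi_n)
    print(f"phase error {err_deg:3d} deg: MSE = {mse(w, phi_s, phi_n):.3f}")
```

In this toy setting, the diagonal model yields an MSE of 1.000 and the full model with the correct phase yields 0.746. With a phase error of 60 degrees, the full model already performs worse than the diagonal one (1.064), and at 180 degrees its MSE reaches 2.020. Only the phase of the off-diagonal entry changes between these runs; its magnitude ρ is always correct. An estimator that operates on spectral amplitudes, whose second-order statistics are real-valued, does not depend on this phase. That is the motivation stated above for the proposed amplitude estimator.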

Contact: Johannes Stahl

The figure illustrates the effect of the proposed speech enhancement method by means of spectrograms: (a) shows the clean speech signal, (b) the noisy signal, and (c) the enhanced signal.

More information can be found in our paper.

1 January 2019 - 31 January 2019