Contributions to Single-Channel Speech Enhancement with a Focus on the Spectral Phase



Status

Finished

Date

2019-01-17

Student

Johannes Stahl

Mentor

Pejman Mowlaee Beikzadehmahaleh

Research Areas

Speech Communication

Single-channel speech enhancement refers to the reduction of noise components in a single-channel signal that contains both speech and noise. Spectral speech enhancement methods are among the most popular approaches to this problem. Since the short-time spectral amplitude has been identified as a highly perceptually relevant quantity, most conventional approaches process only the amplitude spectrum and ignore any information that may be contained in the spectral phase. As a consequence, the noisy short-time spectral phase is neither enhanced for signal reconstruction nor used to refine short-time spectral amplitude estimates. This thesis investigates the use of the spectral phase and its structure in algorithms for single-channel speech enhancement. This includes an analysis of the spectral phase in the context of theoretically optimal speech estimators. The resulting knowledge is exploited to formulate single-channel speech enhancement algorithms. On the one hand, the developed algorithms process the noisy spectral magnitude using spectral phase information and also modify the noisy spectral phase itself. On the other hand, the findings about the spectral phase also lead to the conclusion that in certain cases phase-aware processing should be deliberately avoided. Besides an objective evaluation of the algorithms presented in this thesis, two subjective listening tests were conducted to assess the perceptual relevance of the proposals. The results show that the proposed algorithms consistently improve the perceived quality of noisy speech signals while improving speech intelligibility compared to their conventional counterparts.
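To make the "amplitude-only" baseline described above concrete, the following is a minimal sketch of a generic conventional spectral enhancement step, not a method from this thesis. It assumes the noisy short-time Fourier transform (STFT) and a per-bin noise power estimate are already available. The function name `amplitude_only_enhance`, the Wiener gain with a maximum-likelihood SNR estimate, and the -15 dB floor `xi_min` are illustrative assumptions. Only the amplitude is attenuated; the noisy phase is reused unchanged for reconstruction, which is exactly the limitation that phase-aware processing targets.

```python
import numpy as np


def amplitude_only_enhance(noisy_stft, noise_psd, xi_min=10 ** (-15 / 10)):
    """Generic amplitude-only spectral enhancement (illustrative sketch, not the thesis method).

    noisy_stft: complex array of shape (freq_bins, frames), the noisy STFT Y.
    noise_psd:  real array of shape (freq_bins,), an assumed-known noise power per bin.
    xi_min:     hypothetical lower bound on the a priori SNR estimate (-15 dB).
    """
    noisy_stft = np.asarray(noisy_stft, dtype=complex)
    noise_psd = np.asarray(noise_psd, dtype=float)[:, None]  # broadcast over frames
    amplitude = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    # Maximum-likelihood a priori SNR estimate, floored to limit musical noise
    xi = np.maximum(amplitude ** 2 / noise_psd - 1.0, xi_min)
    # Wiener gain: strictly between 0 and 1, applied to the amplitude only
    gain = xi / (1.0 + xi)
    # The noisy phase is reused unchanged for signal reconstruction
    return gain * amplitude * np.exp(1j * phase)


# Usage with random data standing in for a real noisy STFT (257 bins, 10 frames)
rng = np.random.default_rng(0)
Y = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))
S_hat = amplitude_only_enhance(Y, noise_psd=np.full(257, 2.0))
```

Because the gain is real and positive, `S_hat` has the same phase as `Y` in every time-frequency bin; a phase-aware scheme would instead modify that phase or use it when estimating the amplitude.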