A Pitch-Synchronous Simultaneous Detection-Estimation Framework for Speech Enhancement

Title: A Pitch-Synchronous Simultaneous Detection-Estimation Framework for Speech Enhancement
Publication Type: Journal Article
Year of Publication: 2018
Authors: Stahl, J., & Mowlaee, P.
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 26
Number: 2
Pages: 436-450
Month: February
Abstract

Speech enhancement methods formulated in the short-time Fourier transform (STFT) domain vary in the statistical assumptions made on the STFT coefficients, in the optimization criteria applied, or in the models of the signal components. Recently, approaches relying on a stochastic-deterministic speech model have been proposed. The deterministic part of the signal corresponds to harmonically related sinusoids, often used to represent voiced speech. The stochastic part models signal components that are not captured by the deterministic components. In this paper, we consider this scenario from a new perspective, yielding three main contributions. First, a pitch-synchronous signal representation is considered and shown to be advantageous for the estimation of the harmonic model parameters. Second, we model the harmonic amplitudes in voiced speech as random variables with frequency-bin-dependent Gamma distributions. Finally, distinct estimators for the different models of voiced speech, unvoiced speech, and speech absence are derived. To select among the resulting estimates, we take into account the mutual impact of detection and estimation by proposing a binary decision framework that is derived from a Bayesian risk function. The resulting pitch-synchronous stochastic-deterministic estimator outperforms several benchmark methods in terms of speech intelligibility and perceived quality, as predicted by instrumental measures, for various noise types and different signal-to-noise ratios.
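Why a pitch-synchronous representation helps with harmonic parameter estimation can be illustrated with a small sketch. If the analysis frame spans an integer number of pitch periods, each harmonic of the deterministic component falls exactly on a DFT bin, so its energy is not smeared over neighboring bins. The code below is a hypothetical illustration, not the authors' implementation. It assumes a known, constant fundamental frequency `f0` whose period is an integer number of samples, a rectangular window, and a naive DFT. All function names are invented for this example.

```python
import cmath
import math


def harmonic_signal(f0, amps, phases, fs, n_samples):
    """Deterministic part of an illustrative harmonic model: sum of cosines at h * f0."""
    return [
        sum(a * math.cos(2 * math.pi * (h + 1) * f0 * n / fs + p)
            for h, (a, p) in enumerate(zip(amps, phases)))
        for n in range(n_samples)
    ]


def dft_magnitudes(x):
    """One-sided magnitude spectrum of x via a naive DFT (rectangular window)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]


def pitch_synchronous_length(f0, fs, periods):
    """Frame length covering an integer number of pitch periods.

    Assumption: fs / f0 is an integer, i.e. one period is a whole number of samples.
    """
    return round(periods * fs / f0)


def harmonic_energy_fraction(x, f0, fs, n_harmonics):
    """Share of one-sided spectral energy located in the bins nearest to each harmonic."""
    mags = dft_magnitudes(x)
    N = len(x)
    bins = {round((h + 1) * f0 * N / fs) for h in range(n_harmonics)}
    total = sum(m * m for m in mags)
    return sum(mags[k] ** 2 for k in bins) / total


fs, f0 = 8000, 200.0                      # sampling rate and assumed constant pitch (Hz)
amps, phases = [1.0, 0.5, 0.25], [0.0, 0.3, 1.1]

n_sync = pitch_synchronous_length(f0, fs, periods=4)   # 4 periods of 40 samples
sync = harmonic_energy_fraction(
    harmonic_signal(f0, amps, phases, fs, n_sync), f0, fs, len(amps))
nonsync = harmonic_energy_fraction(
    harmonic_signal(f0, amps, phases, fs, 150), f0, fs, len(amps))

print(f"pitch-synchronous frame (N={n_sync}): {sync:.4f}")
print(f"fixed-length frame (N=150): {nonsync:.4f}")
```

With the pitch-synchronous frame, the harmonics land on bins 4, 8, and 12, and essentially all energy stays in those bins. With the 150-sample frame, the harmonics fall between bins and a noticeable part of their energy leaks into neighbors. Real speech has a time-varying pitch that must itself be estimated, which this sketch deliberately ignores.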


URL: https://ieeexplore.ieee.org/document/8125741/
DOI: 10.1109/TASLP.2017.2779405
Citation Key: 3684