Wavelet Analysis For Robust Speech Processing and Applications

Status: Finished

Date: 2007-02-16

Student: Van Tuan Pham

Mentor: Gernot Kubin

Research Areas: Speech Communication

In this work, we study the application of wavelet analysis to robust speech processing.

Reliable time-scale (TS) features are extracted that characterize the relevant phonetic classes: voiced (V), unvoiced (UV), silence (S), mixed-excitation, and stop sounds. When neural and Bayesian networks are trained on them, the classification rates achieved with only 7 TS features are largely comparable to those obtained with 13 MFCC features.

The TS features are further enhanced to design a reliable, low-complexity V/UV/S classifier. Quantile filtering and slope tracking are used to derive adaptive thresholds. A robust voice activity detector is then built and used as a pre-processing stage to improve the performance of a speaker verification system.
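The idea of deriving an adaptive threshold by quantile filtering can be illustrated with a minimal sketch. It is not the classifier from the thesis: the function names, the window length, the quantile `q`, and the `margin` factor are all assumptions chosen for illustration, and plain frame energies stand in for the TS features. A low quantile of recent energies serves as a noise-floor estimate, and a frame is flagged as active when it exceeds that floor by a fixed factor.

```python
def running_quantile(values, window, q):
    """Return the q-quantile of the most recent `window` values at each position."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        recent = sorted(values[start:i + 1])
        # Nearest-rank index into the sorted window, clamped to the last element.
        idx = min(len(recent) - 1, int(q * len(recent)))
        out.append(recent[idx])
    return out


def detect_activity(energies, window=5, q=0.2, margin=3.0):
    """Flag frames whose energy exceeds `margin` times the tracked noise floor.

    All parameter values are illustrative assumptions, not values from the thesis.
    """
    floor = running_quantile(energies, window, q)
    return [e > margin * f for e, f in zip(energies, floor)]


# Hypothetical frame energies: low-level noise with a short burst of speech.
energies = [1.0, 1.2, 0.9, 1.1, 9.0, 10.0, 8.5, 1.0, 1.1, 0.9]
flags = detect_activity(energies)
```

Because a low quantile ignores the largest values in the window, the floor estimate stays near the noise level even while speech is present, provided speech occupies only part of the window.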

Based on wavelet shrinkage, a statistical wavelet filtering (SWF) method is designed for speech enhancement. Non-stationary and colored noise is handled by employing quantile filtering and time-frequency adaptive weighting. A newly proposed comparison diagnostic test and other subjective tests show improvements over other denoising methods.
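For readers unfamiliar with wavelet shrinkage, the basic operation can be sketched as follows. This is generic soft thresholding on a single-level Haar transform, not the SWF method itself: the choice of the Haar wavelet, the single decomposition level, and the fixed threshold `t` are assumptions made to keep the example short, whereas SWF adapts its thresholds over time and frequency.

```python
import math


def haar_forward(x):
    """Single-level orthonormal Haar transform; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def soft_threshold(c, t):
    """Shrink coefficient c toward zero by t; values within [-t, t] become zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def shrink(x, t):
    """Denoise x by soft-thresholding its Haar detail coefficients."""
    approx, detail = haar_forward(x)
    detail = [soft_threshold(d, t) for d in detail]
    return haar_inverse(approx, detail)
```

With `t = 0` the signal is reconstructed unchanged, and with a very large `t` each pair of samples collapses to its mean. Practical denoisers sit between these extremes and use several decomposition levels.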

The SWF is further optimized to enhance speech quality for robust automatic speech recognition (ASR). By changing the shape of the frequency weighting and estimating perceptual noise thresholds in critical subbands, the perceptual SWF method performs almost as well as the ETSI baseline for car noise and significantly better than other methods in aircraft maintenance factory conditions.