Wavelet Analysis For Robust Speech Processing and Applications

PhD Student 
Pham Van Tuan
Research Area


In this work, we study the application of wavelet analysis to robust speech processing. We extract reliable time-scale (TS) features that characterize the relevant phonetic classes, such as voiced (V), unvoiced (UV), silence (S), mixed-excitation, and stop sounds. When neural and Bayesian networks are trained on them, only 7 TS features yield classification rates close to those obtained with 13 mel-frequency cepstral coefficient (MFCC) features. The TS features are further enhanced to design a reliable, low-complexity V/UV/S classifier. Quantile filtering and slope tracking are used to derive adaptive thresholds. A robust voice activity detector is then built and used as a pre-processing stage to improve the performance of a speaker verification system. Based on wavelet shrinkage, a statistical wavelet filtering (SWF) method is designed for speech enhancement. Non-stationary and colored noise is handled by quantile filtering and time-frequency adaptive weighting. A newly proposed comparison diagnostic test and other subjective tests show improvements over other denoising methods. The SWF is further optimized to enhance speech quality for robust automatic speech recognition (ASR). By changing the shape of the frequency weighting and estimating perceptual noise thresholds in critical subbands, the perceptual SWF method performs almost as well as the ETSI baseline for car noise and significantly better than other methods in aircraft maintenance factory con

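The wavelet-shrinkage principle underlying the SWF method can be illustrated with a minimal, generic sketch. This is not the SWF method itself: the Haar wavelet, the universal threshold, the median-based noise estimate, and all function names below are assumptions chosen for illustration, and the quantile filtering and time-frequency adaptive weighting described above are not reproduced.

```python
import math
import statistics


def haar_forward(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero out anything below t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def shrinkage_denoise(x, levels=3):
    """Generic wavelet shrinkage (illustrative, not the thesis's SWF).

    Assumes white Gaussian noise and a signal length divisible by 2**levels.
    """
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")

    # Analysis: details[0] holds the finest scale, details[-1] the coarsest.
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(detail)

    # Noise level from the median absolute finest-scale coefficient
    # (the median is the 0.5 quantile), then the universal threshold.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(x)))

    # Synthesis from the coarsest scale back to the finest,
    # shrinking the detail coefficients at every scale.
    for detail in reversed(details):
        approx = haar_inverse(approx, soft_threshold(detail, threshold))
    return approx
```

Calling `shrinkage_denoise(noisy_frame)` on a list of samples returns a denoised list of the same length. A single global threshold like this one suits only stationary white noise, which is why handling non-stationary and colored noise requires thresholds that adapt over time and frequency.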

This thesis is supervised by Gernot Kubin.