Multiresolution Spectrotemporal Features for Speech/Nonspeech Discrimination

Project Type: Master/Diploma Thesis
Student: Wohlmayr Michael
Mentor: Gernot Kubin


Auditory models decompose speech and audio signals into elementary patterns that appear highly relevant for human perception. Based on neurophysiological findings, Chi, Ru and Shamma developed a computational model that maps a signal to a four-dimensional representation of its spectro-temporal modulations. Such rich time-frequency descriptions may be useful for a wide class of signal-processing tasks, e.g. classification, estimation and detection. The aim of this master thesis is to evaluate the performance of these features on one specific task: speech/nonspeech discrimination. Because the auditory model produces a high-dimensional feature representation, further dimensionality reduction is essential. We explore two methods for this purpose: (1) the Multilinear Singular Value Decomposition, which has been proposed for this task before, and (2) the Information Bottleneck Method, whose application in this framework is novel. Based on the latter, we develop a scheme that compresses the speech/nonspeech-relevant information into a single number. Using MATLAB simulations, we assess and compare the performance of the resulting classification systems on two databases under different SNR conditions, and show the robustness of the auditory features in comparison to a system based on MFCC and ZCR features.
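As an illustration only (not the thesis implementation, which uses MATLAB), the dimensionality-reduction step via a truncated multilinear SVD can be sketched in Python/NumPy: each mode of the feature tensor is unfolded into a matrix, the leading left singular vectors are kept, and the tensor is projected onto them to obtain a small core tensor. The tensor shape below is a made-up stand-in for the model's spectro-temporal feature dimensions.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: move axis `mode` to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_reduce(T, ranks):
    """Truncated multilinear (higher-order) SVD.

    Keeps the leading `ranks[n]` left singular vectors of each mode-n
    unfolding and projects T onto them, returning the reduced core
    tensor and the per-mode factor matrices.
    """
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        # Mode-n product with U^T: contract axis `mode` of the core with U.
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    return core, factors

# Hypothetical feature tensor, e.g. (rate x scale x frequency) bins.
rng = np.random.default_rng(0)
T = rng.standard_normal((8, 6, 4))
core, factors = hosvd_reduce(T, (3, 3, 2))
print(core.shape)  # -> (3, 3, 2)
```

The core tensor then serves as the low-dimensional feature vector fed to the classifier; the per-mode ranks trade off compression against retained modulation detail.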