Signal Processing and Speech Communication Laboratory

PhD defense Tania Habib

Start date/time
Thu Jul 28 07:30:00 2011
End date/time
Thu Jul 28 09:00:00 2011
Seminar room D3.07 at the Institut für Computer Graphik und Wissensvisualisierung, Inffeldgasse 16c/II, 8010 Graz

Chairman: Univ.-Prof.Dr. L. Fickert
Examiner: Univ.-Prof.Dr. G. Kubin
Examiner: Prof.Dr.-Ing. W. Kellermann (Universität Erlangen-Nürnberg)

Auditory Inspired Methods for Multiple Speaker Localization and Tracking Using a Circular Microphone Array
This work addresses acoustic source localization and tracking with a microphone array. Microphone arrays enable speech enhancement in meeting rooms and offices. One approach to speech enhancement in a realistic environment with ambient noise and multipath propagation is beamforming, in which the array steers its beam towards the desired source and suppresses noise arriving from other directions. Beamforming algorithms require prior knowledge of the source position, so a source localization and tracking block is an integral part of such a system.
Conventional localization algorithms deteriorate in realistic scenarios with multiple concurrent speakers. This work explores the most basic speech-related feature, the fundamental frequency or pitch, for localizing concurrent speakers. A multiband position-pitch algorithm is presented that uses auditory pre-processing inspired by a model of the human inner ear. The role of the gammatone filterbank that models this auditory pre-processing is analyzed and discussed in detail. A frequency-selective criterion for concurrent speaker localization is explored; it is based on studies of the human neural system's use of correlations between adjacent subband frequencies and leads to robust location and pitch cues.
Two kinds of tracking algorithms are explored. The first groups spectro-temporal regions formed by fundamental frequency cues. The second uses sequential Monte Carlo methods (particle filters) driven by the location cues to track an arbitrary number of concurrent speakers. A novel particle filter-based algorithm for joint position and pitch tracking is also presented. Several solutions are proposed for known problems of particle filter-based trackers, including an improved likelihood model that incorporates information about source activity and inactivity.
All proposed speaker localization and tracking algorithms are tested on real-world recordings made with a 24-channel uniform circular microphone array, using both loudspeakers and human speakers in various acoustic environments. On average, the proposed techniques give 20% more accurate results than the state-of-the-art SRP-PHAT algorithm.
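To illustrate why beamforming depends on knowing the source position, the following minimal sketch computes the far-field steering delays a uniform circular array would apply for a given source azimuth. It is not taken from the thesis: the function name `uca_steering_delays`, the array radius of 0.1 m, and the speed of sound of 343 m/s are assumptions chosen for illustration; only the 24-microphone count comes from the abstract.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def uca_steering_delays(azimuth_deg, num_mics=24, radius_m=0.1):
    """Far-field arrival-time offsets (in seconds) for a uniform circular array.

    Offsets are relative to the array centre. A negative value means the
    plane wave reaches that microphone before it reaches the centre.
    radius_m is a hypothetical value; the abstract does not state it.
    """
    theta = math.radians(azimuth_deg)
    delays = []
    for m in range(num_mics):
        # Microphone m sits at angle phi on the circle.
        phi = 2.0 * math.pi * m / num_mics
        # Projection of the microphone position onto the source direction.
        delays.append(-radius_m * math.cos(theta - phi) / SPEED_OF_SOUND)
    return delays


if __name__ == "__main__":
    for m, tau in enumerate(uca_steering_delays(azimuth_deg=0.0)):
        print(f"mic {m:2d}: {tau * 1e6:+8.2f} microseconds")
```

A delay-and-sum beamformer would compensate each channel by these offsets before summing, so an error in the assumed azimuth misaligns the channels. This is the reason a localization and tracking stage has to precede the beamformer.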