New Strategies for Single-channel Speech Separation: Ph.D. Thesis

Title: New Strategies for Single-channel Speech Separation: Ph.D. Thesis
Publication Type: Booklet
Authors: Mowlaee, P.
Secondary Title: Department of Electronic Systems, Aalborg University
ISBN Number: 978-87-92328-53-3
Publisher: Institut for Elektroniske Systemer, Aalborg Universitet
Place Published: Aalborg, Denmark
Year of Publication: 2010
Date Published: December

In many speech applications, the signal of interest is corrupted by highly correlated noise sources. An extreme example is when several speakers talk at the same time, a phenomenon known as the cocktail party problem. Separating the desired speaker signals from their mixture is one of the most challenging research topics in speech signal processing. When the interfering signal is another speaker, the problem is called single-channel speech separation (SCSS). Possible applications include speech coding, speech recognition, hearing aids, and forensics, where a high-quality separation algorithm is required as a pre-processing stage to mitigate the effect of interfering signals. In the introductory part of this thesis, we define the problem and give an overview of its real-life applications. We then present the dominant previous SCSS methods and outline the problems they face. As our contribution, we present novel strategies to improve separation performance in the form of two SCSS systems: model-driven SCSS in the sinusoidal domain, and joint speech separation and speaker identification. We propose a sinusoidal mixture estimator for speech separation. We generalize mask methods for speech separation from the short-time Fourier transform (STFT) to the sinusoidal case. Experiments show that sinusoidal masks improve separation performance compared to their STFT counterparts. We propose a separation system based on sinusoidal parameters, composed of the sinusoidal mixture estimator together with sinusoidal coders used as speaker models. To overcome speaker dependency, a common problem in model-driven SCSS methods, we present a joint closed-loop speaker identification and speech separation scheme, an attractive approach to speaker-independent SCSS. We also propose two contributions for identifying speakers from a single-channel speech mixture.
We propose a new approach to speaker identification for single-channel speech mixtures that is independent of the signal-to-signal ratio. We present a double-talk detection method to determine the single-talk and double-talk regions in a mixture. We also integrate the double-talk detector with a speaker identification module to improve speaker identification accuracy. Finally, we propose a joint speech separation and speaker identification system for the separation challenge.

Citation Key: mowlaee_phdthesis
Full Text

 For some audio demos, click here.