A German distant speech recognizer based on 3D beamforming and harmonic missing data mask

Title: A German distant speech recognizer based on 3D beamforming and harmonic missing data mask
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Morales-Cordovilla, J. A., Pessentheiner, H., Hagmüller, M., Mowlaee, P., Pernkopf, F., & Kubin, G.
Conference Name: AIA-DAGA
Date Published: 2013

This paper addresses distant speech recognition in reverberant and noisy conditions using a star-shaped microphone array and missing data techniques. The system is evaluated on a German database contaminated with noise from an apartment of the DIRHA (Distant Speech Interaction for Robust Home Applications) project. The proposed system is composed of three blocks. First, a beamformer yields an enhanced single-channel signal by filtering the multi-channel signals and then summing them. To optimize the filter weights, we apply convex (CVX) optimization over three spatial dimensions, given the spatiotemporal position of the target speaker as prior knowledge. Second, the beamformer output is used to extract the pitch and to estimate the stationary part of the background noise. Third, the system produces a final noise estimate by combining the stationary noise estimate with the harmonic noise estimate obtained from the pitch. Finally, the filter-bank representation of the enhanced signal and the corresponding missing data mask, derived from this final noise estimate, are sent to the speech recognition back-end. The purpose of this paper is to analyze the impact of employing a beamformer followed by a missing data technique.
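
The processing chain described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the array shapes, the element-wise maximum used to combine the two noise estimates, and the 0 dB SNR threshold for the mask are all assumptions made for illustration, and the CVX weight optimization and the pitch extraction are omitted (the weights and noise estimates are taken as given).

```python
import numpy as np

def filter_and_sum(X, W):
    """Filter-and-sum beamformer in the STFT domain.

    X: complex array (mics, freqs, frames), one STFT per microphone.
    W: complex array (mics, freqs), per-microphone filter weights
       (assumed to be already optimized; the optimization is not shown).
    Returns Y[f, t] = sum_m conj(W[m, f]) * X[m, f, t].
    """
    return np.einsum("mf,mft->ft", np.conj(W), X)

def missing_data_mask(noisy_power, stationary_noise, harmonic_noise, snr_db=0.0):
    """Binary missing data mask from two noise estimates.

    All inputs are non-negative power arrays of shape (bands, frames)
    in the filter-bank domain.
    """
    # Assumed combination rule: take the larger of the two noise estimates.
    noise = np.maximum(stationary_noise, harmonic_noise)
    # Speech power estimate by power subtraction, floored at zero.
    speech = np.maximum(noisy_power - noise, 0.0)
    # A cell is marked reliable when its estimated SNR exceeds the threshold.
    threshold = 10.0 ** (snr_db / 10.0)
    return speech > threshold * noise
```

In this sketch, cells where the mask is `True` would be treated as reliable by a missing data recognizer and the remaining cells as unreliable; how the back-end uses the mask is outside the scope of the example.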

Citation Key: 2689