Signal Processing and Speech Communication Laboratory

Channel Selection for Distant Automatic Speech Recognition on the CHiME-5 dataset

Master Thesis
Announcement date
01 Oct 2018
Hannes Unterholzner

Current automatic speech recognition systems already achieve remarkable results in constrained scenarios with close-talk recordings. However, performance degrades when recordings are made in the far field, due to both noise and reverberation. When multiple distant-talking microphones are available, we can assume that some decoded channels deliver better transcriptions than others. The objective of this thesis is to investigate a DNN-based classifier for channel selection trained on signal-based and/or decoder-based features. The experiments are conducted on the CHiME-5 dataset, a novel dataset for distant multiple-microphone conversational speech recognition. An oracle analysis indicates a promising potential performance gain of 18%. The actual experimental results, however, reveal the limitations of the extracted features and DNN classifiers, which do not correlate well with the oracle results, i.e. both the classification results and the DNNs' generalisation performance are weak. The problem is traced back to a high proportion of simultaneous and spontaneous speech, differing acoustic scenarios and background noises, and the variation in speakers across the sessions of the dataset. Moreover, based on the obtained classifier rankings, we apply hypothesis combination with ROVER, using average confidence scores, to different channel subsets for error reduction.
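
The idea behind the oracle analysis can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, channel labels and toy transcripts are hypothetical, and word errors are counted with a plain Levenshtein distance. For each utterance, the oracle picks the channel whose decoded hypothesis has the fewest word errors against the reference, which gives an upper bound on what any channel selector could achieve.

```python
from typing import Dict, List


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # match or substitution
        prev = curr
    return prev[-1]


def wer(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Word error rate over a set of utterances."""
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r) for r in refs)
    return errors / words


def oracle_select(refs: List[List[str]],
                  channel_hyps: Dict[str, List[List[str]]]) -> List[str]:
    """For each utterance, return the channel with the fewest word errors."""
    return [
        min(channel_hyps, key=lambda ch: word_errors(ref, channel_hyps[ch][i]))
        for i, ref in enumerate(refs)
    ]


# Toy data (hypothetical): two utterances decoded on two channels.
refs = ["turn the oven on".split(), "pass me the salt".split()]
channel_hyps = {
    "ch1": ["turn the oven on".split(), "pass the salt".split()],
    "ch2": ["turn oven".split(), "pass me the salt".split()],
}

selection = oracle_select(refs, channel_hyps)
oracle_hyps = [channel_hyps[ch][i] for i, ch in enumerate(selection)]

per_channel_wer = {ch: wer(refs, hyps) for ch, hyps in channel_hyps.items()}
oracle_wer = wer(refs, oracle_hyps)
```

In this toy case each channel is best on one utterance, so the oracle WER is lower than that of either single channel. A trained classifier would have to predict this per-utterance choice without access to the reference transcripts, which is where the difficulty described above arises.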