Signal Processing and Speech Communication Laboratory

Automatic Speech Segmentation using Kaldi

Master Thesis
Announcement date
06 Dec 2018
Simon Wasserfall

Automatic speech segmentation is a widely used method for annotating large speech corpora, and it can serve as a starting point for corpus-based linguistic studies. Segmenting spontaneous, conversational speech is more challenging than segmenting read speech, because spontaneously pronounced words exhibit reduction, assimilation and deletion. In this thesis, automatic speech segmentation is performed for the GRASS corpus, which contains both read and conversational speech data of Austrian German. The approach chosen for the segmentation is forced alignment with the state-of-the-art toolkit Kaldi. In addition to the impact of different frame shifts in acoustic modelling, the thesis focuses on pronunciation modelling for Austrian German. Pronunciation variation is modelled with a knowledge-based approach using formalised phonological rules, which results in a pronunciation lexicon. For the GRASS read speech component, a quantitative distance measure to reference alignments yields 8.4%, which is similar to previously reported values for the same speaking style. The analysis of the two speech styles showed that the mean speech rate of conversational speech is more than twice as high as the mean speech rate of read speech.
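
To illustrate the knowledge-based pronunciation modelling mentioned above, the following is a minimal sketch of how formalised rewrite rules could expand a canonical pronunciation into lexicon variants. It is not the implementation used in the thesis: the two rules, the SAMPA-like phone symbols, the example word and the function names `apply_once` and `variants` are all assumptions chosen for illustration.

```python
# Hypothetical rules, NOT taken from the thesis: each rule rewrites a phone
# sequence (pattern) into a shorter phone sequence (replacement).
RULES = [
    (("@", "n"), ("n",)),  # assumed schwa deletion before /n/
    (("b", "n"), ("m",)),  # assumed assimilation of /b n/ to /m/
]


def apply_once(phones, pattern, replacement):
    """Yield every result of applying one rule at exactly one matching position."""
    n = len(pattern)
    for i in range(len(phones) - n + 1):
        if phones[i:i + n] == pattern:
            yield phones[:i] + replacement + phones[i + n:]


def variants(canonical, rules):
    """Return all pronunciations reachable by optionally applying the rules.

    The search terminates here because every rule shortens the phone sequence;
    already seen variants are never expanded twice.
    """
    seen = {canonical}
    queue = [canonical]
    while queue:
        current = queue.pop()
        for pattern, replacement in rules:
            for candidate in apply_once(current, pattern, replacement):
                if candidate not in seen:
                    seen.add(candidate)
                    queue.append(candidate)
    return sorted(seen)


# Assumed canonical entry for the word "haben" in SAMPA-like notation.
canonical_haben = ("h", "a:", "b", "@", "n")
lexicon_entries = variants(canonical_haben, RULES)

# One lexicon line per variant: the word followed by its phones.
for phones in lexicon_entries:
    print("haben", " ".join(phones))
```

Each printed line pairs the word with one space-separated phone sequence, so the forced aligner can choose among the canonical and the reduced variants. Keeping the rules as plain data makes it easy to add or remove a rule and regenerate the lexicon.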