Signal Processing and Speech Communication Laboratory

Automatic Speech Segmentation using Kaldi

In work
Master/Diploma Thesis
Announcement date: 06 Dec 2018
Simon Wasserfall

Short Description:

HTK has commonly been used for automatic speech recognition (ASR) based on hidden Markov models (HMMs) and Gaussian mixtures. Recently, deep models have also become very successful in speech recognition, and the toolkit “Kaldi” [1], which supports such models, has been adopted in the scientific community. The aim of this project is to adapt our current Kaldi system to a speech segmentation task and to compare the quality of the resulting segmentations with existing HTK-based segmentations and with manually created segmentations. Experiments will be performed on the GRASS database, which is being developed at our department and is of great value to us and to other research institutes in the field of speech technology and linguistics. High visibility of the results of this thesis can therefore be expected.

Your Tasks:

  • Literature review on ASR technology
  • Adaptation of the Kaldi system and data import
  • Comparison with HTK-based and manual segmentations
  • Error analysis: identifying the cases in which the tool performs especially well or especially poorly
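
One possible way to approach the comparison task is to measure how many automatically placed boundaries fall within a fixed tolerance of the manual boundaries. The following is a minimal sketch, not the method prescribed for this thesis. It assumes that both segmentations contain the same boundaries in the same order (as is the case when the same transcription is aligned), that times are given in seconds, and that a tolerance of 20 ms is used. The function name, the tolerance value, and all boundary times are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def boundary_agreement(reference: Sequence[float],
                       hypothesis: Sequence[float],
                       tolerance: float = 0.02) -> float:
    """Return the fraction of boundaries whose automatic position lies
    within `tolerance` seconds of the manual (reference) position.

    Assumption: both sequences list the same boundaries in the same order.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("both segmentations must have the same number of boundaries")
    if not reference:
        raise ValueError("no boundaries to compare")
    hits = sum(abs(r - h) <= tolerance for r, h in zip(reference, hypothesis))
    return hits / len(reference)


# Hypothetical boundary times in seconds (illustrative values, not results).
manual = [0.10, 0.25, 0.40, 0.62]
automatic_a = [0.11, 0.24, 0.45, 0.62]
automatic_b = [0.14, 0.25, 0.47, 0.66]

score_a = boundary_agreement(manual, automatic_a)
score_b = boundary_agreement(manual, automatic_b)
print(f"system A: {score_a:.2f}, system B: {score_b:.2f}")
```

Boundaries that fall outside the tolerance can then be grouped, for example by the neighbouring phone classes, as a starting point for the error analysis.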


Martin Hagmüller and Barbara Schuppler