Automatic Speech Segmentation using Kaldi

Project Type: Student Project, Master/Diploma Thesis
Project Status: Open

Short Description:

Automatic speech recognition (ASR) based on hidden Markov models (HMMs) and Gaussian mixture models has commonly been built with HTK. Recently, deep models have also become very successful in speech recognition, and in this context the toolkit “Kaldi” [1] has been introduced in the scientific community. The aim of this project is to adapt our current Kaldi system to a speech segmentation task and to compare the quality of the resulting segmentations with existing HTK-based segmentations and with manually created segmentations. Experiments will be performed on the GRASS database, which is being developed at our department and is of great value to us and to other research institutes in the field of speech technology and linguistics. The results of this thesis can therefore be expected to have high visibility.

Your Tasks:

  • Literature review on ASR technology
  • Adaptation of the Kaldi system and data import
  • Comparison with HTK-based and manual segmentations
  • Error analysis: in which cases does the tool perform especially well or badly
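
To give an idea of what the comparison task can look like in practice, here is a minimal Python sketch. It is an illustration only, not the evaluation method fixed for this project. It assumes that each segmentation is available as a list of boundary times in milliseconds, that the manual segmentation serves as the reference, and that a boundary counts as correct if an automatic boundary lies within a tolerance of 20 ms. The function name, the tolerance and the example values are all made up for this sketch, and the matching is deliberately simple (one automatic boundary may match more than one reference boundary).

```python
from bisect import bisect_left


def boundary_agreement(reference, hypothesis, tolerance_ms=20):
    """Fraction of reference boundaries that have a hypothesis boundary
    within +/- tolerance_ms. Times are integers in milliseconds.
    Hypothetical helper, not part of Kaldi or HTK."""
    if not reference:
        return 0.0
    hyp = sorted(hypothesis)
    hits = 0
    for t in reference:
        # Only the nearest neighbours to the left and right can be closest.
        i = bisect_left(hyp, t)
        neighbours = hyp[max(i - 1, 0):i + 1]
        if any(abs(t - h) <= tolerance_ms for h in neighbours):
            hits += 1
    return hits / len(reference)


# Invented example boundaries (ms) for one utterance.
manual = [0, 120, 310, 450]
kaldi = [0, 135, 300, 500]
htk = [5, 118, 345, 480]

kaldi_score = boundary_agreement(manual, kaldi)
htk_score = boundary_agreement(manual, htk)
print(f"Kaldi vs. manual: {kaldi_score:.2f}")
print(f"HTK vs. manual:   {htk_score:.2f}")
```

Boundaries that fall outside the tolerance are natural starting points for the error analysis, for example by grouping them according to the phone classes on either side of the boundary.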


The candidate(s) should have a background in Speech Communication (e.g., completed Speech Communication 2), be interested in speech processing, and have excellent programming skills (e.g., Python, C++ and/or R). Teams are very welcome!


Martin Hagmüller and Barbara Schuppler