Signal Processing and Speech Communication Laboratory

Enhancement of spectral envelope modelling in HMM-based speech synthesis

Master/Diploma Thesis
Announcement date: 30 Sep 2010
Florian Krebs
  • Harald Romsdorfer

Hidden Markov Model (HMM)-based text-to-speech synthesis systems have grown in popularity in recent years because they are very flexible in generating speech with various speaker characteristics, emotions, and speaking styles. To create a voice, statistical parametric models are built from a training corpus. Using these models, an arbitrary text is converted into a speech parameter sequence that fulfills the maximum likelihood criterion.
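
For orientation, the maximum likelihood criterion is commonly written as below in the HMM synthesis literature. The notation is an assumption introduced here and is not taken from the thesis: c is the static parameter sequence, W is the window matrix that appends dynamic features to c, and μ_q and Σ_q are the stacked mean vector and covariance matrix of the state sequence q selected for the input text.

```latex
\hat{c} = \arg\max_{c} \, \mathcal{N}\!\left(Wc;\, \mu_q, \Sigma_q\right)
\quad\Longrightarrow\quad
W^{\top}\Sigma_q^{-1}W\,\hat{c} = W^{\top}\Sigma_q^{-1}\mu_q
```

Because the solution follows the state means closely, the generated trajectory tends to be smoother than natural speech.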

Due to the statistical processing, certain characteristics of a parameter sequence are lost. This thesis investigates the influence of variance-based features in the parameter generation process and proposes a new roughness feature that describes the presence of fast variations in a cepstral time sequence. Both the variance and the roughness features take significantly smaller values in synthetic speech than in natural speech. A method to increase the roughness of a parameter sequence is described and was evaluated in a listening test. The roughness criterion was found to reduce temporal over-smoothing, but it also introduces audible discontinuities.
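
The effect of smoothing on both kinds of feature can be illustrated with a minimal sketch. The abstract does not give the thesis's definition of roughness, so the code below assumes a stand-in: the mean squared second-order difference of one cepstral coefficient over time. The function names, the moving-average smoother, and the toy trajectory are all hypothetical and serve only as illustration.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def trajectory_variance(track: Sequence[float]) -> float:
    """Population variance of one cepstral coefficient over time."""
    return pvariance(track)


def roughness(track: Sequence[float]) -> float:
    """Assumed roughness measure (not the thesis definition):
    mean squared second-order difference, which grows with fast
    frame-to-frame variations and is zero for constant or linear tracks."""
    if len(track) < 3:
        raise ValueError("need at least three frames")
    second_diffs = [
        track[t + 1] - 2.0 * track[t] + track[t - 1]
        for t in range(1, len(track) - 1)
    ]
    return fmean([d * d for d in second_diffs])


def moving_average(track: Sequence[float]) -> List[float]:
    """Three-point moving average with replicated edges, used here
    only to imitate the over-smoothing of statistical generation."""
    padded = [track[0], *track, track[-1]]
    return [fmean(padded[i:i + 3]) for i in range(len(track))]


# Toy trajectory standing in for a natural cepstral coefficient track.
natural = [0.0, 1.0, -1.0, 2.0, -2.0, 1.0, 0.0]
smoothed = moving_average(natural)

print("variance :", trajectory_variance(natural), trajectory_variance(smoothed))
print("roughness:", roughness(natural), roughness(smoothed))
```

On this toy input the smoothed track scores lower on both measures, which mirrors the direction of the difference reported between synthetic and natural speech.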

All algorithms were developed and tested using the HMM-based Speech Synthesis System (HTS) released by the Nagoya Institute of Technology.

Full Text & Additional Material

The full thesis can be downloaded here.