Enhancement of spectral envelope modelling in HMM-based speech synthesis
- Status
- Finished
- Type
- Master Thesis
- Announcement date
- 30 Sep 2010
- Student
- Florian Krebs
- Mentors
- Harald Romsdorfer
Abstract
Hidden Markov Model (HMM)-based text-to-speech synthesis systems have grown in popularity in recent years because they are very flexible in generating speech with various speaker characteristics, emotions and speaking styles. To create a voice, statistical parametric models are trained on a speech corpus. Using these models, arbitrary text is converted into a speech parameter sequence that satisfies the maximum likelihood criterion.
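In the standard HTS formulation, maximum likelihood parameter generation with dynamic features reduces to a weighted least-squares problem: given per-frame means μ and diagonal covariances U of the static and delta features, and a window matrix W that maps a static sequence c to its static-plus-delta observations, the solution is c = (WᵀU⁻¹W)⁻¹WᵀU⁻¹μ. The following is a minimal sketch under stated assumptions: a single static dimension, one delta window (-0.5, 0, 0.5) clipped at the utterance edges, and diagonal covariances. All function names are hypothetical and this is not the thesis code.

```python
def build_window_matrix(T):
    """Hypothetical helper: stack static and delta rows for T frames
    of one static dimension, in the order static_0, delta_0, static_1, ..."""
    rows = []
    for t in range(T):
        static = [0.0] * T
        static[t] = 1.0
        delta = [0.0] * T
        # Assumed delta window (-0.5, 0, 0.5), clipped at the edges.
        if t - 1 >= 0:
            delta[t - 1] = -0.5
        if t + 1 < T:
            delta[t + 1] = 0.5
        rows.append(static)
        rows.append(delta)
    return rows


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def ml_parameter_generation(means, variances):
    """Hypothetical sketch of ML generation.
    means, variances: flat lists [static_0, delta_0, static_1, delta_1, ...]
    taken from the HMM state sequence. Returns the static trajectory c."""
    T = len(means) // 2
    W = build_window_matrix(T)
    # Normal equations W^T U^-1 W c = W^T U^-1 mu with diagonal U.
    A = [[sum(W[r][i] * W[r][j] / variances[r] for r in range(2 * T))
          for j in range(T)] for i in range(T)]
    b = [sum(W[r][i] * means[r] / variances[r] for r in range(2 * T))
         for i in range(T)]
    return solve(A, b)


# Usage: a step in the static means with zero delta means is smoothed out.
trajectory = ml_parameter_generation(
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
)
print(trajectory)
```

Because the delta constraints pull neighbouring frames together, the generated trajectory is smoother than the sequence of state means. This averaging is one source of the over-smoothing addressed in the thesis.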
Due to the statistical processing, certain characteristics of the generated parameter sequence are lost. This thesis investigates the influence of variance-based features on the parameter generation process and proposes a new roughness feature that describes the presence of fast variations in a cepstral time sequence. Both the variance and the roughness features take significantly smaller values in synthetic speech than in natural speech. A method to increase the roughness of a parameter sequence is described and evaluated in a listening test. The roughness criterion was found to reduce temporal over-smoothing, but it also introduces audible discontinuities.
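The two kinds of features compared above can be sketched as follows. The variance measure is the per-dimension variance of each cepstral coefficient over an utterance (often called global variance). The abstract does not define the roughness feature, so the roughness function below is a hypothetical stand-in, assuming mean squared frame-to-frame differences as a proxy for fast variations; it is not the thesis definition.

```python
def global_variance(seq):
    """Variance over time of each cepstral dimension.
    seq: list of frames, each a list of D coefficients."""
    T = len(seq)
    D = len(seq[0])
    gv = []
    for d in range(D):
        values = [frame[d] for frame in seq]
        mean = sum(values) / T
        gv.append(sum((v - mean) ** 2 for v in values) / T)
    return gv


def roughness(seq):
    """Hypothetical roughness proxy: mean squared difference between
    consecutive frames, computed per dimension."""
    T = len(seq)
    D = len(seq[0])
    out = []
    for d in range(D):
        diffs = [seq[t + 1][d] - seq[t][d] for t in range(T - 1)]
        out.append(sum(x * x for x in diffs) / (T - 1))
    return out


# A slow ramp versus a fast alternation (one coefficient per frame).
smooth = [[0.0], [0.5], [1.0], [1.5]]
rough = [[0.0], [1.0], [0.0], [1.0]]
print(global_variance(smooth), roughness(smooth))
print(global_variance(rough), roughness(rough))
```

In this toy example the ramp has the larger variance but the smaller roughness, which illustrates why a measure of fast variation can capture over-smoothing that a variance measure alone does not.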
All the algorithms have been developed and tested using the HMM-based Speech Synthesis System (HTS) released by the Nagoya Institute of Technology.
Full Text & Additional Material
The full thesis is available for download.