Enhancement of spectral envelope modelling in HMM-based speech synthesis

Status

Finished

Type

Master Thesis

Announcement date

30 Sep 2010

Student

Florian Krebs

Mentors

Harald Romsdorfer

Research Areas

Speech Communication

Abstract

Hidden Markov Model (HMM)-based text-to-speech synthesis systems have grown in popularity in recent years, as they are very flexible in generating speech with various speaker characteristics, emotions and speaking styles. To create a voice, statistical parametric models are built from a training corpus. Using these models, arbitrary text is converted into a speech parameter sequence that fulfills the maximum likelihood criterion.
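The maximum likelihood parameter generation step can be illustrated with the closed-form solution commonly used in HTS-style systems, c = (WᵀU⁻¹W)⁻¹WᵀU⁻¹μ, where μ and U are the means and (diagonal) variances of the static and delta features along a given state sequence, and W maps a static trajectory to its static and delta features. The following is a minimal pure-Python sketch for a single cepstral dimension, assuming a fixed state sequence, static and delta features only, the delta window [-0.5, 0, 0.5] and edge clipping. The function names are hypothetical, and this is not the HTS implementation.

```python
from typing import List


def delta_window_matrix(T: int) -> List[List[float]]:
    """Return W (2T x T) mapping a static trajectory c to [c_0, dc_0, c_1, dc_1, ...].

    The delta uses the window [-0.5, 0, 0.5]; neighbours outside the utterance
    are clipped to the first/last frame (an assumption made for this sketch).
    """
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0                          # static row
        W[2 * t + 1][max(t - 1, 0)] -= 0.5         # delta row, left neighbour
        W[2 * t + 1][min(t + 1, T - 1)] += 0.5     # delta row, right neighbour
    return W


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (for illustration only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def mlpg(mean: List[float], var: List[float]) -> List[float]:
    """Solve (W^T U^-1 W) c = W^T U^-1 mu for the static trajectory c.

    mean and var are interleaved per frame: [static_0, delta_0, static_1, ...];
    U is assumed to be diagonal, so U^-1 is just the element-wise precision.
    """
    T = len(mean) // 2
    W = delta_window_matrix(T)
    prec = [1.0 / v for v in var]
    R = [[sum(W[i][a] * prec[i] * W[i][b] for i in range(2 * T)) for b in range(T)]
         for a in range(T)]
    r = [sum(W[i][a] * prec[i] * mean[i] for i in range(2 * T)) for a in range(T)]
    return solve(R, r)


# A step in the static means with zero delta means: the generated trajectory
# is a compromise between both constraints, so the step is flattened.
print(mlpg([0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [1.0] * 8))
```

With unit variances, the 0 → 1 step in the static means is generated as [1/6, 1/6, 5/6, 5/6]. The range of the trajectory shrinks from 1 to 2/3, a small instance of the loss of variance in generated parameter sequences.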

Due to the statistical processing, certain characteristics of a parameter sequence are lost. In this thesis, the influence of variance-based features in the parameter generation process is investigated, and a new roughness feature is proposed that describes the presence of fast variations in a cepstral time sequence. The values of both the variance and the roughness features are significantly smaller in synthetic speech than in natural speech. A method to increase the roughness of a parameter sequence is described and was evaluated in a listening test. The test showed that the roughness criterion reduces temporal over-smoothing but also introduces audible discontinuities.
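A variance-based feature and a roughness feature can be compared between a jagged and an over-smoothed trajectory in a few lines of Python. This is a minimal sketch under stated assumptions: the global variance (the per-utterance variance of one cepstral dimension) is assumed as a representative variance-based feature; the abstract does not give the thesis's roughness definition, so `roughness` below is a hypothetical stand-in (mean squared frame-to-frame difference); and the trajectories are made-up toy values, not data from the thesis.

```python
from statistics import mean, pvariance
from typing import Sequence


def global_variance(traj: Sequence[float]) -> float:
    """Variance of one cepstral dimension over all frames of an utterance."""
    return pvariance(traj)


def roughness(traj: Sequence[float]) -> float:
    """HYPOTHETICAL roughness measure (not the thesis's definition):
    mean squared difference between consecutive frames, which grows
    when the trajectory contains fast frame-to-frame variations."""
    if len(traj) < 2:
        raise ValueError("need at least two frames")
    return mean((b - a) ** 2 for a, b in zip(traj, traj[1:]))


# Made-up trajectories for a single cepstral coefficient.
jagged = [0.0, 1.0, 0.0, 1.0, 0.0]     # fast variations
smooth = [0.0, 0.25, 0.5, 0.25, 0.0]   # over-smoothed

for name, traj in (("jagged", jagged), ("smooth", smooth)):
    print(name, round(global_variance(traj), 4), round(roughness(traj), 4))
```

Both measures are smaller for the smooth trajectory. A per-frame difference measure is used here because it reacts to fast variations that the overall variance alone does not distinguish.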

All the algorithms have been developed and tested using the HMM-based Speech Synthesis System (HTS) released by the Nagoya Institute of Technology.

Full Text & Additional Material

The full thesis can be downloaded here.