Learning an Artificial F0-Contour for ALT Speech

Published: Mon, Oct 01, 2012

Tags: rotm

Contact: Anna Katharina Fuchs

The Artificial Larynx Transducer (ALT) allows people whose vocal folds have been surgically removed to regain audible speech. It has been known for decades that the resulting speech suffers from several problems, notably very poor quality and an unnatural sound. One reason for the lack of naturalness is the constant vibration of the ALT, and one way to substantially improve ALT speech is to introduce a varying fundamental frequency (F₀) contour. In this work we present a new method to automatically learn an artificial F₀-contour.

The F₀-contour is estimated using a Gaussian mixture model (GMM) that describes the joint density of the fundamental frequency and the feature vector. To train the GMM, a speech database was recorded containing the same sentences spoken once with the ALT and once with healthy speech. The features (MFCCs) for the GMM are taken from the ALT recordings, and the corresponding fundamental frequency values are taken from the healthy speech. Results of 4-fold cross-validation and informal listening tests demonstrate that estimating the fundamental frequency with a machine learning procedure is possible and, with regard to real-time application, preferable.
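One common way to read an F₀ value out of a joint GMM is the conditional expectation E[F₀ | x] for each feature frame x. The following is a minimal sketch of that idea, not necessarily the exact estimator used in the paper. It assumes the GMM has already been trained, and, to stay dependency-free, it assumes a diagonal covariance for the feature part of each component. The function names, the dictionary layout, and the toy parameters are hypothetical and chosen only for illustration.

```python
import math


def gaussian_diag_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def estimate_f0(x, components):
    """Return E[F0 | x] under a joint GMM over (features, F0).

    Each component is a dict with (hypothetical) keys:
      weight    mixture weight
      mean_x    mean of the feature part
      var_x     diagonal variances of the feature part (assumption)
      mean_f0   mean of the F0 part
      cov_f0_x  cross-covariance between F0 and each feature dimension
    """
    # Unnormalised log posterior of each component given the features only.
    log_post = [
        math.log(c["weight"]) + gaussian_diag_logpdf(x, c["mean_x"], c["var_x"])
        for c in components
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)

    f0 = 0.0
    for c, u in zip(components, unnorm):
        # Per-component conditional mean of F0 given x.
        cond_mean = c["mean_f0"] + sum(
            s / v * (xi - m)
            for s, v, xi, m in zip(c["cov_f0_x"], c["var_x"], x, c["mean_x"])
        )
        f0 += (u / total) * cond_mean
    return f0


# Toy two-component model with 2-dimensional "features" (made-up numbers).
toy_gmm = [
    {"weight": 0.5, "mean_x": [0.0, 0.0], "var_x": [1.0, 1.0],
     "mean_f0": 100.0, "cov_f0_x": [5.0, 0.0]},
    {"weight": 0.5, "mean_x": [4.0, 4.0], "var_x": [1.0, 1.0],
     "mean_f0": 140.0, "cov_f0_x": [0.0, -5.0]},
]

# One estimated F0 value per feature frame gives an artificial contour.
frames = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
contour = [estimate_f0(frame, toy_gmm) for frame in frames]
```

In a setting like the one described above, the frames would be MFCC vectors computed from ALT speech, and the component parameters would come from training on time-aligned pairs of ALT features and healthy-speech F₀ values. Because each frame is processed independently, an estimator of this kind can run frame by frame.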

The figure shows one voiced phrase of a male speaker. The estimated F₀ values (red crosses) match the true F₀ values from healthy speech (blue circles) well and clearly differ from the constant ALT F₀ values (green diamonds).

More information can be found in our paper!
