Learning an Artificial F0-Contour for ALT Speech
- Mon, Oct 01, 2012
The Artificial Larynx Transducer (ALT) is one way to restore audible speech for people whose vocal folds have been surgically removed. It has been known for decades that the resulting speech suffers from several problems, including very poor speech quality and an unnatural sound. One reason for the lack of naturalness is the constant vibration of the ALT, and introducing a varying fundamental frequency (F0) contour can substantially improve ALT speech. In this work we present a new method to automatically learn an artificial F0-contour.
The F0-contour is estimated using a Gaussian mixture model (GMM) that describes the joint density of the fundamental frequency and the feature vector. To train the GMM, a speech database was recorded that contains the same sentences spoken once with the ALT and once with healthy speech. The features (MFCCs) for the GMM are taken from the ALT recordings, and the corresponding fundamental frequency values from the healthy speech. Results of 4-fold cross-validation and informal listening tests demonstrate that fundamental frequency estimation based on a machine learning procedure is possible and, with regard to real-time application, preferable.
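As a rough illustration of the estimation step: given a joint GMM over an MFCC vector x and F0, one common way to predict F0 is the conditional mean E[F0 | x], a posterior-weighted sum of per-component linear regressions. The text above does not say which estimator was used, so treat this as an assumption. The function names, the dictionary layout, the diagonal feature covariance, and the toy parameters below are all hypothetical and chosen only to keep the sketch short; training the GMM (for example with EM on paired ALT/healthy frames) is not shown.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def estimate_f0(x, components):
    """Conditional mean E[F0 | x] under a joint GMM over (x, F0).

    Each component is a dict (hypothetical layout) with:
      weight  - mixture weight
      mean_x  - mean of the feature part
      var_x   - diagonal feature variances (simplifying assumption)
      mean_f0 - mean of the F0 part
      cov_fx  - cross-covariance between F0 and each feature dimension
    """
    # Unnormalised log posteriors log(w_k) + log N(x | mu_x_k, Sigma_xx_k).
    log_post = [
        math.log(c["weight"]) + log_gauss_diag(x, c["mean_x"], c["var_x"])
        for c in components
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)

    f0 = 0.0
    for p, c in zip(post, components):
        # Per-component regression: mu_f0 + Sigma_fx Sigma_xx^{-1} (x - mu_x).
        shift = sum(
            cfx / v * (xi - m)
            for cfx, v, xi, m in zip(c["cov_fx"], c["var_x"], x, c["mean_x"])
        )
        f0 += (p / total) * (c["mean_f0"] + shift)
    return f0


# Toy parameters (invented for illustration, not taken from the paper).
toy = [
    {"weight": 0.5, "mean_x": [-1.0, 0.0], "var_x": [1.0, 1.0],
     "mean_f0": 100.0, "cov_fx": [0.0, 0.0]},
    {"weight": 0.5, "mean_x": [1.0, 0.0], "var_x": [1.0, 1.0],
     "mean_f0": 140.0, "cov_fx": [0.0, 0.0]},
]

# One estimate per analysis frame; here a single made-up 2-D feature vector.
print(estimate_f0([0.0, 0.0], toy))
```

Because the estimate for a frame depends only on that frame's feature vector, a mapping of this kind can be evaluated frame by frame, which fits the real-time use mentioned above.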
The figure shows one voiced phrase of a male speaker. The estimated F0 values (red crosses) match the true F0 values from healthy speech (blue circles) well and clearly differ from the constant ALT F0 values (green diamonds).
More information can be found in our paper!