PhD defense Anna Katharina Fuchs

Univ.-Prof. Dr. L. FICKERT
Univ.-Prof. Dr. G. KUBIN
Assoc. Prof. Dr. T. TODA (Nara Institute of Science and Technology, Japan)


The Bionic Electro-Larynx Speech System

Challenges, Investigations, and Solutions

People without a larynx need a substitute voice to regain speech. The electro-larynx (EL) is a widely used device but is known for its unnatural and monotonic speech quality. Previous research has tackled these problems, but no significant improvements have been reported so far. The EL speech system is a complex system comprising hardware (the artificial excitation source, or sound transducer) and software (control and generation of the artificial excitation signal). It is not enough to consider a single problem in isolation; all aspects of the EL speech system need to be taken into account. In this thesis we push the boundaries of the conventional EL device towards a new bionic electro-larynx speech system.

We formulate two overall scenarios: a closed-loop scenario, where EL speech is excited and simultaneously recorded using an EL speech system, and the artificial excitation signal is controlled based on the preceding recordings and sent back to the EL speech system to excite the vocal tract; and an open-loop scenario, where signal processing algorithms are used to enhance and improve recorded EL speech, and a loudspeaker is used for playback. Although we emphasize the first scenario, because it is closer to natural speech production, the latter is capable of significant improvements in terms of naturalness and can be used in telecommunication applications.

We record a parallel German database of electro-larynx speech and healthy speech in order to carry out our experiments. Moreover, we provide algorithms for signal-to-noise ratio calculations and analyses of the data. We propose an algorithm to estimate a changing fundamental frequency from the speech spectral envelope. Listening tests show that a changing fundamental frequency improves the perceived naturalness of EL speech, and that our proposed estimation algorithm increases naturalness significantly compared to constant and random fundamental frequency contours. Furthermore, we study electromyographic (EMG) signals to analyze their suitability for on/off control of the EL speech system and investigate learning effects in naive users. Listening tests show that EMG-controlled EL speech is significantly more pleasant to listen to after training than before. We propose a new transducer for the EL speech system based on electro-magnetic mechanisms. Its technical properties show significant advantages over the conventional electro-dynamic transducer. We design a housing for the transducer and give suggestions for an optimal coupler disk and for the waveform of the excitation source. Listening tests serve as a proof of concept for the resulting EL speech system and indicate that the proposed system is promising.

For the open-loop scenario, we perform statistical voice conversion (SVC), which improves naturalness at the cost of intelligibility. SVC is a promising approach to improving EL speech, but further investigation is needed.



Date and time
14 August 2015, 11:00
Seminar room IDEG 134 (Inffeldgasse 16c)