Guest Lecture: Junichi Yamagishi

Prof. Dr. Junichi Yamagishi from the National Institute of Informatics, Japan, and The Centre for Speech Technology Research, University of Edinburgh, will present his work

Deep, deep, deep architecture for speech synthesis
on Tuesday, November 18th, 2014, at 11:00, in our seminar room IC01074, Inffeldgasse 16b, first floor.
Current statistical parametric speech synthesis typically uses hidden Markov models (HMMs) to represent the probability densities of speech trajectories given text. A newer approach now attracting strong attention from speech synthesis researchers is deep learning, i.e., deep neural networks (DNNs), and several DNN-based attempts are emerging, especially for acoustic modeling and prosody modeling.
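To make the HMM-based idea concrete, here is a minimal toy sketch, not the actual HMM-TTS system: each text-derived state carries a Gaussian density over an acoustic feature, and synthesis (ignoring dynamic features) simply emits the per-state mean for each state's duration. All numbers and the `states` sequence are illustrative assumptions.

```python
import numpy as np

# Hypothetical state sequence derived from the input text:
# (mean, variance, duration-in-frames) of each state's Gaussian
# over one acoustic feature (e.g. a mel-cepstral coefficient).
states = [(-0.2, 0.05, 3), (0.6, 0.02, 4), (0.1, 0.04, 2)]

# Without dynamic features, the maximum-likelihood trajectory is
# just the concatenated per-state means.
trajectory = np.concatenate([np.full(dur, mean) for mean, _var, dur in states])
print(trajectory.shape)  # (9,) frames
```

In the real systems the state densities are clustered with decision trees and the trajectory is smoothed using delta features; this sketch only shows the "density per state, trajectory from densities" idea.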
In this talk, after an overview of HMM-based speech synthesis and its benefits, we introduce our latest approach using multiple DNNs, in which 1) a deep denoising auto-encoder performs nonlinear feature extraction from spectra in place of conventional linear mel-cepstral analysis, 2) a DNN learns the relationship between input texts and the extracted features in place of decision-tree-based state tying, and 3) a further DNN models the conditional probability of the spectral differences between natural and synthetic speech and reconstructs the spectral fine structure.
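The three-network pipeline in the abstract can be sketched as follows. This is a hedged toy illustration with random weights and made-up dimensions, not the authors' actual models: an auto-encoder compresses spectra, a text-to-feature network replaces state tying, and a postfilter network adds a predicted natural-minus-synthetic residual.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

D_SPEC, D_CODE, D_TEXT = 64, 16, 32   # spectrum / bottleneck / text dims (assumed)

# 1) deep (denoising) auto-encoder: spectrum -> compact features -> spectrum
W_enc = rng.standard_normal((D_CODE, D_SPEC)) * 0.1
W_dec = rng.standard_normal((D_SPEC, D_CODE)) * 0.1
encode = lambda spec: relu(W_enc @ spec)
decode = lambda code: W_dec @ code

# 2) text-to-feature DNN in place of decision-tree state tying
W_txt = rng.standard_normal((D_CODE, D_TEXT)) * 0.1
text_to_code = lambda txt: relu(W_txt @ txt)

# 3) postfilter DNN modelling the natural-vs-synthetic spectral difference;
#    its output is added back to restore spectral fine structure
W_pf = rng.standard_normal((D_SPEC, D_SPEC)) * 0.1
postfilter = lambda spec: spec + W_pf @ spec

text_feats = rng.standard_normal(D_TEXT)       # linguistic features from text
synthetic_spec = decode(text_to_code(text_feats))
refined_spec = postfilter(synthetic_spec)
print(refined_spec.shape)  # (64,)
```

Each single-layer map here stands in for a trained deep network; the point is only the data flow between the three components.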

