Guest Lecture: Junichi Yamagishi
- Start date/time
- Tue Nov 18 10:00:00 2014
- End date/time
- Tue Nov 18 10:00:00 2014
- Location
- IC01074 Inffeldgasse 16b, first floor
- Contact
Prof. Dr. Junichi Yamagishi from the National Institute of Informatics, Japan, and The Centre for Speech Technology Research, University of Edinburgh, will present his work
"Deep, deep, deep architecture for speech synthesis" on Tuesday, November 18th 2014, 11:00, in our seminar room IC01074, Inffeldgasse 16b, first floor.
Abstract: Current statistical parametric speech synthesis typically uses hidden Markov models (HMMs) to represent the probability densities of speech trajectories given texts. A new approach now attracting strong attention from speech synthesis researchers is deep learning, i.e., deep neural networks (DNNs), and several emerging attempts apply DNNs especially to acoustic modeling and prosody modeling. In this talk, after an overview of HMM speech synthesis and its benefits, we introduce our latest approach using multiple DNNs, in which 1) we use a deep denoising auto-encoder for nonlinear feature extraction from spectra instead of conventional linear mel-cepstral analysis, 2) we then use a DNN to learn the relationship between input texts and the extracted features instead of decision tree-based state tying, and 3) we use another DNN to model the conditional probability of the spectral differences between natural and synthetic speech and to reconstruct the spectral fine structures.
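The first step mentioned in the abstract, nonlinear feature extraction with a denoising auto-encoder, can be sketched minimally as follows. This is an illustrative single-hidden-layer version trained on random toy data; the lecture concerns deep stacks of such layers applied to real speech spectra, and all dimensions, noise levels, and learning rates below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for spectral frames: 200 frames of 32-dim features
# (the actual feature type and dimensionality are not given in the abstract).
X = rng.standard_normal((200, 32))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single-hidden-layer denoising auto-encoder parameters.
n_in, n_hid = 32, 16
W1 = rng.standard_normal((n_in, n_hid)) * 0.1
b1 = np.zeros(n_hid)
W2 = rng.standard_normal((n_hid, n_in)) * 0.1
b2 = np.zeros(n_in)

lr = 0.05
losses = []
for epoch in range(50):
    # Corrupt the input; the network must reconstruct the CLEAN frames,
    # which forces the hidden code to capture robust structure.
    X_noisy = X + 0.1 * rng.standard_normal(X.shape)
    H = sigmoid(X_noisy @ W1 + b1)   # nonlinear encoding (the extracted features)
    X_hat = H @ W2 + b2              # linear reconstruction
    err = X_hat - X
    losses.append(float(np.mean(err ** 2)))

    # Backpropagate the mean-squared reconstruction error.
    dW2 = H.T @ err / len(X)
    db2 = err.mean(axis=0)
    dH = err @ W2.T * H * (1 - H)
    dW1 = X_noisy.T @ dH / len(X)
    db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(losses[0], "->", losses[-1])
```

After training, `H` plays the role the abstract assigns to the extracted features: a learned nonlinear encoding of the spectra that replaces linear mel-cepstral analysis as the input representation for the later text-to-feature DNN.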