Guest Lecture: Tomoki Toda

Tomoki Toda from the Augmented Human Communication Laboratory, Nara Institute of Science and Technology (NAIST), Japan, will give a guest lecture titled

“Augmented speech production based on real-time statistical voice conversion”

on Thursday, August 13, at 14:15, in our seminar room IDEG134, Inffeldgasse 16c/EG.


Human-to-human speech communication faces several barriers, such as physical constraints that cause vocal disorders and environmental constraints that prevent speakers from producing and conveying intelligible speech. These barriers could be overcome if our speech production were augmented so that we could produce the speech sounds we want regardless of such constraints. Voice conversion is a technique for modifying speech acoustics, converting nonlinguistic information to any desired form while preserving the linguistic content. One of the most popular approaches to voice conversion is based on statistical processing, which can extract complex conversion functions from a parallel speech data set consisting of utterance pairs of the source and target voices. Although this technique was originally studied in the context of speaker conversion, which converts the voice of one speaker (the source speaker) to sound like that of another (the target speaker), it has great potential for applications beyond speaker conversion. This talk will briefly review a state-of-the-art trajectory-based conversion method that uses statistics calculated over an utterance to effectively reproduce natural speech parameter trajectories, and will highlight a technique that extends this method to achieve a lower conversion delay. Finally, the talk will show some examples of real-time applications of voice conversion that enhance human-to-human speech communication beyond these constraints, such as speaking aids for total laryngectomees and body-conducted speech enhancement.


Tomoki Toda earned his B.E. degree from Nagoya University, Aichi, Japan, in 1999 and his M.E. and D.E. degrees from the Graduate School of Information Science, NAIST, Nara, Japan, in 2001 and 2003, respectively. He was a Research Fellow of JSPS from 2003 to 2005. He was an Assistant Professor at the Graduate School of Information Science, NAIST, from 2005 to 2011, where he is currently an Associate Professor. He was a Visiting Researcher at the Language Technologies Institute, CMU, Pittsburgh, USA, from October 2003 to September 2004 and at the Department of Engineering, University of Cambridge, Cambridge, UK, from March to August 2008. His research interests include statistical approaches to speech processing such as voice transformation, speech synthesis, speech analysis, speech production, and speech recognition. He has received more than 10 paper and achievement awards, including the 2009 Young Author Best Paper Award from the IEEE SPS and the 2013 Best Paper Award (Speech Communication Journal) from EURASIP-ISCA. He is a member of the Speech and Language Technical Committee of the IEEE SPS.

