Guest lecture by György Szaszák

"Exploitation Possibilities of Speech Prosody in Automatic Speech Recognition and Understanding"



Speech prosody is an important cue in human speech production and perception and is therefore widely addressed in text-to-speech systems. However, in speech-to-text (speech recognition) and speech understanding applications it is mostly neglected, although several studies argue that prosody, alone or combined with existing acoustic or language models, can help speech recognition, sentence mood or modality recognition, emotion (sentiment) analysis, disambiguation, syntactic analysis, etc. Some of the information transmitted via human speech is conveyed only by prosody, while for another part of this information prosody provides redundancy. Both are worth exploiting. This talk will give an overview of the exploitation possibilities of prosody in speech recognition and understanding related tasks, and will also present a hidden Markov model based approach to assessing the prosodic structure of spoken utterances. Supra-segmental signal processing and some modelling issues will also be covered, and the usability of this approach for the automatic exploration of syntactic structure will be addressed. Some more details on the approach and its application, along with a bird’s eye view of the results obtained for several tasks in which prosody is involved, will hopefully make this talk interesting for anyone dealing with speech technology and related research.


György Szaszák was born in Budapest, Hungary in 1979. He graduated with an M.Sc. in electrical engineering from the Budapest University of Technology and Economics, Dept. of Telecommunications and Media Informatics, in 2002, specializing in speech technology (automatic speech recognition). From the same year he has been a research assistant and later a research fellow at the Laboratory of Speech Acoustics at the Dept. of Telecommunications and Media Informatics. He obtained his PhD in 2009 in the field of technical sciences (information sciences) with his dissertation titled “The Role and Usage of Supra-segmental Features in Automatic Speech Recognition”. His main research topics cover automatic speech recognition and understanding, speech signal processing, speech acoustics, speech databases, speech prosody, and the prosody-syntax interface. The author of about 40 publications, he has also been teaching in several speech technology and acoustics related courses at the Budapest University of Technology and Economics.

Date and time
11. July 2012 - 14:30
Seminar room IDEG134, Inffeldgasse 16c, ground floor