Guest lecture by György Szaszák
- Start date/time
- Wed Jul 11 12:30:00 2012
- End date/time
- Wed Jul 11 12:30:00 2012
- Seminar room IDEG134, Inffeldgasse 16c, ground floor
“Exploitation Possibilities of Speech Prosody in Automatic Speech Recognition and Understanding”
Speech prosody is an important cue in human speech production and perception and is therefore widely addressed in text-to-speech systems. However, in speech-to-text (speech recognition) and speech understanding applications it is mostly neglected, although several studies argue that prosody, alone or combined with existing acoustic or language models, can help speech recognition, sentence mood or modality recognition, emotion (sentiment) analysis, disambiguation, syntactic analysis, etc. Some of the information transmitted via human speech is conveyed only by prosody, while for another part of this information prosody provides redundancy. Both are worth exploiting. This talk will give an overview of the exploitation possibilities of prosody in speech recognition and understanding related tasks, and will also present a hidden Markov model based approach to assessing the prosodic structure of spoken utterances. Supra-segmental signal processing and some modelling issues will also be covered. The usability of this approach for the automatic exploration of syntactic structure is also addressed. Some more details on the approach and its application, and a bird’s-eye view of the results obtained for several tasks in which prosody is involved, will hopefully help make this talk interesting for anyone dealing with speech technology and related research.
György Szaszák was born in Budapest, Hungary in 1979. He graduated with an M.Sc. in electrical engineering from the Budapest University of Technology and Economics, Dept. of Telecommunications and Media Informatics, in 2002, specializing in speech technology (automatic speech recognition). From the same year he has been a research assistant and later a research fellow at the Laboratory of Speech Acoustics at the Dept. of Telecommunications and Media Informatics. He obtained his PhD in 2009 in the field of technical sciences (information sciences) with his dissertation titled “The Role and Usage of Supra-segmental Features in Automatic Speech Recognition”. His main research topics cover automatic speech recognition and understanding, speech signal processing, speech acoustics, speech databases, speech prosody, and the prosody-syntax interface. Author of about 40 publications, he has also been teaching several speech technology and acoustics related courses at the Budapest University of Technology and Economics.