Contribution of prosodic information in ASR systems for read and spontaneous speech by integrating long-term and short-term acoustic features
- In progress
- Julian Linke
- Research Areas
State-of-the-art ASR systems perform well on read and conversational speech (see modern virtual assistants such as Alexa or Siri), yet the recognition of spontaneous speech still poses many difficulties, making it a promising target for new approaches to the speech recognition task. This thesis presents speech recognition experiments that incorporate prosodic information to improve ASR systems for read and spontaneous conversational speech. This approach is particularly suitable for languages with fewer available resources. One of the main reasons for these difficulties is the large number of pronunciation variants that must be understood and learned when developing modern ASR systems. The main focus therefore lies on improving the acoustic model (one of the main components of a modern ASR system) by integrating, for example, different long-term and short-term acoustic features. In this way, the trade-off between knowledge-based and data-driven approaches is addressed by illustrating, and contrasting, the advantages of including prosodic information in the modeling process of ASR systems.
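To make the idea of combining short-term and long-term acoustic features concrete, here is a minimal, hypothetical sketch (not the method of this thesis): per short 25 ms frame it computes a log-energy value, and alongside it a long-term energy contour smoothed over roughly 500 ms of context, a crude stand-in for a prosodic feature. The function names and window sizes are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames of length frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_and_long_term_features(x, sr=16000):
    """Per 25 ms frame (10 ms hop): short-term log-energy plus a
    long-term (~500 ms) smoothed energy contour as a toy prosodic feature."""
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)
    frames = frame_signal(x, frame_len, hop)
    # short-term feature: frame-wise log-energy
    log_e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # long-term feature: moving average over ~50 frames (~500 ms of context)
    win = 50
    long_e = np.convolve(log_e, np.ones(win) / win, mode="same")
    # concatenate per frame -> (n_frames, 2) feature matrix for the acoustic model
    return np.stack([log_e, long_e], axis=1)

# toy usage on one second of noise at 16 kHz
feats = short_and_long_term_features(np.random.randn(16000))
print(feats.shape)  # (98, 2)
```

In a real system the short-term features would typically be MFCCs or filterbank energies and the long-term features pitch, energy, or duration contours spanning syllables or words; the point of the sketch is only the frame-wise concatenation of the two time scales before acoustic modeling.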