Contribution of prosodic features to language models in automatic speech recognition systems and human perception
- In progress
- Saskia Wepner
State-of-the-art automatic speech recognition (ASR) systems achieve accuracies close to human performance for read speech. For conversational speech (CS), however, they perform considerably worse. Reasons for this are, on the one hand, incomplete sentences, incorrect grammar, colloquial vocabulary, and broad pronunciation variation resulting from linguistic phenomena such as reduced pronunciation in familiar contexts as well as from dialectal speaking styles. On the other hand, there is usually not enough data available to train existing systems sufficiently. Exploiting linguistic knowledge about CS, such as prosodic features, should yield a better understanding of how the conversational character of a dialogue affects pronunciation and how sentences are grammatically deformed in CS. With a focus on language models (LMs), this understanding is expected to improve current ASR systems for CS without the need for large databases. The aim of this research is to identify prosodic features that contribute to the performance of both ASR systems and human speech perception. To this end, human listeners will challenge the adapted LM(s) in perception experiments, yielding further insight into why humans to date remain superior at recognizing CS; these insights can in turn be exploited in ASR, and vice versa.
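One common way prosodic information can be integrated with an LM is log-linear rescoring of an n-best list, where a prosodic feature score is interpolated with the acoustic and LM scores. The following sketch illustrates the idea only; the scores, weights, and example hypotheses are illustrative assumptions, not results or methods from this project.

```python
# Illustrative sketch: log-linear n-best rescoring with a hypothetical
# prosodic feature score (all scores and weights are made-up examples).
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # log-likelihood from the acoustic model
    lm_score: float        # log-probability from the language model
    prosody_score: float   # assumed score from a prosodic feature model

def rescore(hyps, lm_weight=1.0, prosody_weight=0.5):
    """Return the hypothesis with the highest combined log score."""
    def combined(h):
        return (h.acoustic_score
                + lm_weight * h.lm_score
                + prosody_weight * h.prosody_score)
    return max(hyps, key=combined)

hyps = [
    Hypothesis("I have no idea", -10.0, -4.0, -1.0),
    Hypothesis("I've no idea",   -10.5, -3.6, -0.2),
]

# With the prosodic term, the reduced form wins; without it, it does not.
print(rescore(hyps).text)                       # prints "I've no idea"
print(rescore(hyps, prosody_weight=0.0).text)   # prints "I have no idea"
```

The sketch shows how a prosody-aware score can flip the decision toward a reduced, conversational pronunciation variant that a text-only LM would rank lower.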