Signal Processing and Speech Communication Laboratory

Comparison of Automatic and Human Speech Recognition

In progress
Master Project
Announcement date
03 Mar 2022
Sophie Lennkh and Jana Winkler
Research Areas

Various phenomena make Conversational Speech (CS) a challenging speaking style for Automatic Speech Recognition (ASR). In free conversations, we often reduce articulatory precision or speak in dialect (“kanni net machn”, dialect for “I can’t do that”), we put less effort into producing flawless sentences (“so wie das Veranstaltung da”, “like that event there”, with a grammatical gender error), we use colloquial language (“Oida!”, an Austrian exclamation), we produce disfluent sentences (“also wir haben wir haben eines wir haben eines gekauft ein uraltes”, “well we bought we bought one we bought one an ancient one”) and we even create new words on the fly (“hindimensionieren”). As humans, we are usually still able to decode (understand) such imperfect utterances. One reason is that we have been learning to deal with spoken language throughout our lives, which provides us with powerful models. An ASR system, in contrast, is limited to the finite amount of data it was trained on. Another reason is that humans can fall back on context and the history of a conversation, which helps them judge the plausibility of (sequences of) words in a given surrounding and thus resolve probable ambiguities.

The aim of this project is to set up and carry out a perception experiment with human participants, and to compare the experimental results with those from ASR.
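Such a comparison between human listeners and ASR systems typically rests on a transcription accuracy metric such as the word error rate (WER). The project description does not specify the evaluation metric, so the following is only a minimal illustrative sketch of how WER could be computed from a reference transcript and a hypothesis (human or ASR output), using the standard Levenshtein distance over word tokens:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("wir haben eines gekauft", "wir haben ein gekauft")` yields 0.25 (one substitution over four reference words). Averaging such scores over listeners and over ASR outputs for the same stimuli would give directly comparable numbers.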


Saskia Wepner