How fillers affect human transcription accuracy of disfluent utterances in conversational speech
- Status
- In work
- Type
- Master Project
- Announcement date
- 03 Mar 2024
- Student
- Lucas Eckert
There are various phenomena that make Conversational Speech (CS) a challenging speaking style for Automatic Speech Recognition (ASR). In free conversation, we often reduce articulatory precision or speak in dialect (“kanni net machn”), put less effort into producing well-formed sentences (“so wie das Veranstaltung da”), use colloquial language (“Oida!”), produce disfluent sentences (“also wir haben wir haben eines wir haben eines gekauft ein uraltes”), and even create new words on the fly (“hindimensionieren”). As humans, we are usually still able to decode (understand) such imperfect utterances. One reason is that we have been learning to process spoken language throughout our lives, which has equipped us with powerful models. An ASR system, in contrast, is largely limited to the finite amount of data it was trained on. Another reason is that humans can draw on the context and history of a conversation, which helps them judge the plausibility of words and word sequences in a given context and thus resolve ambiguities.
The specific phenomenon investigated here is the effect of fillers in disfluent structures on transcription accuracy. The literature provides somewhat contradictory evidence: some studies found fillers to be important cues for parsing utterances and found that their use reduces listeners' cognitive load, whereas others found that high filler rates correlate with lower task performance (in memorization tasks). The aim of this project is to investigate whether inserting fillers into disfluent utterances (“also wir haben ahm wir haben eines ahm wir haben eines gekauft ein uraltes”) helps or hinders transcription accuracy in humans and ASR systems, and whether information from the preceding and following context improves transcription accuracy.
For this purpose, the tasks of this project are to select stimuli from a conversational speech corpus, to manipulate them so that each utterance exists both with and without fillers, and to set up and carry out a transcription experiment with human participants.
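The announcement does not specify how transcription accuracy will be scored. A common choice in ASR evaluation is word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the reference into the transcription, divided by the number of reference words. The following is a minimal sketch under that assumption. The function names, the filler inventory `FILLERS`, and the example transcription are hypothetical and are not taken from the project.

```python
# Hypothetical filler inventory; a real study would derive it from the corpus annotation.
FILLERS = {"ahm", "ähm", "äh", "ah", "hm"}


def normalize(text, drop_fillers=True):
    """Lowercase, split on whitespace, and optionally remove filler tokens."""
    words = text.lower().split()
    if drop_fillers:
        words = [w for w in words if w not in FILLERS]
    return words


def word_error_rate(reference, hypothesis, drop_fillers=True):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = normalize(reference, drop_fillers)
    hyp = normalize(hypothesis, drop_fillers)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "also wir haben ahm wir haben eines ahm wir haben eines gekauft ein uraltes"
# Hypothetical listener transcription that omits the repetitions.
transcription = "also wir haben eines gekauft ein uraltes"
print(word_error_rate(reference, transcription))                      # 5 of 12 words deleted
print(word_error_rate(reference, transcription, drop_fillers=False))  # 7 of 14 words deleted
```

Whether fillers are removed before scoring is a design choice. Removing them from both reference and transcription scores the with-filler and without-filler versions of an utterance against the same word sequence. Keeping them would also count missed fillers as errors.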
Contact:
Saskia Wepner (wepner@tugraz.at)
Barbara Schuppler (b.schuppler@tugraz.at)