Signal Processing and Speech Communication Laboratory

Comparison of Automatic and Human Speech Recognition

Status: Open
Type: Master Project
Announcement date: 03 Mar 2022

Several phenomena make Conversational Speech (CS) a challenging speaking style for Automatic Speech Recognition (ASR). In free conversation, we often reduce articulatory precision or speak in dialect (“kanni net machn”), put less effort into producing flawless sentences (“so wie das Veranstaltung da”), use colloquial language (“Oida!”), produce disfluent sentences (“also wir haben wir haben eines wir haben eines gekauft ein uraltes”), and even create new words on the fly (“hindimensionieren”). As humans, we are usually still able to decode (understand) such imperfect utterances. One reason is that we learn to deal with spoken language throughout our lifetime, which provides us with powerful models. An ASR system, in contrast, is limited to the finite amount of data it was trained on. Another reason is that humans can fall back on context and the history of a conversation, which helps them evaluate the plausibility of words and word sequences in a given context and thus resolve likely ambiguities.

The aim of this project is to set up and carry out a perception experiment with human participants, and to compare the experimental results with those from ASR.

Your Profile

  • interest in speech phenomena
  • good experience in Python

Your tasks

  • setting up and conducting a perception experiment
  • evaluation of experimental results
  • comparison with ASR results
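
The announcement does not say how human and ASR results are to be compared. One common option, assuming the perception experiment yields written transcriptions from participants, is the word error rate (WER): the word-level edit distance between a transcription and a reference, divided by the number of reference words. The following is a minimal sketch of that idea only; the function name and the example transcripts are hypothetical and not part of the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both inputs are whitespace-tokenised and compared
    case-insensitively; a real study would define its own normalisation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing in hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, for illustration only.
reference = "wir haben eines gekauft"
human_transcript = "wir haben eines gekauft"
asr_transcript = "wir haben eins gekauft"

human_wer = word_error_rate(reference, human_transcript)
asr_wer = word_error_rate(reference, asr_transcript)
```

Computing the same score for each participant and for the ASR output on the same utterances would put both on a common scale. Whether WER is the right measure for disfluent or dialectal speech is itself a design decision for the project.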

Groups are welcome!

Contact:

Saskia Wepner (wepner@tugraz.at)