Signal Processing and Speech Communication Laboratory

Audio-visual speech enhancement for the improvement of pathological speech

Status: Open
Type: Master Thesis
Announcement date: 31 Oct 2024

Abstract

Research on speech enhancement has a substantial history in the field of speech processing, but recent advances in AI have once again accelerated progress. Multi-modal AI in particular is rapidly maturing, further increasing the observed performance of audio-visual speech enhancement (AVSE). One of the most promising use cases of AVSE is the improvement of pathological speech, since the audio alone often lacks important information that may be obtained from videos showing the speakers’ body movements (lips, face, head, torso, and arms). Such videos contain information for recovering expressivity by predicting prosody from facial expression, lip and head movement, and natural gestures. The aim here is to provide high-quality substitution speech for speaking-impaired individuals.

Your Tasks

  • obtain audio and video field recordings of pathological speakers
  • convert impaired speech to substitution speech using state-of-the-art audio-visual speech enhancement models (an illustrative sketch follows after this list)
  • design and conduct a listening experiment investigating speech intelligibility, as well as speaker and listener preferences
  • document the work (thesis writing; optionally, paper writing)
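
To make the conversion task more concrete, below is a minimal sketch of such a pipeline in Python. The file names, the 16 kHz sample rate, the grayscale mouth-region crops, and the enhance() placeholder are illustrative assumptions only; in the actual thesis, enhance() would be replaced by a pretrained state-of-the-art AVSE model.

```python
# Minimal sketch of an audio-visual enhancement pipeline.
# File names, sample rate, crop size, and the enhance() placeholder are
# illustrative assumptions, not part of the project specification.
import cv2                 # video decoding
import librosa             # audio loading / resampling
import numpy as np
import soundfile as sf     # writing the enhanced waveform


def load_video_frames(path: str, size: tuple[int, int] = (96, 96)) -> np.ndarray:
    """Decode a video into a (num_frames, H, W) array of grayscale crops."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(cv2.resize(gray, size))
    cap.release()
    return np.stack(frames)


def enhance(audio: np.ndarray, frames: np.ndarray, sr: int) -> np.ndarray:
    """Placeholder for a pretrained AVSE model; here it simply returns the input."""
    # A real model would fuse the audio with the visual stream (e.g. lip and
    # head movement) to reconstruct an intelligible substitution voice.
    return audio


if __name__ == "__main__":
    sr = 16_000                                     # assumed target sample rate
    audio, _ = librosa.load("recording.wav", sr=sr) # audio track, pre-extracted from the field recording
    frames = load_video_frames("recording.mp4")     # corresponding video of the speaker
    enhanced = enhance(audio, frames, sr)
    sf.write("enhanced.wav", enhanced, sr)
```

Recordings enhanced in this way would then serve as stimuli for the listening experiment on intelligibility and preference.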

Your Profile

  • interest in speech science and technology
  • interest in health applications
  • good knowledge of relevant Python frameworks for speech synthesis, voice conversion, and/or representation learning
  • good communication skills

Contact

Philipp Aichinger (philipp.aichinger@meduniwien.ac.at)
Martin Hagmüller (hagmueller@tugraz.at)