Signal Processing and Speech Communication Laboratory

Audio-visual speech enhancement for the improvement of pathological speech

Status: Open
Type: Master Thesis
Announcement date: 31 Oct 2024

Abstract

Research on speech enhancement has a substantial history in the field of speech processing, but recent advances in AI have once again accelerated progress. Multi-modal AI in particular is rapidly maturing, further increasing the observed performance of audio-visual speech enhancement (AVSE). One of the most promising use cases of AVSE is the improvement of pathological speech, since the audio alone often lacks important information that may be obtained from videos showing the speakers’ body movements (lips, face, head, torso, and arms). Such videos contain information for recovering expressivity by predicting prosody from facial expression, lip and head movement, and natural gestures. The aim here is to provide high-quality substitution speech for speaking-impaired individuals.

Your Tasks

  • obtain audio and video field recordings of pathological speakers
  • convert impaired speech to substitution speech using state-of-the-art audio-visual speech enhancement models (an illustrative sketch follows after this list)
  • design and conduct a listening experiment investigating speech intelligibility, as well as speaker and listener preferences
  • document the work (thesis writing; optionally, paper writing)
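
To make the conversion task more concrete, below is a minimal sketch of such a pipeline in Python. The file names, the 16 kHz sample rate, the grayscale mouth-region crops, and the enhance() placeholder are illustrative assumptions only; in the actual thesis, enhance() would be replaced by a pretrained state-of-the-art AVSE model.

```python
# Minimal sketch of an audio-visual enhancement pipeline.
# File names, sample rate, crop size, and the enhance() placeholder are
# illustrative assumptions, not part of the project specification.
import cv2                 # video decoding
import librosa             # audio loading / resampling
import numpy as np
import soundfile as sf     # writing the enhanced waveform


def load_video_frames(path: str, size: tuple[int, int] = (96, 96)) -> np.ndarray:
    """Decode a video into a (num_frames, H, W) array of grayscale crops."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(cv2.resize(gray, size))
    cap.release()
    return np.stack(frames)


def enhance(audio: np.ndarray, frames: np.ndarray, sr: int) -> np.ndarray:
    """Placeholder for a pretrained AVSE model; here it simply returns the input."""
    # A real model would fuse the audio with the visual stream (e.g. lip and
    # head movement) to reconstruct an intelligible substitution voice.
    return audio


if __name__ == "__main__":
    sr = 16_000                                     # assumed target sample rate
    audio, _ = librosa.load("recording.wav", sr=sr) # audio track, pre-extracted from the field recording
    frames = load_video_frames("recording.mp4")     # corresponding video of the speaker
    enhanced = enhance(audio, frames, sr)
    sf.write("enhanced.wav", enhanced, sr)
```

Recordings enhanced in this way would then serve as stimuli for the listening experiment on intelligibility and preference.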

Your Profile

  • interest in speech science and technology
  • interest in health applications
  • good knowledge of relevant Python frameworks for speech synthesis, voice conversion, and/or representation learning
  • good communication skills

Contact

Philipp Aichinger (philipp.aichinger@meduniwien.ac.at)
Martin Hagmüller (hagmueller@tugraz.at)