Voice conversion for disease progression modelling

Status

Open

Type

Master Thesis

Announcement date

30 Oct 2024

Mentors

Martin Hagmüller

Research Areas

Speech Communication

Abstract

Voice conversion refers to processing speech audio so that the speaker identity is modified while the linguistic content remains the same. Recent advances in deep neural networks have revolutionized this field of research, enabling the synthesis of speech that listeners can hardly distinguish from authentic speech. Converting pathological speech remains a challenge. The aim of this thesis is to predict post-treatment speech from pre-treatment speech for the purpose of clinical decision support.

Your Tasks

obtain paired pre- and post-treatment speech audio recordings from an available database
extract speaker-characterizing embeddings from pre- and post-treatment speech
attempt to predict post-treatment embeddings from pre-treatment embeddings
synthesize post-treatment speech predictions using predicted post-embeddings and a multi-speaker speech synthesizer
design and conduct a listening experiment investigating speech intelligibility and speaker similarity
document the work (thesis writing, optional: paper)
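
As a rough illustration of the embedding prediction task, the following is a minimal sketch and not a prescribed method. It assumes that speaker embeddings are fixed-length vectors and fits a ridge regression from pre-treatment to post-treatment embeddings. The random data stands in for real embeddings, and the function names (fit_ridge, predict_post, mean_cosine_similarity) as well as the choice of a linear model are assumptions made for this example only.

```python
import numpy as np


def fit_ridge(pre, post, alpha=1.0):
    """Fit W, b so that post is approximated by pre @ W + b (ridge regression)."""
    pre_mean = pre.mean(axis=0)
    post_mean = post.mean(axis=0)
    X = pre - pre_mean
    Y = post - post_mean
    dim = X.shape[1]
    # Closed-form ridge solution: (X^T X + alpha * I) W = X^T Y
    W = np.linalg.solve(X.T @ X + alpha * np.eye(dim), X.T @ Y)
    b = post_mean - pre_mean @ W
    return W, b


def predict_post(pre, W, b):
    """Predict post-treatment embeddings from pre-treatment embeddings."""
    return pre @ W + b


def mean_cosine_similarity(a, b):
    """Average row-wise cosine similarity between two embedding matrices."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))


# Synthetic stand-in data (assumption): one 16-dimensional embedding per
# speaker and recording condition. Real embeddings would come from a
# speaker encoder applied to the paired recordings.
rng = np.random.default_rng(0)
n_speakers, dim = 200, 16
pre = rng.normal(size=(n_speakers, dim))
true_map = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
post = pre @ true_map + 0.5 + 0.05 * rng.normal(size=(n_speakers, dim))

# Hold out speakers for evaluation so the score reflects unseen patients.
train, test = slice(0, 150), slice(150, None)
W, b = fit_ridge(pre[train], post[train], alpha=1.0)
pred = predict_post(pre[test], W, b)

# Baseline: use the pre-treatment embedding unchanged as the prediction.
baseline_score = mean_cosine_similarity(pre[test], post[test])
model_score = mean_cosine_similarity(pred, post[test])
print(f"baseline cosine similarity: {baseline_score:.3f}")
print(f"ridge cosine similarity:    {model_score:.3f}")
```

Comparing against the "no change" baseline shows whether the learned mapping adds anything beyond copying the pre-treatment embedding. With real clinical data the number of paired recordings is likely to be small, so the regularization strength and the evaluation split would need more care than shown here.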

Your Profile

interest in speech science and technology
interest in health-related applications
good knowledge of relevant Python frameworks for speech synthesis, voice conversion, and/or representation learning
good communication skills

Additional information

The thesis is conducted in cooperation with MedUni Vienna, so (parts of) the work can also be done in Vienna.

Contact

Philipp Aichinger (philipp.aichinger@meduniwien.ac.at)
Martin Hagmüller (hagmueller@tugraz.at)