Signal Processing and Speech Communication Laboratory

Voice conversion for disease progression modelling

Status
Open
Type
Master Thesis
Announcement date
30 Oct 2024
Mentors
Research Areas

Abstract

Voice conversion refers to processing speech audio so that the speaker identity is modified while the linguistic content remains the same. Recent advances in deep neural networks have revolutionized this field of research, enabling the synthesis of speech that listeners can hardly distinguish from authentic speech. The conversion of pathological speech remains a challenge. The aim of this thesis is to predict post-treatment speech from pre-treatment speech for the purpose of clinical decision support.

Your Tasks

  • obtain paired pre- and post-treatment speech audio recordings from an available database
  • extract speaker-characterizing embeddings from pre- and post-treatment speech
  • attempt to predict post-treatment embeddings from pre-treatment embeddings
  • synthesize post-treatment speech predictions using predicted post-embeddings and a multi-speaker speech synthesizer
  • design and conduct a listening experiment investigating speech intelligibility, as well as speaker similarity
  • document the work (thesis writing, optional: paper)
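
To make the embedding prediction step in the task list more concrete, here is a minimal sketch of one possible baseline. It is not the method prescribed for the thesis. It assumes that each recording has already been reduced to a fixed-length speaker embedding, and it predicts a post-treatment embedding by adding the average pre-to-post change observed in the paired training data. The function names and the two-dimensional toy vectors are hypothetical placeholders, not real data.

```python
import math


def mean_offset(pre_embeddings, post_embeddings):
    """Average per-dimension change from pre- to post-treatment embeddings."""
    n = len(pre_embeddings)
    dim = len(pre_embeddings[0])
    return [
        sum(post[d] - pre[d] for pre, post in zip(pre_embeddings, post_embeddings)) / n
        for d in range(dim)
    ]


def predict_post(pre_embedding, offset):
    """Predict a post-treatment embedding by shifting the pre-treatment one."""
    return [x + o for x, o in zip(pre_embedding, offset)]


def cosine_similarity(a, b):
    """Cosine similarity, a common way to compare speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy paired embeddings (placeholders; real embeddings have many more dimensions).
train_pre = [[1.0, 0.0], [0.0, 1.0]]
train_post = [[1.5, 0.5], [0.5, 1.5]]

offset = mean_offset(train_pre, train_post)
predicted = predict_post([2.0, 2.0], offset)
```

A learned regression model could replace the mean offset. The predicted embedding would then condition the multi-speaker synthesizer, and the cosine similarity to the true post-treatment embedding could serve as a simple objective check alongside the listening experiment.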

Your Profile

  • interest in speech science and technology
  • interest in health-related applications
  • good knowledge of relevant Python frameworks for speech synthesis, voice conversion, and/or representation learning
  • good communication skills

Additional information

The thesis is conducted in cooperation with MedUni Vienna, so (parts of) the work can also be done from Vienna.

Contact

Philipp Aichinger (philipp.aichinger@meduniwien.ac.at)
Martin Hagmüller (hagmueller@tugraz.at)