Voice conversion for disease progression modelling
- Status
- Open
- Type
- Master Thesis
- Announcement date
- 30 Oct 2024
- Mentors
- Research Areas
Abstract
Voice conversion refers to processing speech audio so that the speaker identity is modified while the linguistic content remains the same. Recent advances in deep neural networks have revolutionized this field of research, enabling the synthesis of speech that listeners can hardly distinguish from authentic speech. The conversion of pathological speech, however, remains a challenge. The aim of this thesis is to predict post-treatment speech from pre-treatment speech for the purpose of clinical decision support.
Your Tasks
- obtain paired pre- and post-treatment speech audio recordings from an available database
- extract speaker-characterizing embeddings from pre- and post-treatment speech
- attempt to predict post-treatment embeddings from pre-treatment embeddings
- synthesize post-treatment speech predictions using predicted post-embeddings and a multi-speaker speech synthesizer
- design and conduct a listening experiment investigating speech intelligibility, as well as speaker similarity
- document the work (thesis writing, optional: paper)
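The embedding-prediction task listed above could be prototyped along the following lines. This is only a minimal sketch under strong assumptions: the embeddings are random placeholders rather than outputs of a real speaker encoder, the dimensions are arbitrary, and closed-form ridge regression is just one possible baseline, not a method prescribed by this announcement. All function and variable names are hypothetical.

```python
import numpy as np


def fit_ridge(pre, post, alpha=1.0):
    """Fit W, b so that post is approximated by pre @ W + b (ridge regression)."""
    pre_mean, post_mean = pre.mean(axis=0), post.mean(axis=0)
    X, Y = pre - pre_mean, post - post_mean
    dim = X.shape[1]
    # Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T Y
    W = np.linalg.solve(X.T @ X + alpha * np.eye(dim), X.T @ Y)
    b = post_mean - pre_mean @ W
    return W, b


def predict(pre, W, b):
    """Map pre-treatment embeddings to predicted post-treatment embeddings."""
    return pre @ W + b


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two embedding matrices."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den


# Placeholder data (assumption): one embedding pair per speaker.
rng = np.random.default_rng(0)
n_speakers, dim = 200, 16
pre = rng.normal(size=(n_speakers, dim))
true_map = rng.normal(size=(dim, dim))  # synthetic "treatment effect"
post = pre @ true_map + 0.1 * rng.normal(size=(n_speakers, dim))

# Hold out speakers for evaluation so that the score is not optimistic.
train, test = slice(0, 150), slice(150, None)
W, b = fit_ridge(pre[train], post[train], alpha=1.0)
pred = predict(pre[test], W, b)

mean_cos = float(cosine_similarity(pred, post[test]).mean())
print(f"mean cosine similarity on held-out speakers: {mean_cos:.3f}")
```

With real data, the predicted embeddings would then condition a multi-speaker synthesizer. A linear baseline like this mainly serves as a reference point against which non-linear models can be compared.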
Your Profile
- interest in speech science and technology
- interest in health-related applications
- good knowledge of relevant Python frameworks for speech synthesis, voice conversion, and/or representation learning
- good communication skills
Additional information
The thesis is conducted in cooperation with MedUni Vienna, so (parts of) the work can also be done from Vienna.
Contact
- Philipp Aichinger (philipp.aichinger@meduniwien.ac.at)
- Martin Hagmüller (hagmueller@tugraz.at)