Evaluating State-of-the-Art Voice Conversion Models for Dysphonic and Electro-Larynx Speech
- Published: Wed, Oct 01, 2025
- Tags: rotm

Voice disorders such as dysphonia, and speech produced with an electro-larynx, often result in reduced intelligibility, unnatural prosody, and degraded speech quality. This paper investigates the potential of modern voice conversion (VC) technologies to restore healthy-sounding speech from pathological inputs. Four state-of-the-art VC models (FreeVC, QuickVC, LLVC, and XVC) were fine-tuned on Austrian-German datasets and evaluated using both objective and subjective measures. Results show substantial improvements in perceived naturalness, intelligibility, and vocal health, with listener preference scores exceeding those of the original pathological speech by up to 200%.
The study employs large-scale listening tests and quantitative analyses to assess intelligibility, rhythm, perceived vocal quality and preference ratings across 93 participants. The best-performing models (QuickVC, FreeVC, and XVC) consistently improved the naturalness and healthiness of the converted voices, while LLVC, although capable of real-time processing on CPUs, showed limited synthesis quality. Spectral analyses confirm that VC restores formant structure and F0 contours characteristic of healthy speech, though prosodic expressiveness remains constrained in severely impaired cases. The results demonstrate that voice conversion can substantially enhance speech rehabilitation outcomes and motivate further research into efficient, high-quality, and real-time VC architectures for clinical applications.
The paper has been accepted for the 14th MAVEBA International Workshop (Florence, Italy) and will appear in the MAVEBA Proceedings published by Firenze University Press.
