Voice Conversion on Embedded Devices

Status

Open

Type

Master Thesis

Announcement date

24 Sep 2025

Mentors

Research Areas

Short description

Modern voice conversion (VC) systems have advanced considerably through deep neural networks, but they are often too resource-intensive for real-time use on low-power platforms such as smartphones or Raspberry Pi devices. This thesis will investigate the usability and limitations of VC models on such hardware, focusing on model latency, computational efficiency, and conversion quality. Special attention will be given to evaluating lightweight, open-source VC models (e.g., KNN-VC, TinyVC, FasterSVC, LLVC, StreamVC) as well as VC models developed by our group. The goal is to understand the trade-offs in real-time voice conversion and to determine which architectures are practical for deployment.

Your Tasks

- Deploy and run lightweight VC architectures (e.g., KNN-VC, TinyVC, FasterSVC, LLVC, StreamVC) on low-power devices (e.g., Raspberry Pi, smartphones)
- Evaluate the models in terms of:
  - Real-time factor (RTF)
  - Latency and inference time
  - CPU and memory consumption
  - Voice conversion quality
- Investigate performance optimization strategies
- Document the methodology, experimental setup, and results
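
As a rough illustration of the efficiency metrics listed above, the following minimal sketch times a chunk-wise conversion loop. It uses the usual definition of the real-time factor, processing time divided by the duration of the processed audio, so values below 1 mean the model keeps up with real time. The function `dummy_convert`, the 16 kHz sample rate, and the 20 ms chunk size are assumptions made for this example only and do not correspond to any of the models named above; a real VC model would take the place of `dummy_convert`.

```python
import statistics
import time

SAMPLE_RATE = 16_000  # Hz; assumed for illustration
CHUNK_MS = 20         # chunk duration in ms; assumed for illustration


def dummy_convert(chunk):
    """Hypothetical stand-in for the per-chunk inference of a VC model."""
    return [0.5 * sample for sample in chunk]


def benchmark(convert, n_chunks=200, sample_rate=SAMPLE_RATE, chunk_ms=CHUNK_MS):
    """Time `convert` on fixed-size silent chunks and summarize the results."""
    chunk_len = sample_rate * chunk_ms // 1000
    chunk = [0.0] * chunk_len
    audio_seconds = n_chunks * chunk_len / sample_rate

    per_chunk = []                      # wall-clock seconds per chunk
    cpu_start = time.process_time()     # CPU time of this process
    for _ in range(n_chunks):
        t0 = time.perf_counter()
        convert(chunk)
        per_chunk.append(time.perf_counter() - t0)
    cpu_seconds = time.process_time() - cpu_start

    wall_seconds = sum(per_chunk)
    return {
        "audio_seconds": audio_seconds,
        # RTF = processing time / audio duration
        "rtf": wall_seconds / audio_seconds,
        "mean_ms": 1000 * statistics.mean(per_chunk),
        # last of 19 cut points = 95th percentile of per-chunk time
        "p95_ms": 1000 * statistics.quantiles(per_chunk, n=20)[-1],
        "cpu_seconds": cpu_seconds,
    }


if __name__ == "__main__":
    for name, value in benchmark(dummy_convert).items():
        print(f"{name}: {value:.4f}")
```

Note that per-chunk inference time is only one part of the end-to-end latency; buffering, model lookahead, and audio I/O add to it. Memory consumption is not covered by this sketch and would need to be measured separately on the target device.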

Your Profile/Prerequisites

- Programming skills in Python and basic experience with deep learning frameworks such as PyTorch
- Familiarity with speech signal processing and speech communication
- Interest in embedded AI and real-time systems

Contact:

Martin Hagmüller (hagmueller@tugraz.at or 0316/873 4377)
Benedikt Mayrhofer (benedikt.mayrhofer@tugraz.at)