Voice Conversion on Embedded Devices
- Status: Open
- Type: Master Thesis
- Announcement date: 24 Sep 2025
- Mentors
- Research Areas
Short description
Modern voice conversion (VC) systems based on deep neural networks have advanced considerably, but they are often too resource-intensive for real-time use on low-power platforms such as smartphones or Raspberry Pi devices. This thesis will investigate the usability and limitations of VC models on such hardware, focusing on latency, computational efficiency, and conversion quality. Special attention will be given to evaluating lightweight, open-source VC models (e.g., KNN-VC, TinyVC, FasterSVC, LLVC, StreamVC) and VC models developed by our group. The goal is to understand the trade-offs in real-time voice conversion and to determine which architectures are practical for deployment.
Your Tasks
- Deploy and run lightweight VC architectures (e.g., KNN-VC, TinyVC, FasterSVC, LLVC, StreamVC) on low-power devices (e.g., Raspberry Pi, smartphones)
- Evaluate models in terms of:
  - Real-time factor (RTF)
  - Latency and inference time
  - CPU and memory consumption
  - Voice conversion quality
- Investigate performance optimization strategies
- Document the methodology, experimental setup, and results
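
As an illustration of the timing metrics listed above, the following minimal sketch shows one way RTF and per-chunk inference time could be measured for a streaming model. RTF is taken here as processing time divided by the duration of the processed audio, so a value below 1 means the model keeps up with real time. The names `benchmark_streaming` and `dummy_model`, the 16 kHz sample rate, and the 20 ms chunk size are assumptions made for this example only; they do not come from any of the VC models named above.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark_streaming(
    process_chunk: Callable[[Sequence[float]], Sequence[float]],
    audio: Sequence[float],
    sample_rate: int = 16_000,  # assumed sample rate in Hz
    chunk_size: int = 320,      # assumed chunk length: 20 ms at 16 kHz
) -> dict:
    """Feed audio to a model chunk by chunk and time every call."""
    if len(audio) < chunk_size:
        raise ValueError("audio must contain at least one full chunk")

    times = []
    # Only full chunks are processed; a trailing partial chunk is skipped.
    for start in range(0, len(audio) - chunk_size + 1, chunk_size):
        chunk = audio[start:start + chunk_size]
        t0 = time.perf_counter()
        process_chunk(chunk)
        times.append(time.perf_counter() - t0)

    processed_seconds = len(times) * chunk_size / sample_rate
    return {
        "num_chunks": len(times),
        # RTF < 1 means processing is faster than real time.
        "rtf": sum(times) / processed_seconds,
        "mean_inference_ms": 1000 * statistics.mean(times),
        "max_inference_ms": 1000 * max(times),
        # Buffering delay caused by waiting for one full chunk.
        "chunk_ms": 1000 * chunk_size / sample_rate,
    }


def dummy_model(chunk: Sequence[float]) -> list:
    """Hypothetical stand-in for a VC model: scales each sample."""
    return [0.5 * x for x in chunk]


if __name__ == "__main__":
    one_second_of_silence = [0.0] * 16_000
    for name, value in benchmark_streaming(dummy_model, one_second_of_silence).items():
        print(f"{name}: {value}")
```

In practice, `process_chunk` would wrap the forward pass of the model under test, and a few warm-up calls before timing usually give more stable numbers. CPU and memory consumption are not covered by this sketch and would have to be recorded separately, for example with a process monitoring tool on the target device.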
Your Profile/Prerequisites
- Programming skills in Python and basic experience with a deep learning framework such as PyTorch
- Familiarity with speech signal processing and speech communication
- Interest in embedded AI and real-time systems
Contact
- Martin Hagmüller (hagmueller@tugraz.at or 0316/873 4377)
- Benedikt Mayrhofer (benedikt.mayrhofer@tugraz.at)