Signal Processing and Speech Communication Laboratory

Voice Conversion on Embedded Devices

Status
Open
Type
Master Thesis
Announcement date
24 Sep 2025
Mentors
Research Areas

Short description

Modern voice conversion (VC) systems based on deep neural networks have advanced considerably, but they are often too resource-intensive for real-time use on low-power platforms such as smartphones or Raspberry Pi devices. This thesis investigates the usability and limitations of VC models on such hardware, focusing on latency, computational efficiency, and conversion quality. Special attention will be given to evaluating lightweight, open-source VC models (e.g., KNN-VC, TinyVC, FasterSVC, LLVC, StreamVC) and VC models developed by our group. The goal is to understand the trade-offs in real-time voice conversion and to determine which architectures are practical for deployment.

Your Tasks

  • Deploy and run lightweight VC architectures (e.g., KNN-VC, TinyVC, FasterSVC, LLVC, StreamVC) on low-power devices (e.g., Raspberry Pi, smartphones)
  • Evaluate models in terms of:
    • Real-time factor (RTF)
    • Latency and inference time
    • CPU and memory consumption
    • Voice conversion quality
  • Investigate performance optimization strategies
  • Document the methodology, experimental setup, and results
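
As a rough illustration of the evaluation metrics listed above, the following sketch measures the real-time factor (processing time divided by audio duration) and peak memory of a conversion function. It is a minimal sketch under stated assumptions, not a prescribed benchmark: the names `measure_rtf` and `dummy_convert` are hypothetical, and the dummy function only stands in for a real VC model. Note that `tracemalloc` tracks Python-level allocations only, so memory held by native libraries (for example, tensors allocated by a deep learning framework) would need OS-level tools instead. An RTF below 1 means the audio is processed faster than real time.

```python
import time
import tracemalloc
from statistics import mean


def measure_rtf(convert, samples, sample_rate, runs=5):
    """Time `convert` on `samples` and return RTF, mean time, and peak memory.

    `convert` is any callable taking a sequence of samples (hypothetical
    interface; a real VC model would be wrapped to match it).
    """
    audio_seconds = len(samples) / sample_rate

    convert(samples)  # warm-up run, excluded from the timings

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        convert(samples)
        timings.append(time.perf_counter() - start)

    # Measure memory in a separate run, because tracing slows execution.
    tracemalloc.start()
    convert(samples)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    mean_seconds = mean(timings)
    return {
        "rtf": mean_seconds / audio_seconds,  # < 1 means faster than real time
        "mean_seconds": mean_seconds,
        "peak_bytes": peak_bytes,  # Python-level allocations only
    }


def dummy_convert(samples):
    """Placeholder for a VC model: scales every sample by 0.5."""
    return [0.5 * s for s in samples]


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate in Hz
    audio = [0.0] * (2 * sample_rate)  # two seconds of silence
    print(measure_rtf(dummy_convert, audio, sample_rate))
```

For streaming models, the same timing loop could be applied per audio chunk rather than per utterance, since end-to-end latency also depends on the chunk size and any lookahead the model requires.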

Your Profile/Prerequisites

  • Programming skills in Python; basic experience with deep learning frameworks like PyTorch
  • Familiarity with speech signal processing and speech communication
  • Interest in embedded AI and real-time systems

Contact: