Signal Processing and Speech Communication Laboratory

Streaming ASR for Pathological and Alaryngeal Speech

Status: Open
Type: Master Thesis
Announcement date: 24 Sep 2025
Mentors:
Research Areas:

Short description

Electrolaryngeal (EL) speech presents unique acoustic challenges due to its artificial voicing and reduced prosodic variation. State-of-the-art Automatic Speech Recognition (ASR) systems achieve low Word Error Rates (WER) on standard datasets, but their performance on pathological or EL speech under real-time conditions remains underexplored. This thesis investigates causal architectures suited to streaming applications: it will benchmark state-of-the-art networks in streaming setups and assess their performance on pathological speech datasets.
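
For orientation, WER is commonly computed as the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch of that definition; the function name `word_error_rate` is illustrative and not taken from any toolkit, and it assumes whitespace tokenization with no text normalization, which a real evaluation would likely add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.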

Your Tasks

  • Review current research in streaming ASR
  • Benchmark ASR architectures (e.g., streaming Conformer, RNN-T, Whisper with chunked decoding)
  • Analyze latency, real-time factor, and robustness to EL-specific distortions
  • Explore causal adaptation of models
  • Document the methodologies, experimental setups, and results
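
The latency and real-time factor (RTF) measurements named in the tasks above could be collected along the following lines. This is only a sketch under stated assumptions: `measure_streaming`, `process_chunk`, and the fixed `chunk_seconds` are hypothetical names and choices, not part of any ASR toolkit, and RTF is taken here as total processing time divided by total audio duration.

```python
import time
from typing import Callable, Iterable, List, Tuple


def measure_streaming(
    process_chunk: Callable[[object], object],
    chunks: Iterable[object],
    chunk_seconds: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, List[float]]:
    """Return (real-time factor, per-chunk processing times in seconds).

    process_chunk is a placeholder for one streaming decoding step of
    the model under test; chunk_seconds is the audio duration per chunk.
    """
    per_chunk: List[float] = []
    for chunk in chunks:
        start = clock()
        process_chunk(chunk)
        per_chunk.append(clock() - start)

    if not per_chunk:
        raise ValueError("at least one chunk is required")

    total_audio = chunk_seconds * len(per_chunk)
    rtf = sum(per_chunk) / total_audio
    return rtf, per_chunk
```

An RTF below 1.0 means the system processes audio faster than it arrives. The per-chunk times capture only compute latency; any look-ahead or chunk buffering delay of the architecture would have to be added separately.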

Your Profile/Prerequisites

  • Strong interest in ASR, speech technology, and applications to pathological speech
  • Experience with Python and ASR toolkits is beneficial (e.g., SpeechBrain, NVIDIA NeMo, OpenAI Whisper)
  • Interest in causal (low-latency) inference and streaming architectures
  • Familiarity with speech signal processing and speech communication

Contact: