Streaming ASR for Pathological and Alaryngeal Speech
- Status: Open
- Type: Master Thesis
- Announcement date: 24 Sep 2025
Short description
Electrolaryngeal (EL) speech presents unique acoustic challenges due to its artificial voicing and reduced prosodic variation. While state-of-the-art Automatic Speech Recognition (ASR) systems achieve low Word Error Rates (WER) on standard datasets, their performance on pathological or EL speech under real-time conditions remains underexplored. This thesis investigates causal architectures suited to streaming applications. It will benchmark state-of-the-art networks in streaming setups on pathological speech datasets.
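For orientation, WER is the word-level edit distance between reference and hypothesis (substitutions, deletions, and insertions), divided by the number of reference words. The following is a minimal sketch of that definition; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not part of any toolkit named in this announcement, and text normalization (casing, punctuation) is left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both strings are already normalized and are split on
    whitespace; real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.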
Your Tasks
- Review current research in streaming ASR
- Benchmark ASR architectures (e.g., streaming Conformer, RNN-T, Whisper with chunked decoding)
- Analyze latency, real-time factor, and robustness to EL-specific distortions
- Explore causal adaptation of models
- Document the methodologies, experimental setups, and results
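As a rough illustration of the latency and real-time factor analysis listed above, the sketch below feeds fixed-size audio chunks to a recognizer callback and times each call. The real-time factor (RTF) is total processing time divided by audio duration, so values below 1 mean the system keeps up with incoming audio. The names `measure_streaming` and `process_chunk` are hypothetical placeholders; an actual setup would wrap a streaming model from the chosen toolkit in `process_chunk`.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Sequence


def chunk_signal(samples: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Split a signal into consecutive chunks; the last one may be shorter."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def measure_streaming(
    samples: Sequence[float],
    sample_rate: int,
    chunk_ms: float,
    process_chunk: Callable[[Sequence[float]], object],
) -> Dict[str, float]:
    """Time a chunk-wise recognizer and report RTF and per-chunk compute time.

    `process_chunk` is a hypothetical stand-in for one streaming decoding step.
    """
    chunk_size = int(sample_rate * chunk_ms / 1000)
    if chunk_size <= 0 or len(samples) == 0:
        raise ValueError("need a non-empty signal and a positive chunk size")

    per_chunk_seconds = []
    for chunk in chunk_signal(samples, chunk_size):
        start = time.perf_counter()
        process_chunk(chunk)
        per_chunk_seconds.append(time.perf_counter() - start)

    audio_seconds = len(samples) / sample_rate
    return {
        "rtf": sum(per_chunk_seconds) / audio_seconds,
        "mean_chunk_compute_s": mean(per_chunk_seconds),
        "max_chunk_compute_s": max(per_chunk_seconds),
        "num_chunks": float(len(per_chunk_seconds)),
    }


if __name__ == "__main__":
    # One second of silence at 16 kHz, processed in 160 ms chunks by a dummy step.
    stats = measure_streaming([0.0] * 16000, 16000, 160, lambda chunk: sum(chunk))
    print(stats)
```

The per-chunk compute time is only one part of the latency a user perceives; the chunk duration itself and any look-ahead context the model requires add to it.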
Your Profile/Prerequisites
- Strong interest in ASR, speech technology, and pathological applications
- Experience with Python and ASR toolkits is beneficial (e.g., SpeechBrain, NVIDIA NeMo, OpenAI Whisper)
- Interest in causal inference and streaming architectures
- Familiarity with speech signal processing and speech communication
Contact
- Martin Hagmüller (hagmueller@tugraz.at or 0316/873 4377)
- Benedikt Mayrhofer (benedikt.mayrhofer@tugraz.at)