Streaming ASR for Pathological and Alaryngeal Speech

Status

Open

Type

Master Thesis

Announcement date

24 Sep 2025

Mentors

Research Areas

Short description

Electrolaryngeal (EL) speech presents unique acoustic challenges due to its artificial voicing and reduced prosodic variation. While state-of-the-art Automatic Speech Recognition (ASR) systems achieve low Word Error Rates (WER) on standard datasets, their performance on pathological or EL speech under real-time conditions remains underexplored. This thesis investigates causal architectures suited to streaming applications. The study will run state-of-the-art networks in streaming setups and evaluate their accuracy and latency on pathological speech datasets.
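For orientation, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation code used in this project; the function name `word_error_rate` is illustrative, and text normalization (casing, punctuation) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.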

Your Tasks

Review current research in streaming ASR
Benchmark ASR architectures (e.g., streaming Conformer, RNN-T, Whisper with chunked decoding)
Analyze latency, real-time factor, and robustness to EL-specific distortions
Explore causal adaptation of models
Document the methodologies, experimental setups, and results
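
As a rough illustration of the latency and real-time factor analysis mentioned above: the real-time factor (RTF) is usually taken as processing time divided by audio duration, so values below 1 mean the system keeps up with the incoming audio. The sketch below feeds fixed-size chunks to a recognizer and times each call. The names `measure_streaming` and `process_chunk` and the 160 ms default chunk size are assumptions made for this example, not part of any particular toolkit.

```python
import time
from typing import Callable, Dict, Sequence


def measure_streaming(
    process_chunk: Callable[[Sequence[float]], object],
    samples: Sequence[float],
    sample_rate: int,
    chunk_ms: int = 160,
) -> Dict[str, float]:
    """Feed audio chunk by chunk and report RTF and per-chunk compute time."""
    if len(samples) == 0:
        raise ValueError("samples must not be empty")
    chunk_len = int(sample_rate * chunk_ms / 1000)
    if chunk_len <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")

    chunk_times = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        t0 = time.perf_counter()
        process_chunk(chunk)  # hypothetical streaming recognizer step
        chunk_times.append(time.perf_counter() - t0)

    audio_seconds = len(samples) / sample_rate
    return {
        "rtf": sum(chunk_times) / audio_seconds,
        "max_chunk_time_s": max(chunk_times),
        "num_chunks": float(len(chunk_times)),
    }
```

This measures compute time only. The latency a user perceives also includes the chunk duration itself and any look-ahead the model requires, which is why causal architectures matter for streaming use.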

Your Profile/Prerequisites

Strong interest in ASR, speech technology, and applications to pathological speech
Experience with Python and ASR toolkits is beneficial (e.g., SpeechBrain, NVIDIA NeMo, OpenAI Whisper)
Interest in causal inference and streaming architectures
Familiarity with speech signal processing and speech communication

Contact:

Martin Hagmüller (hagmueller@tugraz.at or 0316/873 4377)
Benedikt Mayrhofer (benedikt.mayrhofer@tugraz.at)