Signal Processing and Speech Communication Laboratory

PhD Theses

Anneliese Kelterer: The prosody of interactional and discursive strategies in Austrian conversational speech

Prosody serves many functions in speech, e.g., cueing information structure (“Max bought a HOUSE.” vs. “MAX bought a house.”), sentence type (“Max bought a house?”), or communicative functions such as turn management (signaling whether I want to continue telling you about Max’s new house or am done talking). This thesis investigates the prosody of yet another kind of communicative function, the expression of attitude (also called stance-taking or evaluation).

Philipp Hermüller: Automated Anomaly Classification for the Post-Mortem System using Machine Learning

As the size and complexity of future accelerators increase, the automated analysis and validation of machine protection functionalities will become more and more critical. The development of a fully automated analysis tool to classify machine-protection-relevant data in the LHC will serve as a proof of concept for future high-energy colliders. It will make it possible to identify important design requirements that are relevant to the early design phase of such a collider.

Christian Toth: Bayesian Causal Inference in the Presence of Structural Uncertainty

Few topics in science and philosophy have been as controversial as the nature of causality. Interestingly, the discussion becomes relatively benign, from a philosophical perspective, as soon as one agrees on a well-defined mathematical model of causality, such as Pearl’s structural causal model (SCM). Assuming that the data comes from some model within a considered class of SCMs, causal questions reduce, in principle, to epistemic questions, i.e., questions about what and how much is known about the model.

Sebastian Handel: Modeling Nonlinear Cochlear Mechanisms

This dissertation project explores the intricate, nonlinear mechanisms of the cochlea, a crucial component of the human auditory system. The goal is to develop advanced models that represent these biological processes with greater precision and enhance our understanding of their complexities. The human auditory system exhibits notable nonlinear characteristics in various dimensions, including temporal resolution, frequency resolution, and dynamic amplification. Despite its significance, the underlying nature of this nonlinearity remains poorly understood, which has resulted in models that capture these features only qualitatively. By delving deeper into this area, the research aims to bridge the knowledge gap and contribute to more accurate and comprehensive representations of cochlear function.

Martin Hofmann-Wellenhof: Physics-informed Machine Learning

A multitude of physical phenomena are governed by partial differential equations, and the need to solve these equations quickly and reliably arises in both research and industry. Although state-of-the-art numerical discretisation methods are widely used, significant challenges remain, such as sensitivity to noisy data, high computational cost, and the complexity of mesh generation. Machine learning has achieved remarkable success in various domains, but training deep neural networks typically requires substantial amounts of data, which are often scarce or expensive to generate for real-world physical systems. Physics-informed machine learning offers a promising alternative by embedding physical laws into the learning process, thereby potentially reducing data requirements. In this thesis, we aim to enhance physics-informed machine learning methods by improving their trainability, enhancing robustness, and incorporating uncertainty quantification.

Max Zimmermann: Psychoacoustic Modelling of Selective Listening in Music

When hearing aid users are asked what problems they have when listening to music, most answer that some instruments are too loud, some too soft, or that it is all one big mush. The field of musical scene analysis (MSA) investigates the human perceptual ability to organize complex musical structures, such as the sound mixtures of an orchestra, into meaningful lines or streams from their individual instruments or sections. Many studies have already examined human performance on various MSA tasks, as MSA holds the key to better understanding music perception and to improving the enjoyment of music for hearing-impaired people.

Jixiang Lei: Robust Test-Time Adaptation for Visual and Multimodal Learning under Distribution Shifts

In real-world applications of deep learning, distributional shifts between training and test data can significantly degrade model performance, particularly in tasks such as image classification, semantic segmentation, and multimodal perception. This doctoral thesis explores robust Test-Time Adaptation (TTA) strategies designed to dynamically adapt models during inference, without access to source data or labels.

Michael Paierl: Social robots for training medical conversations

Successful medical conversations need practice. In medical education, students practice medical conversations with trained actors, who learn their role with the help of scripts with realistic yet not real patient histories. Given restricted resources, the number of training opportunities available during medical education is limited. This PhD thesis explores how social robots can support the training of students in conducting medical conversations. Specifically, it explores how automatic speech processing technologies can identify and model aspects of effective medical interactions.

Sophie Steger: Uncertainty Estimation in Deep Learning and Industrial Applications

As machine learning models are increasingly deployed in safety-critical and industrial applications, the need for reliable uncertainty estimation alongside predictions becomes essential. Uncertainty estimates not only foster trust in model outputs but also support downstream tasks such as active learning and out-of-distribution detection.

Benedikt Mayrhofer: Voice conversion for Dysphonic and Electrolaryngeal Speech

Voice plays a fundamental role in human communication, not only serving a functional purpose but also shaping personal identity and social interaction. Voice disorders, such as dysphonia or conditions resulting from laryngeal cancer, can severely impact the ability to communicate, often leading to social isolation and psychological burdens. In cases requiring a laryngectomy, patients rely on electro-larynx (EL) devices, which generate unnatural, robotic speech that hinders effective interaction. This research explores the potential of voice conversion (VC) models to enhance speech quality for individuals with pathological voices, bridging the gap between assistive technology and natural communication.
