Signal Processing and Speech Communication Laboratory

Research Topics

Acoustic Source Localization and Separation

Both distant automatic speech recognition for human-machine interaction and automatic event detection and localization in e-Health surveillance and assistance applications require automatic localization and separation of the acoustic source. At our lab, research in acoustic source localization and separation covers the localization of a single source and of multiple concurrent acoustic sources using single-channel or multi-channel audio inputs. Given these audio inputs, we combine nonlinear signal processing and machine learning concepts to achieve single-channel or multi-channel source separation using techniques such as blind source separation, adaptive beamforming, or fundamental-frequency-based source separation. For several years, we have been using deep neural networks for speech separation, dereverberation, and speech enhancement. Most of our techniques exploit multiple microphone signals to obtain performance beyond the state of the art. Furthermore, for recording real-life audio databases, we have recently set up a special recording room equipped with a flexible arrangement of microphone arrays that allows us to record different meeting situations, assisted living simulations, and other distant speech recognition tasks.

Adaptive Signal Processing & Control

A well-understood special case of nonlinear signal processing is found in adaptive signal processing and control. In its classical setting, a parameterized linear system is used to represent a weakly nonlinear system around an operating point, where the optimal parameterization is learnt from observations of the system input and a desired system output using on-line parameter adaptation algorithms. This setting can be generalized to parameterized nonlinear systems and to various learning architectures, such as cascade system compensation in predistortion or equalization scenarios and parallel system compensation in echo cancellation.
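As a rough illustration of on-line parameter adaptation, the sketch below uses the least-mean-squares (LMS) algorithm to identify an unknown linear system from its input and desired output. LMS is chosen here only as a common textbook example; the function names, the three-tap "unknown" system, and the step size are assumptions made for this illustration, not a description of the lab's methods.

```python
import random

def fir(h, x):
    """Filter x with the FIR impulse response h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def lms_identify(x, d, num_taps, step_size):
    """Adapt FIR weights sample by sample so the output tracks d (LMS)."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent input samples, newest first (zeros before the start).
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = d[n] - y
        # Stochastic-gradient update of every weight.
        w = [wk + step_size * e * uk for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors

random.seed(0)
unknown = [0.5, -0.3, 0.1]                      # hypothetical system to learn
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = fir(unknown, x)                             # desired (observed) output
w, errors = lms_identify(x, d, num_taps=3, step_size=0.05)
print([round(wk, 3) for wk in w])
```

In this noise-free setting the adapted weights approach the coefficients of the unknown system. The same parallel structure underlies echo cancellation, where the "unknown system" is the loudspeaker-to-microphone path.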

Channel Modeling

The physical basis for wireless communications is the radio channel, whose properties are determined by the effects of multipath propagation. A basic description of the channel is easily found: the received signal is a sum of delayed and attenuated copies of the transmitted signal, caused by reflections off objects in the propagation environment. The technical implications of these mechanisms, however, are tremendous.
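The basic description above can be sketched as a discrete-time tapped delay line. This is a minimal illustration assuming integer sample delays and real-valued path gains; the function name and the path values are hypothetical.

```python
def multipath_channel(tx, paths):
    """Return the sum of delayed, attenuated copies of tx.

    paths is a list of (delay_in_samples, gain) pairs, one per
    propagation path.
    """
    max_delay = max(delay for delay, _ in paths)
    rx = [0.0] * (len(tx) + max_delay)
    for delay, gain in paths:
        for n, sample in enumerate(tx):
            rx[n + delay] += gain * sample
    return rx

tx = [1.0, 0.0, 0.0, 0.0]                       # a single transmitted pulse
paths = [(0, 1.0), (2, 0.5), (3, -0.25)]        # direct path plus two echoes
rx = multipath_channel(tx, paths)
print(rx)
```

Transmitting a single pulse makes the received signal equal to the channel impulse response, so each path shows up as one tap at its delay.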

Circuits, Systems, Algorithms

All signal processing systems need to be realized with analog circuits, digital hardware and/or software, or mixed-signal systems, which combine analog and digital subsystems. Such designs acknowledge that interaction with the physical environment is always analog, while digital implementation becomes more and more advantageous in terms of hardware resources and power consumption. Going for “Green ICT”, therefore, often means optimizing a mixed-signal system design in which digital methods assist the analog circuits to strike the best balance between system performance and sustainable energy use.

(Indoor) Localization

In contrast to satellite-based outdoor positioning systems, which have been around for several decades, indoor applications have not yet seen generic, robust solutions. The reason lies in fundamental technical and physical challenges. At first glance, radio frequency (RF) signals seem to be a very promising measurement technology for providing the geometry-related raw data for the positioning system. They can penetrate materials and propagate over large distances, and transceivers can be implemented at low cost, in small size, and with low power consumption.

Information Processing & Coding

Information Theory is traditionally concerned with data transmission and compression and has not received as much attention for the description of signal processing systems. While traditional signal processing measures are related to signal energy and correlation, an information processing view should emphasize the amount of entropy generated by a signal model or the amount of entropy reduction resulting from an input-output system operation. While most linear systems fall into the class of information allpasses, which neither increase nor decrease the entropy rate of the processed signals, nonlinear systems do. We study the impact of nonlinear systems on the information content of signals, exploit nonlinear models for signal compression and signal generation, including the recovery of lost information, and study the distributed analysis of sensor data under total capacity constraints.
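A toy example may help make the entropy view concrete. The sketch below is a strong simplification: it uses the empirical entropy of individual discrete samples rather than the entropy rate of a process, and the signal values and the two example systems (a scaling and a rectifier) are assumptions chosen for illustration.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

samples = [-2, -1, 1, 2] * 250                  # four equally likely values
scaled = [3 * s for s in samples]               # invertible linear scaling
rectified = [abs(s) for s in samples]           # non-invertible nonlinearity

h_in = entropy_bits(samples)
h_scaled = entropy_bits(scaled)
h_rectified = entropy_bits(rectified)
print(h_in, h_scaled, h_rectified)
```

The scaling maps distinct values to distinct values and leaves the entropy unchanged, whereas the rectifier merges each value with its negative and discards the one bit that encoded the sign.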

Language Technologies

Research in Language Technologies at our lab has two foci. The first one is mainly motivated by challenges in automatic speech recognition back-end processing such as language modelling, modelling variation in pronunciation, or text alignment. Our text analysis methods comprise customizable phonetic and semantic similarity measures which have been evaluated on large industrial and scientific text collections with respect to large vocabulary continuous speech recognition (dictation). We have investigated models for dialect transformations at both the sentence level, using grammatical transformations, and the word level, using pronunciation transformations, in close cooperation with our speech synthesis efforts. More recently, we have been investigating language modelling methods for conversational speech, a speaking style which comes with the additional challenges of large intra- and inter-speaker variation and only small amounts of available data.

Machine Learning

Music perception and hearing devices

Until recently, hearing loss has been a blind spot on the map of music perception research. We aim to provide an empirical groundwork that allows hearing aids to be optimized for music. This involves research on a host of questions: How is music listening affected by hearing loss? Hearing devices are currently optimized for speech, so how can we improve music listening with hearing aids and cochlear implants? Can we develop audio-tactile interfaces to enhance music listening?

Musical acoustics

We develop audio-based models of musical sounds that shed light on how acoustic information can be exploited by human perception. This includes transient extraction algorithms, quantifications of frequency micro-modulations in singing voices, and spectral spaces of instrument sounds. The goal is to reveal a parsimonious feature set that allows us to characterize the acoustical signature of musical instrument sounds.

Nonlinear Modeling

Nonlinear Modeling aims at a more accurate representation of physical reality where many systems are found to violate the basic prerequisite of linear models: the so-called superposition principle which states that the effect of the sum of multiple system inputs equals the sum of the effects of the individual inputs. This principle is often violated due to limitations in the maximum amplitude a physical quantity may reach, and due to basic physical laws which show a nonlinear relationship among the relevant variables of the problem domain. These nonlinear effects often become more prominent with the ongoing miniaturization of the electronic devices used for systems realization.
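The superposition principle, and its violation by an amplitude limit, can be checked numerically. The following is a minimal sketch; the two example systems (a pure gain and a hard clipper) and the test inputs are assumptions chosen for illustration.

```python
def linear_gain(x, gain=2.0):
    """A linear system: output is proportional to input."""
    return gain * x

def hard_clip(x, limit=1.0):
    """A nonlinear system: output amplitude is limited to +/- limit."""
    return max(-limit, min(limit, x))

def satisfies_superposition(system, a, b, tol=1e-12):
    """Check additivity for one input pair: f(a + b) == f(a) + f(b)."""
    return abs(system(a + b) - (system(a) + system(b))) <= tol

a, b = 0.75, 0.5
gain_ok = satisfies_superposition(linear_gain, a, b)
clip_ok = satisfies_superposition(hard_clip, a, b)
print(gain_ok, clip_ok)
```

Each input alone stays below the clipping limit, but their sum exceeds it, so the clipper's response to the sum differs from the sum of its responses. Note that passing this check for a single input pair does not prove linearity; one failing pair is enough to show nonlinearity.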

Probabilistic Graphical Models

Psychoacoustics

How do listeners parse and organize complex musical scenes with sounds from multiple instruments overlapping in time and frequency? How can we define timbre and pitch, and what roles do these parameters play in music and speech? How can we model these phenomena at the signal level? By answering these questions, we seek to improve our general understanding of how listeners make sense of sound in a noisy world.

RFID Systems

The automatic and simultaneous identification, localization, and tracking of targets using electromagnetic radiation started mainly as a military application in radar systems. In the early 1970s, commercial tracking of large and expensive goods emerged, followed by smaller items by the end of the 20th century. Since then, RF identification (RFID) has become almost ubiquitous in commercial applications, e.g., tracking and identification of goods or electronic article surveillance.

Speech Analysis

Our group is concerned with the development of algorithms for a range of important speech analysis tasks for both single-channel and multichannel speech. These tasks include voice activity detection, speech enhancement, pitch and multipitch tracking, phone segmentation and classification, as well as speaker segmentation and identification.

Speech and Speaker Recognition

At our lab, automatic speech recognition research is focused on front-end processing for robust speech recognition (noise reduction, voice activity detection, blind source separation, and speech quality assessment). Our algorithms have been tested in various ASR contexts, from command-and-control applications on embedded devices in industrial environments (SNOW) and air traffic control (ATCOSIM) to large vocabulary dictation systems (COAST-ROBUST). Furthermore, we provide state-of-the-art academic recognition setups for German and English research databases such as SpeechDat-II, TIMIT, the Wall Street Journal Database (WSJ0), and GRASS.

Speech Enhancement and Transmission

Speech is only useful when it is transmitted from a speaker to a listener. Very often this is done in a telecommunication setting. Speech transmission is often subject to restrictions such as limited audio and/or network bandwidth. Therefore, one issue is to use the available resources efficiently while maximizing the speech quality, by finding efficient speech coding and error concealment strategies. In a real-world environment, the speech signal cannot be picked up with perfect quality, e.g. when the speaker is driving a car. This motivates efforts to enhance the speech quality for the listener. Tasks include noise suppression, where we use the statistics of the background noise to remove the unwanted signal components from the noisy speech signal. Further, in a hands-free communication scenario, echo cancellation is used to subtract the loudspeaker signal that is picked up again by the microphone, which the far-end speaker would otherwise hear as an echo of their own voice. For low audio-bandwidth signals, artificial bandwidth extension can improve the audio signal quality considerably. Any enhancement effort can also degrade the desired speech sound, so minimizing this effect is an important task. Finally, the enhancement of speech produced by people suffering from voice pathologies is an important application area.

Ultra-Wideband Systems

The wealth of advantages derived from a large signaling bandwidth has motivated the considerable interest shown in the past years towards Ultra Wideband (UWB) communication systems. The possibility of extremely high data rates as well as high-accuracy ranging, together with the promise of low-power and low-complexity devices are some of the many features making UWB so attractive. However, UWB system design poses a number of new technical challenges, and traditional design guidelines are insufficient, or even misleading.

Computational Lung Sound Analysis

Computational methods for the analysis of lung sounds are beneficial for computer-supported diagnosis, digital storage, and monitoring in critical care. Pathological changes of the lung are closely connected to characteristic sounds, enabling a fast and inexpensive diagnosis. Traditional auscultation with a stethoscope has several disadvantages: it is subjective, i.e. the lung sounds are evaluated depending on the experience of the physician; it cannot provide continuous monitoring; and it requires a trained expert. Furthermore, the characteristic sounds lie in the low frequency range, where human hearing has limited sensitivity and is susceptible to noise artifacts.