Distant automatic speech recognition for human-machine interaction, as well as automatic event detection and localization for e-Health surveillance and assistance applications, requires automatic localization and separation of acoustic sources. At our lab, research in Acoustic Source Localization and Separation covers the localization of single and of multiple concurrent acoustic sources using single-channel or multi-channel audio inputs. Given these audio inputs, we combine nonlinear signal processing and machine learning concepts to achieve single-channel or multi-channel source separation using techniques such as blind source separation, adaptive beamforming, or fundamental-frequency-based source separation. For recording real-life audio databases, we have recently set up a special recording room equipped with a flexible arrangement of microphone arrays that allows us to record different meeting situations, assisted living simulations, and other distant speech recognition tasks.
A well-understood special case of nonlinear signal processing is found in adaptive signal processing and control. In its classical setting, a parameterized linear system is used to represent a weakly nonlinear system around an operating point where the optimal parameterization is learnt from the observation of the system input and a desired system output using on-line parameter adaptation algorithms. This setting can be generalized to include parameterized nonlinear systems and to various learning architectures such as cascade system compensation in predistortion or equalization scenarios and parallel system compensation in echo cancellation.
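The classical setting described above can be sketched with the least-mean-squares (LMS) algorithm, one common on-line parameter adaptation rule. This is an illustration only, not necessarily the method used in the lab's work: the function name `lms_identify`, the step size, and the three-tap "unknown" system are all assumptions chosen for the example.

```python
import random


def lms_identify(x, d, num_taps, step_size):
    """Adapt FIR weights w so that sum_k w[k] * x[n-k] tracks d[n]."""
    w = [0.0] * num_taps
    for n in range(len(x)):
        # Most recent input samples, newest first; zeros before the signal starts.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))   # output of the adaptive model
        e = d[n] - y                               # error against the desired output
        # LMS update: move the weights along the instantaneous gradient estimate.
        w = [wk + step_size * e * uk for wk, uk in zip(w, u)]
    return w


random.seed(0)
unknown = [0.5, -0.3, 0.1]                         # hypothetical system to be identified
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(unknown[k] * x[n - k] for k in range(len(unknown)) if n - k >= 0)
     for n in range(len(x))]

w = lms_identify(x, d, num_taps=3, step_size=0.05)
print([round(wk, 4) for wk in w])
```

In this noise-free toy case the adapted weights approach the coefficients of the hypothetical system. With measurement noise or a genuinely nonlinear system, the weights would instead settle around the best linear approximation at the operating point.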
The physical basis for wireless communications is the radio channel, whose properties are determined by the effects of multipath propagation. A basic description of the channel is easily found: the received signal is a sum of delayed and attenuated copies of the transmitted signal, caused by reflections at objects of any kind in the propagation environment. But the technical implications of these mechanisms are tremendous.
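The basic channel description can be written as a short sketch in which each propagation path contributes one delayed, scaled copy of the transmitted signal. The function name `multipath_receive`, the sample-spaced delays, and the path gains are assumptions made for illustration. Real channels have continuous-valued delays, complex gains, and additive noise.

```python
def multipath_receive(tx, paths):
    """Sum delayed and attenuated copies of tx.

    paths is a list of (delay_in_samples, gain) pairs, one per propagation path.
    """
    max_delay = max(delay for delay, _ in paths)
    rx = [0.0] * (len(tx) + max_delay)
    for delay, gain in paths:
        for n, sample in enumerate(tx):
            rx[n + delay] += gain * sample
    return rx


# A single transmitted pulse makes the individual paths visible in the output.
tx = [1.0, 0.0, 0.0, 0.0]
paths = [(0, 1.0), (2, 0.5), (3, -0.25)]   # hypothetical direct path plus two reflections
rx = multipath_receive(tx, paths)
print(rx)
```

Because the input is a unit pulse, the output is the channel impulse response itself: one tap per path, located at that path's delay.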
All signal processing systems need to be realized with analog circuits, digital hardware and/or software, or mixed-signal systems, which combine analog and digital subsystems. Such designs acknowledge that interaction with the physical environment is always analog, while digital implementation becomes more and more advantageous in terms of hardware resources and power consumption. Going for “Green ICT”, therefore, often means optimizing a mixed-signal system design in which digital methods assist the analog circuits to strike the best balance between system performance and sustainable energy use.
In contrast to satellite-based outdoor positioning systems, which have been around for several decades, indoor applications have not yet seen generic, robust solutions. The reason lies in fundamental technical and physical challenges. At first glance, radio frequency (RF) signals seem to be a very promising measurement technology for providing the geometry-related raw data for the positioning system. They can penetrate materials and propagate over large distances, and transceivers can be implemented at low cost, in small size, and with low power consumption.
Information Theory is traditionally concerned with data transmission and compression and has not received as much attention for the description of signal processing systems. While traditional signal processing measures are related to signal energy and correlation, an information processing view should emphasize the amount of entropy generated by a signal model or the amount of entropy reduction resulting from an input-output system operation. While most linear systems fall into the class of information allpasses, which neither increase nor decrease the entropy rate of the processed signals, nonlinear systems can change it. We study the impact of nonlinear systems on the information content of signals, exploit nonlinear models for signal compression and signal generation, including the recovery of lost information, and study the distributed analysis of sensor data under total capacity constraints.
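A toy illustration of entropy reduction by a nonlinear system is given below. It is a deliberately simplified sketch: it uses the entropy of a memoryless discrete variable rather than the entropy rate of a process, and the helper `entropy_bits` and the five-value alphabet are assumptions made for the example.

```python
import math
from collections import Counter


def entropy_bits(samples):
    """Empirical Shannon entropy, in bits, of a list of discrete values."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


x = [-2, -1, 0, 1, 2]                          # each value equally likely
h_in = entropy_bits(x)                         # entropy of the input
h_scaled = entropy_bits([3 * v for v in x])    # invertible linear map
h_abs = entropy_bits([abs(v) for v in x])      # non-invertible nonlinear map

print(h_in, h_scaled, h_abs)
```

The invertible scaling leaves the entropy unchanged, whereas the absolute value merges inputs of opposite sign and therefore lowers it. The information about the sign is lost at the output.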
Research in Language Technologies at our lab is mainly motivated by challenges in automatic speech recognition back-end processing such as language modelling, modelling variation in pronunciation, or text alignment. Our text analysis methods comprise customizable phonetic and semantic similarity measures which have been evaluated on large industrial and scientific text collections with respect to large vocabulary continuous speech recognition (dictation).
Nonlinear Modeling aims at a more accurate representation of physical reality where many systems are found to violate the basic prerequisite of linear models: the so-called superposition principle which states that the effect of the sum of multiple system inputs equals the sum of the effects of the individual inputs. This principle is often violated due to limitations in the maximum amplitude a physical quantity may reach, and due to basic physical laws which show a nonlinear relationship among the relevant variables of the problem domain. These nonlinear effects often become more prominent with the ongoing miniaturization of the electronic devices used for systems realization.
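The violation of superposition by an amplitude limit can be checked numerically. The sketch below compares an ideal gain with the same gain followed by hard clipping. The function names, the gain of 2, and the limit of 1 are illustrative assumptions, and the check tests only one pair of inputs rather than proving linearity in general.

```python
def linear_gain(x, gain=2.0):
    """Ideal amplifier: the output is proportional to the input."""
    return gain * x


def clipped_gain(x, gain=2.0, limit=1.0):
    """The same amplifier with a maximum output amplitude (hard clipping)."""
    return max(-limit, min(limit, gain * x))


def obeys_superposition(system, a, b, tol=1e-12):
    """Check system(a + b) == system(a) + system(b) for one pair of inputs."""
    return abs(system(a + b) - (system(a) + system(b))) <= tol


a, b = 0.4, 0.3
linear_ok = obeys_superposition(linear_gain, a, b)
clipped_ok = obeys_superposition(clipped_gain, a, b)
print(linear_ok, clipped_ok)
```

Each input alone stays below the limit, but their sum drives the clipped amplifier into saturation, so the response to the sum differs from the sum of the individual responses.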
Probabilistic graphical models unite probability and graph theory and allow the efficient formalization of both static and dynamic, as well as linear and nonlinear, systems and processes. Many well-known statistical models, e.g. mixture models, factor analysis, hidden Markov models, Kalman filters, Bayesian networks, Boltzmann machines, and the Ising model, can be represented in the framework of graphical models. This framework provides techniques for inference (sum/max-product algorithm) and learning. The flexibility in representing the structure of the considered phenomenon makes graphical models applicable in many research areas.
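As a small instance of sum-product inference, the sketch below runs the forward recursion of a hidden Markov model, which corresponds to sum-product message passing along a chain, and compares it with brute-force summation over all state sequences. The two-state model and all of its probabilities are made-up values for illustration.

```python
from itertools import product


def forward_likelihood(init, trans, emit, obs):
    """P(obs) via the forward recursion (sum-product on a chain)."""
    states = range(len(init))
    alpha = [init[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        # Sum over the previous state, then multiply by the local evidence.
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j] for i in states)
                 for j in states]
    return sum(alpha)


def brute_force_likelihood(init, trans, emit, obs):
    """P(obs) by summing the joint probability over every state sequence."""
    total = 0.0
    for seq in product(range(len(init)), repeat=len(obs)):
        p = init[seq[0]] * emit[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[seq[t - 1]][seq[t]] * emit[seq[t]][obs[t]]
        total += p
    return total


init = [0.6, 0.4]                      # hypothetical initial state distribution
trans = [[0.7, 0.3], [0.4, 0.6]]       # trans[i][j] = P(next state j | state i)
emit = [[0.9, 0.1], [0.2, 0.8]]        # emit[s][o] = P(observation o | state s)
obs = [0, 1, 0]

fast = forward_likelihood(init, trans, emit, obs)
slow = brute_force_likelihood(init, trans, emit, obs)
print(fast, slow)
```

Both functions compute the same marginal likelihood. The recursion does so with a cost that grows linearly in the sequence length, whereas the enumeration grows exponentially, which is the practical benefit of exploiting the graph structure.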
There are two basic approaches to learning graphical models in the scientific community: generative and discriminative learning. Generative learning does not always provide good classification results, and discriminative learning is typically more accurate for classification. In contrast to purely discriminative models (e.g. neural networks, support vector machines), discriminatively learned generative graphical models (e.g. Bayesian networks) retain the benefits of a generative model, in particular the ability to work with missing variables by marginalizing over the unknown ones. We have developed methods for generative and discriminative (e.g. max-margin) structure and parameter learning for Bayesian network classifiers.
Furthermore, graphical models have been applied to various speech and image processing applications.
The automatic and simultaneous identification, localization, and tracking of targets using electromagnetic radiation started mainly as a military application in radar systems. In the early 1970s, commercial tracking of large and expensive goods emerged, followed by smaller items by the end of the 20th century. Since then, RF identification (RFID) has become almost ubiquitous in commercial applications, e.g., tracking and identification of goods or electronic article surveillance.
Our group is concerned with the development of algorithms for a range of important speech analysis tasks, both for single-channel and multichannel speech. These tasks include voice activity detection, speech enhancement, pitch and multipitch tracking, phone segmentation and classification, as well as speaker segmentation and identification.
At our lab, automatic speech recognition research is focused on front-end processing for robust speech recognition (noise reduction, voice activity detection, blind source separation, and speech quality assessment). Our algorithms have been tested in various ASR contexts, from command-and-control applications on embedded devices in industrial environments (SNOW) and air traffic control (ATCOSIM) to large vocabulary dictation systems (COAST-ROBUST). Furthermore, we provide state-of-the-art academic recognition setups for German and English research databases such as SpeechDat-II, TIMIT, the Wall Street Journal Database (WSJ0), and GRASS.
Speech is only useful when it is transmitted from a speaker to a listener. Very often this is done in a telecommunication setting, where transmission is subject to restrictions such as limited audio and/or network bandwidth. One issue, therefore, is to use the available resources while maximizing the speech quality by finding efficient speech coding and error concealment strategies. In a real-world environment, the speech signal cannot be picked up with perfect quality, e.g. when the speaker is driving a car. This results in efforts to enhance the speech quality for the listener. Tasks include noise suppression, where we try to remove unwanted signal components from the noisy speech signal based on the statistics of the background noise. Further, in a hands-free communication scenario, echo cancellation is used to subtract the loudspeaker signal that is picked up again by the microphone and that would otherwise be heard by the far-end speaker as an echo of their own voice. For low audio-bandwidth signals, artificial bandwidth extension can improve the audio signal quality considerably. Any enhancement effort can also deteriorate the desired speech sound, so minimizing this effect is an important task. Finally, the enhancement of speech produced by humans suffering from voice pathologies is an important application area.
The wealth of advantages derived from a large signaling bandwidth has motivated the considerable interest shown in the past years towards Ultra Wideband (UWB) communication systems. The possibility of extremely high data rates as well as high-accuracy ranging, together with the promise of low-power and low-complexity devices are some of the many features making UWB so attractive. However, UWB system design poses a number of new technical challenges, and traditional design guidelines are insufficient, or even misleading.
Computational methods for the analysis of lung sounds are beneficial for computer-supported diagnosis, digital storage, and monitoring in critical care. Pathological changes of the lung are tightly connected to characteristic sounds, enabling a fast and inexpensive diagnosis. Traditional auscultation with a stethoscope has several disadvantages: it is subjective, i.e. the lung sounds are evaluated depending on the experience of the physician; it cannot provide continuous monitoring; and it requires a trained expert. Furthermore, the characteristic sounds lie in the low-frequency range, where human hearing has limited sensitivity and is susceptible to noise artifacts.