Wider research context and theoretical framework
With currently available Automatic Speech Recognition (ASR) systems, very good recognition performance can be obtained for read speech (word accuracies of 90–100%), but not for conversational speech (60–80%). Highly accurate ASR systems for conversational speech are especially relevant for dialogue systems, which are expected to become more conversational, interactional and social rather than transactional. Thus, in recent decades, an increasing number of studies have investigated the differences between these speaking styles in order to find ways to improve ASR performance for conversational speech. One difference between read and conversational speech is that the degree of pronunciation variation is much higher in conversational speech. In spontaneous speech, a word like “yesterday” may sound like yeshay and the German word “haben” (“to have”) may sound like ham. The pronunciation of words depends on well-known factors such as the regional background of the speakers and the formality of the situation. Highly influential, but less well studied, are factors reflecting the prosodic characteristics of the word in the utterance. In order to untangle these potentially correlated effects of linguistic, extra-linguistic and prosodic structure, elaborate modeling techniques are needed.
Deep representation learning is one of the main factors behind the recent performance boost in many image, signal and speech processing problems. This is particularly true when large amounts of data and almost unlimited computing resources are available, as demonstrated in competitions such as ImageNet. However, in real-world scenarios the computing infrastructure is often restricted and the computational requirements cannot be met. In this research proposal we suggest several directions for reducing the computational burden, i.e. the number of arithmetic operations, while maintaining the level of recognition performance.
Everyday applications depend heavily on successful speech transmission and speech communication; examples include smart homes with voice commands, hands-free mobile telephony, and speech recognition by machines. In all these applications it is important to guarantee high performance that is robust to background noise and room reverberation. A pre-processing stage in the form of signal enhancement is needed to remove the undesired background noise sources. Our goal is to develop methods for estimating the desired source signal observed in noise and to tackle new challenges in different speech applications: noise reduction, source separation, robust automatic speech/speaker recognition and artificial bandwidth extension.
In this project, which is funded by Higher Education Structural Funds (Hochschulraumstrukturmittel), seven university partners and three other institutions deal with theoretical and practical aspects of the Digital Edition from different perspectives.
Wireless communication and localization are key components of the envisioned “Internet-of-Things”. However, wireless technologies suffer from physical and man-made impairments, e.g., multipath propagation and interference from competing transmissions, as well as from the effects of temperature variations and other environmental properties. This impairs the accuracy, latency, loss, and energy consumption of wireless services. Our key objective is to offer statistical guarantees on the reliability and availability of correct wireless localization and communication by automatically adapting system parameters using models of the transceiver hardware and the environment.
People who have lost their larynx, e.g. due to cancer, depend on a substitution voice. The three most common methods (esophageal voice, voice prosthesis, electronic speech aid) sound male, if they sound human at all. Since the vast majority of patients were men for a long time, this issue never came into the focus of research and development. In recent years, however, there has been a significant increase in female patients, so that a variety of voice qualities beyond the existing male norm is of increasing importance for the products of the company partner.
The aim of MIMIC is to gain further understanding of the mechanisms of psychological and physiological adaptation or maladaptation in extreme or stressful environments through computerized analysis of speech and of the content of spoken and written verbal communication. The project also aims to improve the data collection and analysis methods developed in previous studies and to prove their applicability in an operational environment.
Discriminative learning of Bayesian networks (BNs) for classification tasks is often beneficial compared to generative learning. This is particularly true in the case of model mismatch, i.e. when the BN cannot represent the true data distribution. In the past, we developed maximum margin parameter learning for Bayesian network classifiers and Gaussian mixture models. Furthermore, we used the margin objective for approximate and exact structure learning. This research is extended within this proposal. The focus is three-fold: (i) Extension of margin-based parameter learning to a hybrid paradigm merging the advantages of generative and discriminative learning. We aim at extending our learning framework to semi-supervised, missing-feature, and latent-variable scenarios. This requires efficient inference during iterative parameter optimization. Additionally, both the discriminative and the hybrid learning approaches are applied to potentially deep sum-product networks (SPNs). SPNs explicitly represent the inference process, i.e. structures (including latent variables) exhibiting computational benefits for inference can be exploited. (ii) Discriminative search-and-score structure learning in BNs is time-consuming. We are interested in approximating the non-decomposable discriminative score by a decomposable surrogate to reduce the computational cost of score evaluation in BNs. Furthermore, we aim at developing structure learning algorithms for SPNs by introducing a global scoring function with an inference cost penalty. (iii) To consolidate SPNs with respect to empirical performance, we will compare all developed models to popular generative and discriminative models from the deep learning community, namely restricted Boltzmann machines, auto-encoders, deep belief networks, and multi-layer perceptrons. Additionally, one particularly interesting recent deep model, the generative stochastic network, is considered.
More than 50% of adults in Germany have difficulty fully comprehending information distributed by government authorities and companies. This lack of reading ability excludes people from knowing their rights and from education, and could even put them in danger. Communication in plain language is therefore an important tool for reducing barriers to information comprehension.
The project investigates a system for localizing passive RFID tags for an intelligent process control system. The real-time tracking of components, tools, and products is a key technology for optimizing workflows, e.g. in flexible manufacturing. REFlex covers not only research on the localization system and the modeling of flexible production environments; the ethical and social implications of the new technology (possible tracking of persons) are also studied.
Siemens, in cooperation with the SPSC Lab, announced a Master's thesis in the area of signal analysis using pattern recognition and machine learning techniques.
The project ENTRANCE has the goal to investigate signal processing and system design methods that enable the design of flexible and power-efficient radio transmitters.
The Problem
Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems, however, require the recognition of spontaneous, conversational speech (e.g., dialogue systems, voice input aids for the physically disabled, medical dictation systems). Compared to prepared or read speech, conversational speech contains utterances that might be considered ‘ungrammatical’ and that contain disfluencies, such as “…oh, well, I think ahhm exactly …” The pronunciation of words may depend, for instance, on the regional background of the speakers, the formality of the situation or the frequency of the word. A highly frequent word like “yesterday” may sound like yeshay and the German word “haben” (“to have”) may sound like ham. This project focused on investigating interdisciplinary methods (including linguistics, phonetics, speech technology) to model the factors on which pronunciation variation depends in everyday speech.
Robustness against reverberation, noise, and interfering audio signals is one of the grand challenges in speech recognition, speech understanding, and audio analysis technology. One avenue to approach this challenge is single-channel audio separation. Recently, factorial hidden Markov models have won the single-channel speech separation and recognition challenge. These models are capable of modeling acoustic scenes with multiple sources interacting over time. While these models reach super-human performance on specific tasks, there are still serious limitations restricting the applicability in many areas.
The project MINT investigates an RF-based localization and tracking system intended for indoor use. The method to be investigated, previously proposed by our group, exploits information from reflected multipath components, assuming prior knowledge of a floor plan. This approach has been termed “multipath-assisted indoor navigation and tracking (MINT)”. The project evaluates the practical feasibility of the MINT approach.
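The geometric idea of using reflected multipath components together with a floor plan can be illustrated with the classic image-source model: a single specular reflection off a wall behaves as if it had been sent from a mirror image of the anchor behind that wall. The Python sketch below is illustrative only; the function name, the 2-D coordinates and the wall position are assumptions made for this example and are not taken from the project.

```python
import numpy as np

def mirror_across_wall(anchor, wall_point, wall_normal):
    """Mirror an anchor position across an infinite wall (image-source model).

    wall_point is any point on the wall, wall_normal is a vector
    perpendicular to it. All names here are hypothetical.
    """
    n = wall_normal / np.linalg.norm(wall_normal)
    return anchor - 2.0 * np.dot(anchor - wall_point, n) * n

# Hypothetical scene: a wall along the x-axis (y = 0) taken from a floor plan.
anchor = np.array([1.0, 2.0])   # known transmitter position
agent = np.array([4.0, 3.0])    # position to be localized
virtual_anchor = mirror_across_wall(anchor, np.array([0.0, 0.0]), np.array([0.0, 1.0]))

# The length of the reflected path equals the straight-line distance
# between the virtual anchor and the agent.
reflected_path_length = np.linalg.norm(agent - virtual_anchor)

# Cross-check via the actual reflection point on the wall (y = 0).
t = -virtual_anchor[1] / (agent[1] - virtual_anchor[1])
reflection_point = virtual_anchor + t * (agent - virtual_anchor)
two_leg_length = (np.linalg.norm(anchor - reflection_point)
                  + np.linalg.norm(agent - reflection_point))
```

Under this model each wall of a known floor plan contributes an additional virtual anchor, so the delay of each reflected component carries range information instead of being treated purely as interference.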
The aim of this transdisciplinary, basic-research-oriented project is the computer-aided analysis of acoustic signals for the non-invasive diagnosis of thoracic diseases. Acoustic signals are recorded by sensors positioned on the patient's thorax and classified by means of intelligent analysis methods. Disease-related changes in the sound conditions within the chest cavity alter the physiological breathing sounds in different ways, so that they are perceptible as acoustically characteristic signals. The project goal is a reliable computer-aided analysis and classification of these signals.
The project aims to enrich the available psychological knowledge through phonological and content analysis of a variety of recorded speech samples collected at regular intervals from the over-wintering crews at the Concordia Antarctic Research Station.
Since 2009, various studies have been carried out on the interdisciplinary topic of classroom acoustics, with the aim of investigating the influence of room acoustics in classrooms on everyday school life and of revealing the enormously diverse interrelations.
Some people, after suffering from voice problems over a longer period of time, are confronted with the diagnosis of laryngeal cancer. While at an early stage there is a good chance of being cured and continuing one's previous life, sometimes the last resort is to remove the entire larynx. Normal vocal communication is then no longer possible. The person has to learn to use a substitution voice, which sounds very different from a natural voice. The social stigma that can accompany the medical situation may lead the person into social isolation. The estimated number of total laryngectomees is about 600,000 people worldwide and 21,000 in Germany, with about 3,000 additional laryngectomy operations performed every year.
Especially for alaryngeal speech, one direction of research and practice is to find ways to replace the larynx with something that can take over the tasks the larynx fulfilled, namely the production of voice and the switching between trachea and esophagus. While this goal is still far from being reached, the goal of this project is to develop a new voice-producing device that enables both women and men to speak with a natural and intelligible voice.
The modeling, measurement, transmission, and processing of information-bearing data and signals are key constituents of any modern technical system. Driven by scalability and reliability considerations, there has recently been a remarkable trend to implement these constituents in a distributed manner. Notable examples of distributed information processing architectures are communication networks, sensor networks, smart grids, traffic telematics systems, and grid computing. The project Signal and Information Processing in Science and Engineering (SISE) aims at making fundamental contributions to some of the most prominent and pressing problems arising in the context of distributed information processing. This ambitious goal requires the development of new mathematical theories, the design and analysis of algorithms and communication protocols, and implementations in hardware and software. The SISE network consists of research groups working in mathematics, signal processing, communications, machine learning, and scientific computing, and hence is well suited to meet the challenges imposed by the multi-disciplinary nature of the project aim.
The DIRHA project addresses the development of voice-enabled automated home environments based on distant-speech interaction in different languages. A distributed microphone network is installed in the rooms of a house in order to selectively monitor acoustic and speech activities observable inside any space, and eventually to run a spoken dialogue session with a given user in order to implement a service or to provide access to appliances and other devices. The multi-microphone front-end is based on arrays consisting of analog microphones or Micro Electro-Mechanical Systems (MEMS) digital microphones. The targeted system analyses the given multi-space acoustic scene in a coherent way, by processing in parallel simultaneous activities that occur in different rooms and, where required, by simultaneously supporting interaction with users who may speak in different areas of the house.
The aim of this project is the demonstration, validation, and evaluation of a wireless multicarrier transmission scheme that employs a novel noncoherent receiver. The receiver supports energy detection of a multiband ultra-wideband (UWB) signal. It is a robust, power-efficient receiver architecture that is capable of collecting energy from the multipath components of the channel response, and it offers a scalable, increased data rate. The design and evaluation of a hardware demonstrator for this receiver architecture is the key objective of the project. The central element of the demonstrator is an analog frontend that lowers the requirements for digital signal processing and for power-hungry analog-to-digital conversion. The proposed receiver may be used in wireless systems that perform data transmissions at extremely high data rates (>= 500 MBit/s) over limited distances (<= 5 m). It could be used, for instance, to connect mass storage as available in today's mobile handsets to personal computers, home entertainment systems, or public/commercial information kiosks. The discussed receiver has a radically different system architecture. Therefore, it may be capable of providing these ultra-high data rates at a significantly reduced energy per transmitted bit, compared with conventional systems. The key aim of this project is to demonstrate the feasibility of such a receiver and to evaluate the possible power saving.
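The principle of noncoherent energy detection mentioned above can be sketched in a few lines. The following Python example is a simplified illustration, not the project's receiver: it assumes on-off keying on a single sub-band, a real-valued baseband signal, and a hypothetical four-tap multipath channel that is unknown to the receiver. The function and variable names are invented for this sketch.

```python
import numpy as np

def energy_detect(rx, samples_per_symbol, threshold):
    """Integrate |rx|^2 over each symbol slot and compare with a threshold."""
    n_sym = len(rx) // samples_per_symbol
    slots = np.reshape(rx[: n_sym * samples_per_symbol], (n_sym, samples_per_symbol))
    energy = np.sum(np.abs(slots) ** 2, axis=1)
    return (energy > threshold).astype(int), energy

# Hypothetical test signal: on-off keyed impulses through a multipath channel.
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])
sps = 16                                     # samples per symbol slot (assumed)
channel = np.array([1.0, -0.6, 0.3, 0.2])    # assumed multipath taps

pulses = np.zeros(len(bits) * sps)
pulses[::sps] = bits                         # one impulse per "1" slot
rx = np.convolve(pulses, channel)[: len(pulses)]

rng = np.random.default_rng(0)
rx = rx + 0.05 * rng.standard_normal(len(rx))  # additive receiver noise

# The threshold is set to half the energy of one received pulse; a real
# system would have to estimate this value.
detected, energy = energy_detect(rx, sps, threshold=0.5 * np.sum(channel ** 2))
```

Because the detector only sums squared samples, the energy of all multipath taps inside the slot contributes to the decision without any channel estimation, which is the property the paragraph above refers to.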
Within the project LOBSTER, a system is developed for analysing groups of people fleeing from public buildings or structures in crisis situations. For the localisation and the analysis of the activities of these groups, the positioning technologies GNSS, WLAN, and MEMS of common smartphones are used. In case of distress, the determined positions are transmitted to an LBS centre. In the centre, these data are used in combination with building layouts and filtering techniques (particle filter and Kalman filter) to analyse and predict the escape behaviour. The analysis supports the first responders in achieving significantly improved coordination and resource scheduling of the rescue teams. The rescue teams themselves are equipped with a localisation system and also send their positions to the LBS centre. In combination with the position data of the evacuees, it is then possible to detect the escape routes and thus to coordinate the rescue teams in the best possible manner by means of specific instructions. Furthermore, the project will analyse to what extent the indoor positioning accuracy can be improved by the use of UWB (Ultra-Wideband) techniques, so that security-related equipment and assets can also be localised. A highly innovative approach is used, which relies on the analysis of signal reflections and building layouts. In addition, a psychological analysis of human factors in fleeing crowds is carried out to identify patterns of movement and escape reactions in crisis situations.
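As an illustration of the kind of filtering mentioned above, the following Python sketch implements a textbook Kalman filter with a one-dimensional constant-velocity motion model. It is not the project's implementation: the motion model, the noise variances and all names are assumptions chosen for this example, and a real system would work in two or three dimensions and take the building layout into account.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, meas_var=4.0, accel_var=0.1):
    """Track position and velocity from noisy 1-D position measurements.

    State x = [position, velocity]; all parameter values are illustrative.
    Returns an array of shape (len(measurements), 2).
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity transition
    H = np.array([[1.0, 0.0]])               # only the position is measured
    Q = accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                              [dt**3 / 2, dt**2]])  # process noise
    R = np.array([[meas_var]])               # measurement noise

    x = np.array([measurements[0], 0.0])     # start at the first measurement
    P = np.diag([meas_var, 10.0])            # initial uncertainty (assumed)
    estimates = [x.copy()]

    for z in measurements[1:]:
        # Prediction step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step
        innovation = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# Hypothetical example: a person walking at 1.5 m/s, position fixes with 2 m noise.
rng = np.random.default_rng(1)
true_positions = 1.5 * np.arange(50.0)
noisy_fixes = true_positions + 2.0 * rng.standard_normal(50)
track = kalman_track(noisy_fixes)            # columns: position, velocity
```

The velocity estimate in the second column is what makes short-term prediction of movement possible; a particle filter would replace the Gaussian state estimate with a set of weighted samples, which makes it easier to incorporate walls and other layout constraints.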
The main idea of the DRAGON project is to research and use new design methodologies and architectural innovations, based on reconfigurability and state-of-the-art digital CMOS technology, in order to break the barriers imposed by the lack of scaling properties of analog components. With this concept, distinct reductions in cost, size and energy consumption for multi-standard cellular handsets can be achieved, while higher demands on data rate can be met. Data rates are increasing every day; therefore, the energy consumption per transmitted or received data bit has to be reduced in order to save energy and avoid thermal problems. Wireless data services will become an attractive low-cost alternative for use in novel applications. In the DRAGON project, a design platform comprising multi-standard transceiver specifications and novel flexible architectures is developed. A number of the required external components, such as analog filters, are replaced by reconfigurable digital CMOS (Complementary Metal Oxide Semiconductor) circuitry, and critical building blocks are implemented to demonstrate proof of concept, both of the architecture and of the design methodology. All critical building blocks are fabricated, tested, and demonstrated in state-of-the-art CMOS technology. The project results are also being provided to standardisation bodies, allowing an alignment of requirements to technology limits.
Graphical models have become the method of choice for the representation of uncertainty in machine learning. Two research issues are currently of major interest in the scientific community. First, much work is devoted to finding and analyzing more efficient approximate inference algorithms, e.g., loopy belief propagation, variational methods, sampling methods, the concave-convex procedure, and loop corrections. Second, there has been much interest in learning the parameters and the structure of directed graphical models from data. Basically, there are two main paradigms for learning in the machine learning community: generative and discriminative learning. Generative learning is well explored for directed graphical models, whereas discriminative learning still needs more elaboration. The proposed research focuses on discriminative learning of graphical models. In particular, we want to devote significant work to developing discriminative structure and parameter learning algorithms for Bayesian networks and dynamic Bayesian networks. One challenge is certainly the demanding computational complexity. Results of this research are applied to speech and image processing problems, e.g., single-channel source separation, multipitch tracking, and multiple object tracking.
The intention of the project is to join research activities in the field of advanced audio processing. The central goal is to strengthen and augment the cooperation between academia and industry. The link between computationally demanding algorithms for audio signal processing and the ability to develop real-time systems is sought after within many innovative application fields that are tackled by the industrial partners, such as professional audio and communication technologies, automotive, and entertainment systems. The expected results can be implemented in systems for in-car communication, dictation and teleconferencing, as well as in professional headphones and loudspeakers, and casino gaming machines.
Today, the accurate and safe determination of position and time information using GNSS has become an essential part of our society. Unfortunately, the more valuable a resource becomes to our civil infrastructure, the more criminals or malicious agents seek to discover and exploit weaknesses in order to disrupt legitimate users or to perpetrate fraud. While the signal authentication necessary to secure the system against such attacks is available for military and government use (depending on the GNSS), there is no such security function for civilian applications.
The GreenPArk project harnesses digital signal processing to increase the efficiency of RF power amplifiers in mobile radio base stations. To this end, switched-mode amplifiers are investigated using novel digital modulation schemes and signal processing methods. RF power amplifiers in mobile radio base stations that are equipped with intelligent algorithms and novel architectures have an energy-saving potential of more than 21 million kWh per year in Styria alone, which corresponds to the annual electricity consumption of roughly 4,900 three-person households. The goal of the two-year GreenPArk project is to realize such algorithms in order to make this savings potential usable for Styria.
Distributed signals and data will in future be of central importance to many areas of daily life. Networked sensors and distributed data allow an improved understanding of our world and its sustainable use. Turning these large volumes of data into useful information requires groundbreaking scientific insights at the intersection of mathematics, signal and information processing, communications, and scientific computing. We will develop new theories, algorithms, and implementations that enable the extraction, compression, transmission, and storage of large distributed data sets. The focus is on distributed architectures that can be designed to be fault-tolerant and scalable. The results of this basic research are applicable in sensor and communication networks, distributed systems, cooperative mobile radio systems, machine learning, the design of embedded systems, and molecular biology.
The area of passive UHF RFID is mostly a niche application for tracking of small goods. Accurate localization of tagged objects could be beneficial in numerous applications, such as warehouse and point-of-sale portals, salesrooms, or archives. Although there has been considerable research on this issue since 2005, accurate positioning remains elusive. There are two major reasons for this: Severe multipath propagation due to the backscatter nature (degenerate channels) and the portal setup (resembling industrial environments) is the dominant source of errors. In combination with limitations enforced by the design of passive UHF RFID (low-power, low-complexity tags; high throughput of tags in portals), this makes ranging in passive UHF RFID a very challenging task.
The main objective of the Action is to combine previously unexploited techniques with new theoretical developments to improve the assessment of voice for as many European languages as possible, while acquiring in parallel data with a view to elaborating better voice production models.
This project explores security-enhanced speaker verification and identification systems based on speech signal watermarking. The goal is to detect situations in which played-back speech, synthetically generated speech, a manipulated speech signal, or an impostor imitating the speaker is used to fool the biometric system. One issue is to determine whether biometrics (i.e. speaker analysis) and watermarking can coexist while minimizing their mutual effects.
The main objective of the SoftGNSS project is the development of a software-defined Global Positioning System (GPS) receiver whose performance is enhanced by a dual-frequency approach. The combined processing of the L1 and L2C GPS signals allows measurement errors to be mitigated, for example errors caused by distortions introduced in the ionosphere. An improved receiver accuracy makes GPS beneficial for an even wider range of applications compared with today's performance obtained through a single-frequency approach. The software-defined nature of the system also facilitates the adaptation of the receiver to prospective GPS specifications and future Global Navigation Satellite Systems (GNSS), e.g. Galileo. The receiver comprises an RF front end, a digital signal processing unit and a position, velocity and time (PVT) module. The RF front end receives the GPS satellite signals, converts the received signal down to an intermediate frequency and digitizes it. The digital signal processing unit screens the signal for visible satellites, compensates each satellite signal for delay and frequency shifts and extracts the raw data. In the next stage, the PVT module computes the satellite and receiver positions from the raw data while utilizing different error correction algorithms.
The studies presented here were carried out within the basic project Robust, part of COAST. Robust deals with robust speech pre-processing for speech recognition. The task within the speech enhancement module of Robust is the development of a new method for source separation.
A notorious challenge for automatic speech recognition is the significant decrease in recognition rates encountered in non-ideal acoustic environments. The presence of background noise or of concurrent speech from speakers other than the target speaker greatly impairs speech recognition performance. A further adverse influence is due to varying recording conditions (diverse noise sources, microphone position, etc.). This base project aims at providing defined and stable signal quality for speech as a precondition for robust speech recognition. This includes the suppression of background noise and of the speech of interfering speakers, both being frequent causes of reduced recognition performance. In addition to noise reduction methods, we will primarily investigate new methods for the separation of concurrent acoustic sources, such as blind source separation or beamforming using multiple microphones. Project targets:
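One standard way in which two frequencies help against ionospheric errors is the ionosphere-free pseudorange combination: to first order the ionospheric delay scales with 1/f², so a weighted difference of the two measurements cancels it. The source does not state which correction the SoftGNSS receiver uses, so the Python sketch below is only an illustration of this textbook technique. The carrier frequencies are the published GPS L1 and L2 values (L2C is transmitted on the L2 carrier); the function name and the example numbers are assumptions.

```python
F_L1 = 1575.42e6  # GPS L1 carrier frequency in Hz
F_L2 = 1227.60e6  # GPS L2 carrier frequency in Hz (carrier of L2C)

def iono_free_pseudorange(p1, p2, f1=F_L1, f2=F_L2):
    """First-order ionosphere-free combination of two pseudoranges (metres).

    Assumes the ionospheric delay on each frequency is proportional to 1/f^2,
    which holds to first order only.
    """
    gamma = (f1 / f2) ** 2
    return (gamma * p1 - p2) / (gamma - 1.0)

# Hypothetical example: 22,000 km geometric range, 5 m ionospheric delay on L1.
true_range = 22.0e6
delay_l1 = 5.0
delay_l2 = delay_l1 * (F_L1 / F_L2) ** 2     # larger delay on the lower frequency

p1 = true_range + delay_l1
p2 = true_range + delay_l2
corrected = iono_free_pseudorange(p1, p2)    # ionospheric term cancels
```

The price of the combination is that it amplifies measurement noise on both pseudoranges, which is one reason why receivers typically combine it with other error correction steps.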
Over the last decade, Bayesian networks have become the method of choice for representation of uncertainty in machine learning. Bayesian networks are used in many research areas such as bioinformatics, computer vision, speech recognition, error-correcting coding theory, and artificial intelligence. Currently, the research is focused on two main issues. First, much work is devoted to finding more efficient approximate inference algorithms. Second, there has been much interest in learning the parameters and the structure of Bayesian networks from data. Basically, there are two main paradigms for learning in the machine learning community: generative and discriminative learning. There is a strong belief in the scientific community that discriminative classifiers have to be preferred in reasoning tasks. The aim of the proposed research is to work on discriminative structure and parameter learning methods for Bayesian networks and to propose conditions for discriminative structures to be sufficient even trained only with maximum likelihood parameter training. Additionally, we want to perform an extensive experimental comparison between the developed discriminative approaches and well known generative methods. For the experiments, we want to use data sets from the UCI repository and from a surface inspection task available at our institute.
The complexity of RFID systems has been increasing continuously. New applications are emerging, where the tags are extended by arbitrary sensors, the collection of data from low class tags, the communication between the tags and the support of Real-Time Localization Systems (RTLS). These new applications require active RFID tags, where a battery powers the tag. New communication techniques have to be evaluated for these tags to satisfy their requirements. Active RFID tags are currently extremely expensive so the focus is on simple and low complexity techniques to reach new market segments. The tags have to operate in highly multipath intensive environments, where the signal is severely distorted. The project investigates radio air-interface technologies for active RFID-RTLS. New Communication methods like Ultra Wideband (UWB) are expected to fulfill the requirements of such systems, because UWB enables the tracking of goods with cm accuracy, shows good robustness against multipath fading, and enables very low power transmission. However, most of the known receiver architectures show high complexity and are not applicable in RFID. A goal of this work is an analysis of UWB receiver architectures, their usability in active RFID systems and the development of new suboptimal low complexity UWB architectures. The transceiver architectures, signaling schemes and positioning techniques based on UWB will be compared with other state of the art communication technologies.
Within this project the use of advanced speech recognition technology for telehealth or telecare applications is evaluated. State of the art video-care systems, e.g. BETAVISTA from Zydacron connect care service providers, such as hospitals, doctors, nurses and nursing homes with their patients or clients enabling daily monitoring and counsel to take place effectively. The communication hardware connects with the patient and their medical devices and retrieves the patients data from their home and transfers it to the service provider. A complete solution from the patient’s location via any available network to the service provider is offered. The concepts of the research direction of efficient, accurate and convenient information input via speech, originating from the base research projects ALSO, INSPIRATION und ROBUST are transferred and integrated into the telecare application for capturing the instructions of the service providers automatically and transferring the contained detailed information into the existing data base. Capturing detailed structured information enables on the one hand direct access to historic instructions and allows on the other hand a monitoring the input for out of range values or values of unexpected nature. Content-related aspects:
The main aim of the project ALSO is to expand speech recognition systems toward a speaker and paragraph-specific parametrization and automatic adaptation (of parameters), so that recognition becomes more exact; to enable more efficient implementation and to achieve greater acceptance among users. The development of new tools for improved training is an essential part of speech recognition systems based on available data, which depict and describe the field of application exactly. This enables the user to be trained for an existing system and concurrent application, deploying user-friendly and available means. One the one hand, high initial recognition rates and improved running adaptations can be attained, whilst also assuring a broader field of applications. Content-related aspects: (1) There is an evaluation to see how individual parameters influence and specific models influence recognition performance. Which of these parameters or models bear any relevance and whether they contain sufficient potential for improvement, will be investigated. This is done manually; i.e. the models are tested against the parameters in terms of potential for improvement. Only the most promising parameters will continue to be tested. (2) Selection of optimal learning procedures for these parameters (3) Investigation how these parameters can be automatically detected and selected and whether a drop in performance must be taken into account. (4) Integration into the current system with fully automated adjustment or appropriation for a particular user-interface. This should be done so that professional expertise is not required to adjust parameters to user-specific settings.
The speech communication channel for flight control between pilot and tower is used to transmit additional data such as a flight number. The data should be available on the screen of the flight controller to provide additional confidence about the identity of the communication partner. The problem is approached by using watermarking techniques, where data is embedded into a host signal without being perceptible to the receiver.
PROACT has the short-term goal of stirring increased interest and cooperation in the research area of contactless identification technology. The medium-term goal is to establish Graz as a center of excellence in advanced RFID technology and related fields of research. PROACT also aims to augment teaching activity on RFID topics and to attract students to specialize in an RFID-related area. An appropriate number of highly qualified PROACT graduates should find attractive jobs in the local industry. Another goal of PROACT is to strengthen the collaboration of the local industry with academia. Building on existing expertise in this area, PROACT will benefit from the strong local industrial and academic strengths. It will speed up advances by defining a joint programme for research and teaching activities.
The rapid time variation of mobile radio channels is often modeled as a random process with second order moments reflecting vehicle speed, bandwidth and the scattering environment. These statistics typically show that there is little room for prediction of channel properties such as received power or complex taps of the impulse response coefficients, at least when linear predictor structures are considered. We have used mutual information estimation to measure statistical dependencies in sequences of wideband mobile radio channel data and found significant nonlinear dependencies, far exceeding the linear component. Based on these upper limits for the predictability of channel evolution over time intervals up to 30 ms ahead, we study practical nonlinear predictor systems using Multivariate Adaptive Regression Splines (MARS) and Quadratic Volterra Filters.
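As an illustration of the second predictor type named above, the following minimal sketch fits a quadratic (second-order) Volterra filter for one-step-ahead prediction by regularized least squares. It is not the project's implementation: the function names, the memory length, the small ridge term and the logistic-map series that stands in for measured channel data are all assumptions made for this example.

```python
def volterra_features(window):
    """Constant term, linear terms, and all products x[i]*x[j] with i <= j."""
    feats = [1.0] + list(window)
    for i in range(len(window)):
        for j in range(i, len(window)):
            feats.append(window[i] * window[j])
    return feats


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_predictor(series, memory=2, ridge=1e-8):
    """Least-squares fit of the Volterra kernel weights (ridge is an assumed stabilizer)."""
    rows = [volterra_features(series[t - memory:t]) for t in range(memory, len(series))]
    targets = series[memory:]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    for i in range(k):
        gram[i][i] += ridge
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(gram, rhs)


def predict(weights, window):
    """Predict the next sample from the last `memory` samples (oldest first)."""
    return sum(w * f for w, f in zip(weights, volterra_features(window)))


# Hypothetical test signal: a logistic map, whose next value is a quadratic
# function of the current one and is therefore invisible to a linear predictor.
series = [0.3]
for _ in range(299):
    series.append(3.7 * series[-1] * (1.0 - series[-1]))
weights = fit_predictor(series[:200], memory=1)
worst = max(abs(predict(weights, series[t - 1:t]) - series[t]) for t in range(200, 300))
print("largest one-step prediction error on held-out samples:", worst)
```

The toy series is chosen so that the quadratic terms carry all of the predictable structure, which mirrors the observation that nonlinear dependencies can far exceed the linear component. Real channel data would be noisy and complex-valued and would need a longer memory.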
The Christian Doppler Laboratory for Nonlinear Signal Processing addresses fundamental research questions arising from signal processing applications which are challenging due to their nonlinear aspects. We deliver theoretical analyses, develop and optimize new algorithms and, through their implementation, build awareness of their complexity, robustness, accuracy, and power consumption trade-offs. The Christian Doppler Laboratory for Nonlinear Signal Processing plays a leading role in the solution of signal processing problems where conventional methods fail. By entering into industrial partnerships, it thrives on and supports the bidirectional exchange of know-how and people between nonlinear science and the sweeping digital signal processing revolution.
High-frequency fast frequency-hopping systems require frequency synthesizers to provide multi-gigahertz clocks with a band switching time on the order of a few tens of nanoseconds, posing difficult challenges with respect to noise, sidebands, and power dissipation. Conventional phase-locked loop (PLL)-based synthesizers are ill-suited because of their long settling times, which are typically tens of microseconds. Recent research has pushed the development of digital low-noise high-frequency synthesizers in which the traditional analog forward path is replaced by a digital processing core and the voltage-controlled oscillator (VCO) is replaced by a Digitally Controlled Oscillator (DCO). The advantages of such architectures include straightforward implementation in the newest digital CMOS technologies, improved testability, robustness against process, voltage and temperature (PVT) variations, low sensitivity to external noise sources, and enhanced programmability. Since the frequency control information is stored in digital form in the loop, the DCO can be switched from one frequency to another within a few nanoseconds. A number of aspects have to be investigated in depth in a feasibility study. These include: 1. Digital phase detector topologies 2. Digital loop filter topologies 3. DCO architectures 4. Phase noise performance 5. Limit cycles and spurs in the spectrum due to the quantization of the phase information 6. How to assure a virtually zero locking time when switching bands 7. Number of supportable bands 8. Area and power consumption estimation. At the moment there are no publications or public documents available which report the implementation of fast frequency-hopping systems with a digital synthesizer. The goal of this Laboratory Module is to investigate in depth the feasibility of a digital approach to the frequency synthesis of fast frequency-hopping systems.
In recent years the rapid growth of the number of users in mobile communication networks has led to the development of third-generation standards like UMTS. The modulation and multiple-user access methods were designed for high spectral efficiency. This leads to strong fluctuations of the power envelope transmitted by UMTS base stations and therefore to nonlinear effects caused by the power amplifiers. Because the power amplifiers are the most cost-intensive devices, it is desirable to operate them close to their compression points. The main problem is the pronounced dynamic nonlinear behaviour of the amplifier, combined with the envelope fluctuations of the transmission signal. Several state-of-the-art methods like Feed-Forward are used in today’s power amplifiers, but they require expensive hardware. The goal of this work is the investigation of a more flexible and powerful linearization method called Digital Predistortion. This method aims at inverting the dynamic nonlinearity of the whole transmitter chain in the digital baseband.
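The idea of inverting the amplifier nonlinearity in the digital baseband can be sketched in a strongly simplified form. The example below assumes a memoryless, real-valued amplifier model with cubic compression, whereas the project targets dynamic nonlinearities. The model, its coefficient `a3` and the fixed-point inversion are assumptions chosen for illustration only, not the method developed in the project.

```python
def amplifier(x, a3=0.15):
    """Hypothetical memoryless amplifier model with cubic gain compression."""
    return x - a3 * x ** 3


def predistort(x, a3=0.15, iterations=30):
    """Find u with amplifier(u) == x by the fixed-point iteration u <- x + a3*u**3.

    The iteration contracts for the moderate drive levels used below.
    """
    u = x
    for _ in range(iterations):
        u = x + a3 * u ** 3
    return u


# Compare the amplifier output with and without the predistorter in front.
for x in [-0.8, -0.4, 0.0, 0.4, 0.8]:
    plain = amplifier(x)
    linearized = amplifier(predistort(x))
    print(f"input {x:+.2f}  without DPD {plain:+.4f}  with DPD {linearized:+.4f}")
```

With the predistorter, the cascade reproduces the input up to numerical precision inside the invertible range. A practical predistorter for a dynamic amplifier would additionally need memory terms and an identification step for the model parameters.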
The goal of the project is the digital correction of analog signal processing errors in fast analog-to-digital converters. This digital correction should limit the production costs of fast converters and allow a more flexible adaptation to new technologies. An analog-to-digital converter is a complex system that causes dynamic, nonlinear, and time-variant errors. In order to determine analog signal processing errors, typical high-speed architectures are investigated. The aim of the investigations is the systematic identification of these architectures and of their influence on the ideal signal conversion. Identification is achieved through theoretical descriptions, simulation models and measurements of real converters. For all identified systems, algorithms will be developed which improve the properties of the converter. These algorithms will be evaluated according to performance and practical applicability.
The research is concerned with the identification and inversion of weakly nonlinear behaviour occurring in analog integrated circuits for ADSL applications. The nonlinear behaviour induced by the nonlinear characteristics of the analog components of the integrated circuits limits the performance of the overall ADSL data transmission system. Thus, the goal is to compensate for the inherent nonlinearities of the circuit. This nonlinear equalization should be realized in the digital domain, through adaptive nonlinear filters. Nonlinear system identification serves as a starting point for the analysis of the inversion of nonlinear systems. Through the identification of a discrete-time nonlinear model of the analog circuit, a thorough analysis of the adaptive nonlinear equalization is made possible. Furthermore, the digital nonlinear model can then be implemented in a discrete-time simulation platform for the evaluation of the achievable transmission rate of the entire ADSL system.
Speech recording is a common practice in the daily professional activities of lawyers, physicians, journalists and architects, among others. The combination of dictation systems with automatic speech recognition (ASR) is in demand today as the natural way to take over their daily transcription routines. However, in those working environments (e.g. hospital, court of law, street), it is not always possible to record in silent or noise-free conditions, which makes ASR unreliable. The researchers in oneVoice have developed several novel signal processing-based techniques for analyzing speech with natural intonation. These methods represent the scientific basis of the project outcome, namely a new single-channel speech enhancement/coding system that removes the background interference present in the recording.
In emergency situations, particularly within smoke-filled, partially or completely collapsed large buildings, communication with rescue personnel can be difficult. Safety and co-ordination of the operations are hampered by a lack of knowledge of the location of emergency staff. The project will investigate and demonstrate the use of Ultra-Wideband (UWB) radio to allow the precise location of personnel to be measured and displayed in a control centre and, simultaneously, to improve communications reliability. The feasibility of using UWB to search for survivors in smoke-filled rooms or buried beneath rubble and to generate simple maps will also be investigated.
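One common way to turn radio timing measurements into a position, shown here only as a hedged sketch, is trilateration from one-way times of flight to fixed anchors. The text above does not state which localization method the project uses. The 2-D geometry, the three anchor positions and the function name `locate_2d` are assumptions made for this example.

```python
C = 299_792_458.0  # speed of light in m/s


def locate_2d(anchors, times_of_flight):
    """Estimate (x, y) from three anchor positions and one-way flight times."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = (C * t for t in times_of_flight)
    # Subtracting the first range equation from the other two removes the
    # quadratic terms in x and y and leaves a 2x2 linear system.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = r1 ** 2 - r3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors must not be collinear")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Hypothetical setup: three anchors around a building, one person at (3, 4) m.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
tofs = [((true_pos[0] - ax) ** 2 + (true_pos[1] - ay) ** 2) ** 0.5 / C
        for ax, ay in anchors]
print(locate_2d(anchors, tofs))
```

Since radio waves travel about 0.3 m per nanosecond, range accuracy depends directly on timing resolution, which is why very short pulses are attractive for this task. Measurement noise, clock offsets and non-line-of-sight propagation are ignored in this sketch.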
Ultra-Wideband (UWB) communications is an emerging technology for high-speed data transmission systems that is expected to enable low-cost and low-power devices. Instead of a modulated carrier, streams of ultra-short pulses (< 1 ns) are used for wireless data transmission, yielding signals of huge bandwidths (> 1 GHz) but at very low power densities. In principle, the nature of the signal used makes the technology suitable for low-cost implementations in standard CMOS technology. However, before UWB systems can be produced at large scale and low cost, there are numerous open research issues to be solved. Only in recent years has the academic world started research activities on a broad front, have standardization and regulation authorities become aware of the technology, and have joint task groups been founded in the European Union and in the US. Previous experience with UWB technologies exists from military applications like UWB (time-domain, impulse) radar systems. Still, research at fundamental and applied levels is needed at large scale to make cheap and power-efficient UWB chips available. In our research, we plan to go beyond the state of the art in several areas related to the transceiver architecture and signal processing. Due to the extremely large bandwidth, which prevents direct sampling of the received signal at sufficient accuracy, it is expected that a straightforward downscaling of signal processing algorithms for conventional receivers will not lead to practical solutions for UWB devices. That is, new algorithms for channel estimation, synchronization, multi-user detection, and other typical receiver tasks have to be developed for UWB devices. This includes the derivation of appropriate system models, including the modeling of the multi-path radio channel as observed through antenna arrays.
The following items will receive special attention:
* Research on UWB channel models which include the multi-input/multi-output case
* Research on UWB transceiver algorithms including adaptive antenna array algorithms and fading prediction
* Optimization of transceiver algorithms for efficient hardware implementation
SonEnvir is a research project funded by the Styrian Future Fund (Zukunftsfond) with the aim of investigating sonification and its applications in various scientific disciplines. Many scientific research fields work with complex, multidimensional data. The usual methods of representing the inner structures of such data are visualization and statistical analysis. Both approaches are well established but have known drawbacks: visualization is limited by the perceptual weaknesses of the sense of sight (poor temporal resolution, only a few dimensions can be displayed), and statistics by the researcher's mathematical understanding of the complexity of the methods and of their meaning for the data to be analysed. Sonification is the representation and analysis of data through sound and offers a forward-looking alternative and complement to the visual mode. While sonification has been applied successfully to specific individual problems over the last 20 years, SonEnvir represents the first generic approach to establishing sonification as an interdisciplinary method of analysis and representation. For the first time, SonEnvir takes all relevant areas into account equally:
The goal of the proposed research is the development of a new and efficient source coder for speech and audio signals based on the approach of coding in the perceptual domain. In this approach the signal is transformed into an auditory representation by passing it through a model of the human peripheral auditory system. The auditory representation is quantized and encoded for efficient digital transmission or storage. Upon decoding, the auditory representation is transformed back into the acoustic domain using an inverse of the auditory model. Auditory modeling and research on perceptual-domain coding provide insight into human perception and facilitate the extraction of the signal features that are most relevant to the listener. The findings not only yield a new coding method for transmission and storage but, importantly, also assist the development of next-generation hearing aids and cochlear implants. The interdisciplinary nature of perceptual-domain coding calls for consultation and cooperation with experts from information theory as well as hearing physiology. In collaboration with Professor Bastiaan Kleijn and his research group in Stockholm, an optimum quantizer for the encoding of auditory representations is to be designed. In cooperation with Professor Roy Patterson in Cambridge, a more accurate auditory model is to be investigated and incorporated into the perceptual-domain coder.
The SPARC (Semantic Phonetic Automatic ReConstruction) project aims at automatically reconstructing the original wording of a medical dictation from its formatted, corrected written form and the error-prone output of a speech recogniser. Normally, either of these two texts alone is not sufficient to obtain a literal transcription, since the written report may contain reformulations of the original utterance and the recogniser output misrecognitions. In the SPARC approach, the resources are now combined and a semantic and phonetic analysis is performed on the texts to resolve the mismatches between them. This way, the available large corpora of audio recordings of the dictations, draft recognitions, and corresponding final medical reports can be used for improving current text production systems using automatic speech recognition. Furthermore, SPARC is also supposed to give insights into the processes involved in manual transcription of dictations, which may allow further automation in large scale text production environments.
The SNOW project aims to support nomadic workers in their performance of maintenance and production tasks. It is developing a multimodal interface enabling workers to interactively access documentation via different input modes such as speech, gestures or handwriting using mobile devices in the field. At TUG, the speech input and output modalities in particular are strengthened by denoising and enhancement algorithms for speech in harsh acoustic environments.
Multimedia data has a rich and complex structure in terms of inter- and intra-document references and can be an extremely valuable source of information. However, this potential is severely limited unless effective methods for semantic extraction and semantic-based cross-media exploration and retrieval can be devised. Today's leading-edge techniques in this area work well for low-level feature extraction (e.g. colour histograms), focus on narrow aspects of isolated collections of multimedia data, and deal only with single media types. MISTRAL pursues the following lines of radically new research: MISTRAL will extract a large variety of semantically relevant metadata from one media type and integrate it closely with semantic concepts derived from other media types. Eventually, the results from this cross-media semantic integration will also be fed back to the semantic extraction processes of the different media types so as to enhance the quality of their results. MISTRAL will focus on highly innovative, semantic-based cross-media exploration and retrieval techniques employing concepts at different semantic levels. MISTRAL addresses the specifics of multimedia data in the global, networked context, employing semantic web technologies. The MISTRAL results for semantic-based multimedia retrieval will contribute to a significant improvement of today's human-computer interaction in multimedia retrieval and exploration applications. New types of functionality include, but are not limited to, cross-media-based automatic detection of objects in multimedia data. For example, if a video contains an audio stream with barking together with a particular constellation of video features, the system can automatically consider the features in the video as an object "dog".
The goal of the proposed research is the development of new efficient methods for the identification of the input-output (i/o) behavior of nonlinear dynamical systems. The efficiency in terms of computational complexity should be achieved by exploiting the structural constraints of the nonlinear dynamical system. An accurate description of the i/o behavior of nonlinear dynamical systems such as nonlinear circuits is gaining relevance for research and industrial applications. The linearization of nonlinear systems through their inverse is one example of an important area of application. To realize such applications it is necessary to be able to map the i/o behavior of the nonlinear system to a low-dimensional representation. All known methods, such as Volterra series, suffer from computational complexity that explodes with the required accuracy of the system description. One reason for this increase is that important structural constraints of the nonlinear system are not accounted for. In the proposed research, this structural information, such as the dimension of the state space and the poles of the linearized dynamics, should be used to circumvent such an increase in computational complexity. In collaboration with Professor Chua and his research group, this structural information should be incorporated into a new general methodology for the identification of nonlinear systems.
In the steel industry there is an increasing demand for automatic inspection systems to control the quality of products. Because of the economic pressure on suppliers to the industry, the inspection of a few samples from the production lot is insufficient. Especially in the car industry, a complete, reliable, and automatic surface inspection is necessary. Hence, there is a huge demand for vision-based quality control systems in industry. The aim of the research project is to develop sophisticated methods for evaluating the surface quality of steel blocks. This means that irregularities have to be detected reliably. Further, they have to be classified as erroneous or as non-problematic. Because an acceptable image cannot be produced with intensity imaging, the investigations are restricted to range imaging. The 3-D model of the surface is acquired by means of light sectioning methods. The proposed research comprises two key activities. Firstly, suitable features have to be investigated which represent the characteristics of the range data well. These descriptors are restricted to characterizing the planar curves of the cross-sections. Secondly, the features are to be combined in order to locate the irregularities embedded in the surface data and, further, to decide between flawed and intact surface segments. There exists a huge variety of different classification algorithms. Special attention will be dedicated to Hidden Markov Models represented as Bayesian networks. This model considers a sequence of random variables that are dependent on previous values. Decision-making by means of probabilistic networks is a very fundamental approach which might be useful for many intelligent systems.
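The decision between flawed and intact surface segments with Hidden Markov Models can be sketched as a likelihood comparison computed with the forward algorithm. The two models below, their probabilities and the two-symbol quantization of the cross-section features (0 for a smooth profile, 1 for a rough profile) are invented for illustration and are not taken from the project.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [emit[s][o] * sum(alpha[p] * trans[p][s] for p in range(n))
                 for s in range(n)]
    return sum(alpha)


# Hypothetical two-state models; symbol 0 = smooth profile, 1 = rough profile.
MODELS = {
    "intact": {
        "start": [1.0, 0.0],
        "trans": [[0.95, 0.05], [0.5, 0.5]],
        "emit": [[0.9, 0.1], [0.6, 0.4]],
    },
    "flawed": {
        "start": [0.5, 0.5],
        "trans": [[0.7, 0.3], [0.2, 0.8]],
        "emit": [[0.9, 0.1], [0.2, 0.8]],
    },
}


def classify(obs):
    """Pick the model under which the observed symbol sequence is most likely."""
    return max(MODELS, key=lambda name: forward_likelihood(obs, **MODELS[name]))


print(classify([0, 0, 0, 0, 0, 0]))
print(classify([0, 1, 1, 1, 1, 0]))
```

Each state depends only on the previous one, which is the dependence on previous values mentioned above. For long sequences the forward variables would need scaling or a log-domain formulation to avoid numerical underflow.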
The main objective of this Action is to improve the quality and capabilities of the voice services for telecommunication systems through the development of new nonlinear speech processing techniques. The proposed new mathematical methods are expected to provide advances in generic speech processing functions. Examples of these are: higher quality speech synthesis, more efficient speech coding, improved speech recognition, and improved speaker identification. It is envisaged that the proposed nonlinear processing techniques will significantly facilitate the acceptance of voice interfaces for information systems such as the mobile Internet (by improving synthesis and recognition). Additionally, these techniques are expected to make significant contributions to the improvement of future generations of speech coders in wireless networks (including packet-based wireless networks).
In the steel industry there is an increasing demand for automatic inspection systems to control the quality of products. Because of the economic pressure on suppliers to the industry, the inspection of a few samples from the production lot is insufficient. Especially in the car industry, a complete, reliable, and automatic surface inspection is necessary. The aim of the research project is to develop sophisticated methods for evaluating the surface quality of steel products. This means that irregularities have to be detected reliably. Further, they have to be classified as erroneous or as non-problematic. Because an acceptable image cannot be produced with intensity imaging, the investigations are restricted to range imaging. The 3-D model of the surface is acquired by means of light sectioning methods. The research comprises two key activities. Firstly, suitable features have to be investigated which represent the characteristics of the range data well. Secondly, the features are to be combined in order to locate the irregularities embedded in the surface data and, further, to decide between flawed and intact surface segments. A huge variety of different classification algorithms exists. Special attention will be dedicated to Hidden Markov Models represented as Bayesian networks. Decision-making by means of probabilistic networks is a very fundamental approach which might be useful for many intelligent systems.
COMMIT is ftw.’s part of the EUREKA/Medea+ project INCA (Integrated Copper Network Access, Medea+ project proposal number A106). Medea+ is a pan-European cooperation program to promote chip-manufacturing technologies, with some focus on system-on-chip (SOC) development. The INCA project develops chipsets for broadband wireline systems as well as supporting technologies and methodologies, and prepares for future broadband products. COMMIT is responsible for delivering functionality descriptions of near-future products, researching product-enabling key technologies for voice and data transport over IP, developing algorithms for radio-frequency interference (RFI) rejection, and specifying the technical impact of the unbundling of the local loop. Attention is to be given to the convergence of wireline and wireless communication, and the products specified are to support such a development.
The aim of the project is to develop a highly effective acoustic user interface for visually impaired and blind people. To improve usability over commonly used “screen readers”, 3-dimensional sound simulation is employed to simulate surrounding acoustic rooms via headphones. To create those virtual rooms, the Ambisonic approach was chosen and will be implemented on the TI DSP Evaluation Module. The project will result in a test system which is designed to replace an operating system's desktop based on audio cues only. Similar to graphical desktops, the acoustic rooms can contain icons which trigger actions when clicked, for example starting a program or opening a text file. Users are able to navigate in the environment with a joystick and can “go” from one room to another; the room acoustics differ among the rooms and thus provide additional information for the user. For visually impaired and blind people, this might be a remarkable improvement in computer access, which is important to empower them for equal participation in the labor market. KP: ISIS project (Integration, Service, Information and Education), Austrian Bundessozialamt
Loss of water due to leaks in pipes is a significant problem for communities in Austria and Southern Europe. Presently, such leaks are localized by noise measurement and auditory assessment. This method requires experienced staff and results in substantial localization errors due to high background noise levels. We are investigating advanced signal processing methods which allow the suppression of background noise in ground microphone measurements. Various time-frequency processing methods are studied in MATLAB, together with their implementation using C and single-chip DSPs.
The COST 258 Action with the title “The Naturalness of Synthetic Speech” is concerned with the coordination of basic research of 34 laboratories dealing with text-to-speech synthesis in 17 European countries. COST Action 258 proposes a range of studies that address the core issues of naturalness in synthetic speech in concrete applications. Our contribution addresses prosodic models for the German language, SRELP-based demisyllable synthesis using VieCToS, and nonlinear oscillator models for signal generation. KP: 34 European laboratories (see COST258 homepage)
Due to the increasing use of mobile phones and future integrated devices (GPRS, UMTS terminals), there is a growing need for mobile access to information services. Spoken language interfaces are increasingly important because mobile devices do not feature comfortable keyboards for input, and also because the users’ hands are not always free for device operation. One precondition for spoken language interfaces is robust speech recognition which can handle regional variants. The project has already created two spoken language databases which cover the regional variants of the German language as spoken in Austria, recorded over the fixed and mobile telephone networks, respectively. A second focus of the project is on the integration of spoken language interfaces in applications that will run on mobile devices. We have developed a spoken language dialog system for Austrian postal rates. Future demonstrators will explore flexible combinations of spoken language interfaces with other input devices, in particular with pen input via touch-screens. Currently there are intensive efforts on a global scale to create standardised components and interfaces for applications using spoken language. The project tracks these standardizations (e.g. VoiceXML, DSR, JSAPI, MATE) and related market trends (Voice Web) and tests them in demonstrators. KP: 5 industrial cooperation partners within ftw.
Anthropomorphic signal processing develops computational models of human communication modalities that emulate the physiological processes of their natural counterparts. Widely known examples are found in articulatory models for speech synthesis and hearing models for recognition. In speech and audio coding, the decoder’s task is to synthesize signals that evoke the same auditory response as the original signal, independent of its source. While a lot is known about human audition and the related neural code, resynthesis of audible waveforms from such code has been achieved only recently. We develop one such auditory model inversion approach and investigate its application to speech and audio coding. It exhibits surprisingly low sensitivity to amplitude quantization errors and to random channel erasures as would be encountered during transmission over heterogeneous communication networks where a specific quality of service is hard to guarantee.
The anticipated convergence of wireless communications and the internet demands ever-increasing data rates on radio air interfaces for short-range indoor as well as medium- to wide-range outdoor communications systems. This research activity aims at the development of key technologies and know-how for the design of the radio links of future high-speed mobile systems. The characterization of the mobile radio channel is one of the key requirements for a successful system design. Channel measurement techniques and channel models are investigated in this project. On the transmission side, the focus lies on advanced signaling techniques, considering spread spectrum, OFDM (orthogonal frequency division multiplexing), and UWB (ultra-wideband) radio technologies. Here the primary goal lies in the development of signal processing algorithms that are needed for the robust detection of the wideband data streams in the presence of multipath, noise, multi-user interference, and other disturbances.