Databases and Tools

Intelligent Systems

ABESPN Artificial Bandwidth-Extension with Sum-Product Networks

This package performs training and inference for ABE, using an HMM/SPN-model as described in [1]. It is based on Matlab, but calls C++ binaries for training and inference of SPNs. The core routine for training is the Poon & Domingos algorithm for training SPNs [2], which was ported to C++ here.

By downloading and using this package you agree to the terms in the file LICENSE.txt.

[1] R. Peharz, G. Kapeller, P. Mowlaee and F. Pernkopf,
"Modeling Speech with Sum-Product Networks: Application to Bandwidth Extension",
ICASSP, 2014.
[2] H. Poon and P. Domingos,
"Sum-product networks: A new deep architecture",
UAI, 2011, pp. 337–346.

LatentSPN On the Latent Variable Interpretation in Sum-Product Networks

This package reproduces the experiments in the paper

Robert Peharz, Robert Gens, Franz Pernkopf and Pedro Domingos,
"On the Latent Variable Interpretation in Sum-Product Networks",
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI),
accepted for publication, 2016.

To get started, unzip everything into some folder and consult README.txt.

Please note the attached LICENCE file.

MLBNSVM Training Maximum-Likelihood Bayesian Network SVM

This Matlab package implements the algorithm proposed in Robert Peharz, Sebastian Tschiatschek and Franz Pernkopf, The Most Generative Maximum Margin Bayesian Networks, ICML 2013.


MM-GMMs Large Margin Learning of Gaussian Mixture Models

In our ECML 2010 paper we present a discriminative learning framework for Gaussian mixture models (GMMs) used for classification based on the extended Baum-Welch (EBW) algorithm. We suggest two criteria for discriminative optimization, namely the class conditional likelihood (CL) and the maximization of the margin (MM). In the experiments, we present results for synthetic data, broad phonetic classification, and a remote sensing application.

Synthetic spiral data: (a) generative GMM, (b) CL-GMM, (c) MM-GMM, and (d) decision boundary of all learning approaches.

MMBN Maximum Margin Bayesian Network Classifiers

Classification is an important task in machine learning. It deals with assigning a given object to one of a number of different categories. We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient method for optimization to solve this task. In contrast to previous approaches, we maintain the normalization constraints of the parameters of the Bayesian network during optimization, i.e. the probabilistic interpretation of the model is not lost.

NMFL0 NMF with l0-sparseness constraints

Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, there is no guarantee for this behavior. Several authors have proposed NMF methods which enforce sparseness by constraining or penalizing the l1-norm of the factor matrices, while little work has been done using a more natural sparseness measure, the l0-pseudo-norm. In the paper "Sparse nonnegative matrix factorization with l0-constraints", we propose a framework for approximate NMF which constrains the l0-pseudo-norm of either the basis matrix or the coefficient matrix.
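
To make the difference between the two sparseness measures concrete, the following Python sketch counts nonzero entries (l0-pseudo-norm), sums absolute values (l1-norm), and enforces an l0 bound on a nonnegative vector by keeping only its largest entries. This is an illustration only and is not the algorithm from the paper; the function names l0_norm, l1_norm and keep_largest are assumptions made for this example.

```python
def l0_norm(v, tol=0.0):
    """l0-pseudo-norm: the number of entries with magnitude above tol."""
    return sum(1 for a in v if abs(a) > tol)


def l1_norm(v):
    """l1-norm: the sum of absolute values."""
    return sum(abs(a) for a in v)


def keep_largest(v, k):
    """Zero all but the k largest entries of a nonnegative vector.

    The result has an l0-pseudo-norm of at most k. Ties are broken in
    favor of the entry that appears first.
    """
    order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
    keep = set(order[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


column = [0.5, 0.0, 2.0, 0.1]
sparse_column = keep_largest(column, 2)
```

Here column has three nonzero entries, while sparse_column keeps only the two largest and sets the rest to zero.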


Speech Communication

AMISCO The Austrian German Multi-Sensor Corpus

AMISCO is a collection of multi-room and multi-channel close- and distant-talking Austrian German high-quality speech recordings from 24 speakers, balanced between male and female. It contains around 8.2 hours of read speech, 53,000 word tokens based on 2,070 unique word types. This corpus features glottograms, fundamental frequencies, positions, and video recordings of speakers located at certain positions or walking along trajectories provided by the Kinects’ skeleton tracker.

ATCOSIM ATCOSIM: Air Traffic Control Simulation Speech Corpus

The ATCOSIM Air Traffic Control Simulation Speech corpus is a speech database of air traffic control (ATC) operator speech, provided by Graz University of Technology (TUG) and Eurocontrol Experimental Centre (EEC). It consists of ten hours of speech data, which were recorded during ATC real-time simulations using a close-talk headset microphone. The utterances are in English and were spoken by ten non-native speakers. The database includes orthographic transcriptions and additional information on speakers and recording sessions.

ELHE ELHE: Austrian German Parallel Electro-Larynx -- Healthy Speech Corpus

We present the first parallel electro-larynx -- healthy speech corpus for Austrian German:

  • 7 speakers, male and female, different social and regional backgrounds
  • read speech 
    6030 utterances, 19 510 words


EXTRA Example-based automatic phonetic transcription

Phonetic transcriptions are an important resource in different research areas such as speech recognition or linguistics. Establishing phonetic transcriptions by hand is an exhausting process, so it seems reasonable to develop an application that automatically creates phonetic transcriptions for given audio data. Current state-of-the-art systems for automatic phonetic transcription (APT) are mostly phone recognizers based on hidden Markov models (HMMs). We present a different approach for APT, especially designed for transcription with a large inventory of phonetic symbols. In contrast to most systems, which are model-based, our approach is non-parametric and uses techniques derived from concatenative speech synthesis and template-based speech recognition. This example-based approach not only produces draft transcriptions that just need to be corrected instead of created from scratch, but also provides a validation mechanism for ensuring consistency within the corpus.

GRASS GRASS: the Graz corpus of Read And Spontaneous Speech

We present the first large scale speech database for Austrian German:

  • 38 speakers, male and female, different social and regional backgrounds
  • read speech
    2 744 utterances, 19 510 words
  • read and elicited commands
    1 710 utterances, 3 853 words
  • spontaneous conversations
    48 960 utterances, 276 000 words

PTDB-TUG PTDB-TUG: Pitch Tracking Database from Graz University of Technology

The Pitch Tracking Database from Graz University of Technology (PTDB-TUG) is a speech database for pitch tracking that provides microphone and laryngograph signals of 20 native English speakers as well as the extracted pitch trajectories as a reference. The subjects had to read 2342 phonetically rich sentences from the existing TIMIT corpus. This text material is available spoken by both female and male speakers. In total, this database consists of 4720 recorded sentences.

Tracking example: (i) Spectrogram of speech mixture with reference trajectories; (ii, iii, iv) Estimated pitch trajectories

Nonlinear Signal Processing

EPR Error Power Ratio

 This page contains material related to our paper on the error power ratio (EPR) [1].

[1] "A Noise Power Ratio Measurement Method for Accurate Estimation of the Error Vector Magnitude", K. Freiberger, H. Enzinger, and C. Vogel, submitted to IEEE Transactions on Microwave Theory and Techniques, Sep. 2016

VOLTERRA Fast Time-Domain Volterra Filtering Implemented in C

This is the source code produced for the paper "Fast Time-Domain Volterra Filtering", presented at the Asilomar Conference on Signals, Systems and Computers, 2016. It includes 16 implementations of time-domain Volterra filters together with testbenches for the verification of correctness and runtime. The 16 implementations result from the combination of 4 methods of traversing (nested-loop, combinatoric, lookup-table, hard-coded) with 4 methods of computation (direct1, direct2, reuse, horner).
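
As a rough illustration of what a time-domain Volterra filter computes, the sketch below evaluates a second-order filter, y[n] = sum_i h1[i] x[n-i] + sum_{i<=j} h2[i][j] x[n-i] x[n-j], with a plain nested loop over the kernel indices. It is written in Python for brevity and is not taken from the package, which is implemented in C. The function name volterra2, the triangular kernel layout and the zero initial conditions are assumptions made for this example.

```python
def volterra2(x, h1, h2):
    """Second-order Volterra filter with memory length M = len(h1).

    h1[i] is the linear kernel. h2[i][j] is the quadratic kernel in
    triangular form, so only entries with j >= i are used.
    Samples before x[0] are taken to be zero.
    """
    M = len(h1)
    y = []
    for n in range(len(x)):
        # Delay line x[n], x[n-1], ..., x[n-M+1], zero-padded at the start.
        d = [x[n - i] if n - i >= 0 else 0.0 for i in range(M)]
        acc = 0.0
        for i in range(M):
            acc += h1[i] * d[i]                  # linear term
            for j in range(i, M):
                acc += h2[i][j] * d[i] * d[j]    # quadratic term
        y.append(acc)
    return y


y = volterra2([1.0, 2.0],
              h1=[1.0, 0.5],
              h2=[[0.1, 0.2],
                  [0.0, 0.3]])
```

The cost of the quadratic term grows with the square of the memory length.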


Wireless Communications

MeasureMINT UWB Indoor Channel Experimental Data

The MeasureMINT (MINT stands for multipath-assisted indoor navigation and tracking) database contains position-resolved ultra-wideband channel measurements for several different indoor environments. These measurement campaigns were conducted to allow for an evaluation of the MINT indoor localization scheme, but may be of use for any research topic dealing with indoor radio propagation. Measurements were obtained either with a vector network analyzer, in which case they are available as frequency-domain channel transfer functions, or with an M-sequence radar device, in which case time-domain signals are available directly.
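
A frequency-domain channel transfer function can be turned into a time-domain view with an inverse discrete Fourier transform. The Python sketch below shows this on a synthetic two-path channel. It does not use the database's file format or tooling, and the helper names idft and two_path_transfer_function are assumptions made for this example. Real measurements cover a limited frequency band and would typically need windowing and careful handling of the band edges.

```python
import cmath


def idft(H):
    """Inverse discrete Fourier transform of a list of complex samples."""
    N = len(H)
    return [sum(H[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def two_path_transfer_function(N, paths):
    """Transfer function of a synthetic channel on N frequency bins.

    paths maps an integer delay (in samples) to a path amplitude.
    """
    return [sum(a * cmath.exp(-2j * cmath.pi * k * d / N) for d, a in paths.items())
            for k in range(N)]


# Hypothetical channel: a direct path at delay 1 and a weaker echo at delay 3.
H = two_path_transfer_function(8, {1: 1.0, 3: 0.5})
h = idft(H)
```

Up to rounding error, h is zero except at the two path delays, where it reproduces the path amplitudes.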


PARIS-OSF PARIS Simulation Framework

The PARIS Simulation Framework is a Matlab-based simulator developed by SPSC and NXP Semiconductors from 2007 through 2011. Development from 2012 through 2013 (ongoing) was carried out at the Reynolds Lab at Duke University.

The Framework is designed for research on wideband add-ons to UHF RFID, such as ranging and localization. In contrast to other UHF RFID simulators, it is specifically designed to handle (ultra)wideband signals, fading channels, as well as nonlinearities and detuning of tags.

TUG-EEC-Channels The TUG-EEC-Channels Database V. 1.1

The TUG-EEC-Channels database consists of a collection of recordings of voice radio transmissions, which were generated during flights with a general aviation aircraft. Maximum length sequences (MLS) were transmitted over the voice channel of an amplitude modulation (AM) aeronautical VHF radio and the received signals were recorded. The measurements cover a wide range of typical flight situations as well as static back-to-back calibrations.
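
Maximum length sequences are convenient probe signals because their circular autocorrelation is close to an impulse, so correlating the received signal with the transmitted sequence yields an estimate of the channel impulse response. The Python sketch below illustrates the idea on a short, noise-free, synthetic example. It is not the processing used for the database; the register length, the feedback taps and all function names are assumptions made for this example.

```python
def mls_bits(nbits=4, taps=(4, 3)):
    """One period (2**nbits - 1 bits) of a shift-register sequence.

    The default taps implement s[n] = s[n-3] XOR s[n-4], which is a
    maximum-length recurrence for a 4-bit register (period 15).
    """
    state = [1] * nbits
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return out


def circular_conv(s, h):
    """Periodic convolution of the sequence s with the channel taps h."""
    N = len(s)
    return [sum(h[m] * s[(n - m) % N] for m in range(len(h))) for n in range(N)]


def circular_xcorr(y, s):
    """Periodic cross-correlation r[k] = sum_n y[n] * s[n - k]."""
    N = len(s)
    return [sum(y[n] * s[(n - k) % N] for n in range(N)) for k in range(N)]


def estimate_channel(y, s):
    """Impulse response estimate from one period of the received signal.

    The autocorrelation of a bipolar MLS is N at lag 0 and -1 elsewhere.
    Adding sum(r) removes the resulting offset in the noise-free case.
    """
    N = len(s)
    r = circular_xcorr(y, s)
    total = sum(r)
    return [(rk + total) / (N + 1) for rk in r]


s = [1 - 2 * b for b in mls_bits()]      # map bits {0, 1} to {+1, -1}
autocorr = circular_xcorr(s, s)
received = circular_conv(s, [3, -2, 1])  # hypothetical 3-tap channel
h_est = estimate_channel(received, s)
```

In this noise-free setting h_est recovers the three channel taps, followed by zeros. With real recordings, noise and channel variation over time limit the accuracy of such an estimate.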