Databases and Tools
Artificial Bandwidth-Extension with Sum-Product Networks
This package performs training and inference for artificial bandwidth extension (ABE), using an HMM/SPN model as described in . It is implemented in Matlab but calls C++ binaries for training and inference of sum-product networks (SPNs). The core training routine is the Poon & Domingos algorithm for SPNs, which was ported to C++ here.
ATCOSIM: Air Traffic Control Simulation Speech Corpus
The ATCOSIM Air Traffic Control Simulation Speech corpus is a speech database of air traffic control (ATC) operator speech, provided by Graz University of Technology (TUG) and Eurocontrol Experimental Centre (EEC). It consists of ten hours of speech data, which were recorded during ATC real-time simulations using a close-talk headset microphone. The utterances are in English and were spoken by ten non-native speakers. The database includes orthographic transcriptions and additional information on speakers and recording sessions. It was recorded and annotated by Konrad Hofbauer.
This is a telephone speech database for Austrian German. The databases contain one thousand calls each, from the fixed and mobile telephone network. Speakers were chosen to ensure a representative distribution over accent regions, sex, and age groups. The database is compliant with the guidelines of the SpeechDat project.
The MATLAB function bibget simplifies the creation of BibTeX databases from IEEE Xplore by providing a simple command-line interface.
ELHE: Austrian German Parallel Electro-Larynx -- Healthy Speech Corpus
We present the first parallel electro-larynx – healthy speech corpus for Austrian German:
Error Power Ratio
This page contains material related to our paper on the error power ratio (EPR).
Example-based automatic phonetic transcription
Phonetic transcriptions are an important resource in research areas such as speech recognition and linguistics. Creating phonetic transcriptions by hand is an exhausting process, so it is reasonable to develop an application that automatically creates phonetic transcriptions for given audio data. Current state-of-the-art systems for automatic phonetic transcription (APT) are mostly phone recognizers based on hidden Markov models (HMMs). We present a different approach to APT, designed specifically for transcription with a large inventory of phonetic symbols. In contrast to most systems, which are model-based, our approach is non-parametric and uses techniques derived from concatenative speech synthesis and template-based speech recognition. This example-based approach not only produces draft transcriptions that need only be corrected rather than created from scratch, but also provides a validation mechanism for ensuring consistency within the corpus.
Fast Time-Domain Volterra Filtering Implemented in C
This is the source code produced for the paper “Fast Time-Domain Volterra Filtering”, presented at the Asilomar Conference on Signals, Systems and Computers, 2016. It includes 16 implementations of time-domain Volterra filters together with testbenches for the verification of correctness and runtime. The 16 implementations result from the combination of 4 methods of traversing (nested-loop, combinatoric, lookup-table, hard-coded) with 4 methods of computation (direct1, direct2, reuse, horner).
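To make the idea of a time-domain Volterra filter concrete, the following is a minimal Python sketch of a second-order filter evaluated with plain nested loops. It is not taken from the package: the function names, the restriction to second order, the upper-triangular kernel layout, and the zero initial conditions are all assumptions made for illustration. The second function factors the current delay-line sample out of the inner sum, which is only in the spirit of a Horner-style evaluation and is not claimed to match the package's `horner` method.

```python
def delay_line(x, n, M):
    """Return [x[n], x[n-1], ..., x[n-M+1]], with zeros before the signal starts."""
    return [x[n - i] if n - i >= 0 else 0.0 for i in range(M)]


def volterra2_direct(x, h1, h2):
    """Second-order Volterra filter, nested-loop traversal, direct products.

    h1[i] is the linear kernel, h2[i][j] the quadratic kernel.
    Only entries with j >= i are used (assumed upper-triangular layout).
    """
    M = len(h1)
    y = []
    for n in range(len(x)):
        d = delay_line(x, n, M)
        acc = 0.0
        for i in range(M):
            acc += h1[i] * d[i]                    # linear term
            for j in range(i, M):
                acc += h2[i][j] * d[i] * d[j]      # quadratic term
        y.append(acc)
    return y


def volterra2_factored(x, h1, h2):
    """Same filter, with d[i] factored out of the inner sum (fewer multiplies)."""
    M = len(h1)
    y = []
    for n in range(len(x)):
        d = delay_line(x, n, M)
        acc = 0.0
        for i in range(M):
            inner = h1[i]
            for j in range(i, M):
                inner += h2[i][j] * d[j]
            acc += d[i] * inner
        y.append(acc)
    return y
```

Both functions compute the same output; the factored form saves one multiplication per quadratic kernel entry, which hints at why the choice of computation method affects runtime.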
GRASS: the Graz corpus of Read And Spontaneous Speech
We present the first large scale speech database for Austrian German:
Joint Linearity Efficiency Model
This is an implementation of the model presented in the paper “A Joint Linearity-Efficiency Model of Radio Frequency Power Amplifiers”.
Large Margin Learning of Gaussian Mixture Models
In our ECML 2010 paper we present a discriminative learning framework for Gaussian mixture models (GMMs) used for classification, based on the extended Baum-Welch (EBW) algorithm. We suggest two criteria for discriminative optimization, namely the class conditional likelihood (CL) and the maximization of the margin (MM). In the experiments, we present results for synthetic data, broad phonetic classification, and a remote sensing application. The experiments show that CL-optimized GMMs (CL-GMMs) perform worse than MM-optimized GMMs (MM-GMMs), whereas both discriminative GMMs (DGMMs) perform significantly better than generatively learned GMMs. We also show that discriminatively parameterized generative GMM classifiers still allow marginalization over missing features, a case where generative classifiers have an advantage over purely discriminative classifiers such as support vector machines or neural networks.
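The remark about missing features can be illustrated with a short Python sketch. It assumes diagonal covariances, in which case marginalizing a missing feature amounts to dropping its factor from each component density. The function names, the use of `None` to mark a missing feature, and the data layout are assumptions for illustration; this is not the code accompanying the paper.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of x under a diagonal-covariance GMM.

    Entries of x equal to None are treated as missing and marginalized out,
    which for diagonal covariances means skipping their factor.
    """
    terms = []
    for w, m, v in zip(weights, means, variances):
        lp = math.log(w)
        for xi, mi, vi in zip(x, m, v):
            if xi is not None:
                lp += log_gauss(xi, mi, vi)
        terms.append(lp)
    mx = max(terms)  # log-sum-exp for numerical stability
    return mx + math.log(sum(math.exp(t - mx) for t in terms))


def classify(x, priors, gmms):
    """Pick the class maximizing log prior plus GMM log-likelihood.

    gmms maps each class label to a (weights, means, variances) tuple.
    """
    scores = {c: math.log(priors[c]) + gmm_loglik(x, *gmms[c]) for c in priors}
    return max(scores, key=scores.get)
```

For example, `classify([None, 4.5], priors, gmms)` scores each class using only the second feature.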
Maximum Margin Bayesian Network Classifiers
Classification is an important task in machine learning. It deals with assigning a given object to one of a number of different categories. To solve this task, we present a maximum margin parameter learning algorithm for Bayesian network classifiers that uses a conjugate gradient method for optimization. In contrast to previous approaches, we maintain the normalization constraints of the parameters of the Bayesian network during optimization, i.e. the probabilistic interpretation of the model is not lost. This makes it possible to handle missing features in discriminatively optimized Bayesian networks. This work focuses on the potential of the proposed method and on a comparison with other existing work on maximum margin Bayesian networks.
NMF with l0-sparseness constraints
Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, there is no guarantee for this behavior. Several authors have proposed NMF methods which enforce sparseness by constraining or penalizing the l1-norm of the factor matrices, while little work has been done using a more natural sparseness measure, the l0-pseudo-norm. In the paper “Sparse nonnegative matrix factorization with l0-constraints”, we propose a framework for approximate NMF which constrains the l0-pseudo-norm of either the basis matrix or the coefficient matrix. For this purpose, techniques for unconstrained NMF, such as multiplicative update rules or the alternating nonnegative least-squares scheme, can be easily incorporated. This package contains Matlab implementations of our algorithms and experimental setups to reproduce our results.
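As a rough illustration of combining multiplicative updates with an l0 constraint, the Python sketch below alternates standard multiplicative updates with a projection that keeps only the L largest entries in each column of the coefficient matrix H. This is a simple heuristic written for this page, not the algorithms from the paper or the Matlab package; the function names, the per-column form of the constraint, and all parameters are assumptions.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def keep_largest(col, L):
    """Zero all but the L largest entries of a nonnegative vector."""
    keep = sorted(range(len(col)), key=lambda i: col[i], reverse=True)[:L]
    return [v if i in keep else 0.0 for i, v in enumerate(col)]


def sparse_nmf(X, K, L, iters=50, eps=1e-9, seed=0):
    """Approximate X (D x N) by W (D x K) times H (K x N), with at most
    L nonzero entries in every column of H."""
    rng = random.Random(seed)
    D, N = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(K)] for _ in range(D)]
    H = [[rng.random() + 0.1 for _ in range(N)] for _ in range(K)]
    for _ in range(iters):
        # Multiplicative update of H; eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[k][n] * num[k][n] / (den[k][n] + eps) for n in range(N)]
             for k in range(K)]
        # Projection step: enforce the l0 constraint column by column.
        H = transpose([keep_largest(c, L) for c in zip(*H)])
        # Multiplicative update of W.
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[d][k] * num[d][k] / (den[d][k] + eps) for k in range(K)]
             for d in range(D)]
    return W, H
```

One limitation of this sketch is that multiplicative updates cannot revive an entry once it is zero, so the sparsity pattern of H is effectively fixed after the first projection. A proper sparse coding step avoids this.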
On the Latent Variable Interpretation in Sum-Product Networks
This package reproduces the experiments in the paper
Robert Peharz, Robert Gens, Franz Pernkopf and Pedro Domingos,
“On the Latent Variable Interpretation in Sum-Product Networks”,
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI),
accepted for publication, 2016.
PARIS Simulation Framework
The PARIS Simulation Framework is a Matlab-based simulator developed by SPSC and NXP Semiconductors from 2007 through 2011. Development from 2012 through 2013 (ongoing) took place at the Reynolds Lab at Duke University.
PTDB-TUG: Pitch Tracking Database from Graz University of Technology
The Pitch Tracking Database from Graz University of Technology (PTDB-TUG) is a speech database for pitch tracking that provides microphone and laryngograph signals of 20 native English speakers, as well as the extracted pitch trajectories as a reference. The subjects had to read 2342 phonetically rich sentences from the existing TIMIT corpus. This text material is available spoken by both female and male speakers. In total, this database consists of 4720 recorded sentences. All recordings were carried out on-site at the recording studio of the Institute of Broadband Communications at Graz University of Technology.
The Austrian German Multi-Sensor Corpus
AMISCO is a collection of multi-room and multi-channel close- and distant-talking Austrian German high-quality speech recordings from 24 speakers, balanced male and female. It contains around 8.2 hours of read speech, comprising 53,000 word tokens based on 2,070 unique word types. This corpus features glottograms, fundamental frequencies, positions, and video recordings of speakers located at certain positions or walking along trajectories provided by the Kinects’ skeleton tracker.
The TUG-EEC-Channels Database V. 1.1
Training Maximum-Likelihood Bayesian Network SVM
This Matlab package implements the algorithm proposed in Robert Peharz, Sebastian Tschiatschek and Franz Pernkopf, “The Most Generative Maximum Margin Bayesian Networks”, ICML 2013.
UWB Indoor Channel Experimental Data
The MeasureMINT database (MINT stands for multipath-assisted indoor navigation and tracking) contains position-resolved ultra-wideband channel measurements for several different indoor environments. These measurement campaigns were conducted to allow for an evaluation of the MINT indoor localization scheme, but may be of use for any research topic dealing with indoor radio propagation. Measurements were obtained either with a vector network analyzer, in which case they are available as frequency-domain channel transfer functions, or with an M-sequence radar device, in which case time-domain signals are available directly.