Databases and Tools

Artificial Bandwidth-Extension with Sum-Product Networks
This package performs training and inference for artificial bandwidth extension (ABE), using an HMM/SPN model as described in [1]. It is based on Matlab, but calls C++ binaries for training and inference of SPNs. The core routine for training is the Poon & Domingos algorithm for training SPNs [2], which was ported to C++ here. By downloading and using this package you agree to the terms in the file LICENSE.txt. [1] R. Peharz, G. Kapeller, P. Mowlaee and F. Pernkopf, “Modeling Speech with Sum-Product Networks: Application to Bandwidth Extension”, ICASSP, 2014. [2] H. Poon and P. Domingos, “Sum-product networks: A new deep architecture”, UAI,...
ATCOSIM: Air Traffic Control Simulation Speech Corpus
The ATCOSIM Air Traffic Control Simulation Speech corpus is a speech database of air traffic control (ATC) operator speech, provided by Graz University of Technology (TUG) and Eurocontrol Experimental Centre (EEC). It consists of ten hours of speech data, which were recorded during ATC real-time simulations using a close-talk headset microphone. The utterances are in English and were spoken by ten non-native speakers. The database includes orthographic transcriptions and additional information on speakers and recording sessions. It was recorded and annotated by Konrad Hofbauer. Getting Started: A brief introduction to the ATCOSIM corpus can be found in the LREC 2008...
Austrian SpeechDat
This is a telephone speech database for Austrian German. The databases contain one thousand calls each, from the fixed and mobile telephone network. Speakers were chosen to ensure a representative distribution over accent regions, sex, and age groups. The database is compliant with the guidelines of the SpeechDat project. The SpeechDat(AT) FixedDB-1000 database contains the recordings of 1,000 Austrian speakers (544 males, 456 females) recorded over the Austrian fixed telephone network. The following age distribution has been obtained: 15 speakers are under 16, 444 are between 16 and 30, 328 are between 31 and 45, 184 are between 46 and...
bibget
The MATLAB function bibget simplifies the creation of BibTeX databases from IEEE Xplore by providing a simple command-line interface. You can download bibget from the MATLAB Central File Exchange. To use bibget, you require an API key for the IEEE Xplore Metadata API. You can get such an API key from http://developer.ieee.org. After inserting your activated API key in the file bibkey.m, run the script bibdemo to get started!
ELHE: Austrian German Parallel Electro-Larynx -- Healthy Speech Corpus
We present the first parallel electro-larynx – healthy speech corpus for Austrian German: 7 speakers, male and female, from different social and regional backgrounds; read speech with 6030 utterances and 19510 words.
Error Power Ratio
This page contains material related to our paper on the error power ratio (EPR) [1]. [1] K. Freiberger, H. Enzinger, and C. Vogel, “A Noise Power Ratio Measurement Method for Accurate Estimation of the Error Vector Magnitude”, submitted to IEEE Transactions on Microwave Theory and Techniques, Sep. 2016.
Example-based automatic phonetic transcription
Phonetic transcriptions are an important resource in different research areas such as speech recognition or linguistics. Establishing phonetic transcriptions by hand is an exhausting process, so it seems reasonable to develop an application that automatically creates phonetic transcriptions for given audio data. Current state-of-the-art systems for automatic phonetic transcription (APT) are mostly phone recognizers based on hidden Markov models (HMMs). We present a different approach for APT, especially designed for transcription with a large inventory of phonetic symbols. In contrast to most systems, which are model-based, our approach is non-parametric, using techniques derived from concatenative speech synthesis and template-based speech...
Fast Time-Domain Volterra Filtering Implemented in C
This is the source code produced for the paper “Fast Time-Domain Volterra Filtering”, presented at the Asilomar Conference on Signals, Systems and Computers, 2016. It includes 16 implementations of time-domain Volterra filters together with testbenches for the verification of correctness and runtime. The 16 implementations result from the combination of 4 methods of traversing (nested-loop, combinatoric, lookup-table, hard-coded) with 4 methods of computation (direct1, direct2, reuse, horner).
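To make concrete what a time-domain Volterra filter computes, here is a minimal Python sketch of a second-order filter evaluated with plain nested loops. It is not taken from the package and does not reproduce any of its 16 implementations. The function name `volterra2`, the triangular kernel layout, and the restriction to second order are assumptions made only for this illustration.

```python
def volterra2(x, h1, h2):
    """Second-order Volterra filter, evaluated with nested loops.

    y[n] = sum_i h1[i] * x[n-i]
         + sum_{i <= j} h2[i][j] * x[n-i] * x[n-j]

    h2 is read in triangular form (only entries with j >= i are used),
    and samples before the start of x are treated as zero.
    """
    memory = len(h1)
    y = []
    for n in range(len(x)):
        # Delay line: d[i] = x[n-i], zero-padded at the start of the signal.
        d = [x[n - i] if n - i >= 0 else 0.0 for i in range(memory)]
        acc = 0.0
        for i in range(memory):
            acc += h1[i] * d[i]                  # linear (first-order) term
            for j in range(i, memory):
                acc += h2[i][j] * d[i] * d[j]    # quadratic (second-order) term
        y.append(acc)
    return y


# Hypothetical kernels with memory length 2, chosen only for illustration.
x = [1.0, 2.0, 3.0]
h1 = [1.0, 0.5]
h2 = [[0.1, 0.2],
      [0.0, 0.3]]
y = volterra2(x, h1, h2)
```

Using only the triangular part of the second-order kernel avoids computing each symmetric product twice. This is a common way to cut the operation count, though whether the package does the same is not stated here.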
GRASS: the Graz corpus of Read And Spontaneous Speech
We present the first large-scale speech database for Austrian German: 38 speakers, male and female, from different social and regional backgrounds; read speech (2744 utterances, 19510 words); read and elicited commands (1710 utterances, 3853 words); spontaneous conversations (48960 utterances, 276000 words). GRASS is designed for linguistic & phonetic studies and for the development of an ASR system: high-quality super-wideband recordings; simulation of different acoustic environments; detailed orthographic transcriptions; further (semi-)automatic annotation layers; sufficient read speech and commands for ASR and dialogue systems; sufficient spontaneous speech for pronunciation modeling for ASR. Corpus Availability: GRASS is available for free for Universities and Research...
Joint Linearity Efficiency Model
This is an implementation of the model presented in the paper “A Joint Linearity-Efficiency Model of Radio Frequency Power Amplifiers”. The MATLAB code can be downloaded from the MATLAB Central File Exchange.
Large Margin Learning of Gaussian Mixture Models
In our ECML 2010 paper we present a discriminative learning framework for Gaussian mixture models (GMMs) used for classification based on the extended Baum-Welch (EBW) algorithm. We suggest two criteria for discriminative optimization, namely the class conditional likelihood (CL) and the maximization of the margin (MM). In the experiments, we present results for synthetic data, broad phonetic classification, and a remote sensing application. The experiments show that CL-optimized GMMs (CL-GMMs) achieve a lower performance compared to MM-optimized GMMs (MM-GMMs), whereas both discriminative GMMs (DGMMs) perform significantly better than generatively learned GMMs. We also show that the generative discriminatively parameterized GMM...
Maximum Margin Bayesian Network Classifiers
Classification is an important task in machine learning. It deals with assigning a given object to one of a number of different categories. We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient method for optimization to solve this task. In contrast to previous approaches, we maintain the normalization constraints of the parameters of the Bayesian network during optimization, i.e. the probabilistic interpretation of the model is not lost. This makes it possible to handle missing features in discriminatively optimized Bayesian networks. The potentials of the proposed method as well as a comparison to other existing...
NMF with l0-sparseness constraints
Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, there is no guarantee for this behavior. Several authors proposed NMF methods which enforce sparseness by constraining or penalizing the l1-norm of the factor matrices, while little work has been done using a more natural sparseness measure, the l0-pseudo-norm. In the paper “Sparse nonnegative matrix factorization with l0-constraints”, we propose a framework for approximate NMF which constrains the l0-norm of the basis matrix, or the coefficient matrix, respectively. For this purpose, techniques for unconstrained NMF can be easily incorporated, such as multiplicative update rules, or the...
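The difference between the two sparseness measures can be made concrete with a small Python sketch. The helper names are hypothetical. The hard-thresholding step `keep_largest` is one simple way to enforce an l0 bound on a nonnegative vector, and is shown only as an illustration, not as the procedure used in the paper.

```python
def l0_pseudo_norm(v):
    """Number of nonzero entries (not a true norm: it ignores scale)."""
    return sum(1 for a in v if a != 0)


def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(a) for a in v)


def keep_largest(v, max_nonzeros):
    """Zero all but the `max_nonzeros` largest entries of a nonnegative vector."""
    order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
    keep = set(order[:max_nonzeros])
    return [a if i in keep else 0.0 for i, a in enumerate(v)]


column = [0.5, 0.0, 0.3, 0.2]           # e.g. one column of a factor matrix
scaled = [10.0 * a for a in column]     # same sparsity pattern, larger scale

sparse_column = keep_largest(column, 2)  # enforce at most 2 nonzeros
```

Scaling the column changes its l1-norm but leaves the l0-pseudo-norm untouched, which is why the latter can be seen as the more direct measure of how many entries are actually in use.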
On the Latent Variable Interpretation in Sum-Product Networks
This package reproduces the experiments in the paper Robert Peharz, Robert Gens, Franz Pernkopf and Pedro Domingos, “On the Latent Variable Interpretation in Sum-Product Networks”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), accepted for publication, 2016. To get started, unzip everything into some folder and consult README.txt. Please note the attached LICENCE file.
PARIS Simulation Framework
The PARIS Simulation Framework is a Matlab-based simulator developed by SPSC and NXP Semiconductors from 2007 through 2011. Development from 2012 through 2013 (ongoing) was done at the Reynolds Lab at Duke University. The framework is designed for research on wideband add-ons to UHF RFID, such as ranging and localization. In contrast to other UHF RFID simulators, it is specifically designed to handle (ultra)wideband signals, fading channels, as well as nonlinearities and detuning of tags. Description: The framework consists of behavioral models of a UHF RFID...
PTDB-TUG: Pitch Tracking Database from Graz University of Technology
The Pitch Tracking Database from Graz University of Technology (PTDB-TUG) is a speech database for pitch tracking that provides microphone and laryngograph signals of 20 native English speakers as well as the extracted pitch trajectories as a reference. The subjects read 2342 phonetically rich sentences from the existing TIMIT corpus. This text material is available spoken by both female and male speakers. In total, this database consists of 4720 recorded sentences. All recordings were carried out on-site at the recording studio of the Institute of Broadband Communications at Graz University of Technology. More details can be found in...
Graz Study Fair Corpus
We present the first speech database for Austrian German recorded to simulate a study fair: 20 speakers, male and female, from different social and regional backgrounds, recorded in an environment that simulates both an acoustically optimized and a standard setup for poster presentation. GSFC is designed for linguistic & phonetic studies and for the development of a distant ASR system. Corpus Availability: GSFC, as well as tools for automatic segmentation, will be available for free for universities and research institutes. Credits: Study Fair team: Barbara Schuppler, Martin Hagmüller, Jamilla Balint, Milena Stavric. Pictures: Milena Stavric.
The Austrian German Multi-Sensor Corpus
AMISCO is a collection of multi-room and multi-channel close- and distant-talking Austrian German high-quality speech recordings from 24 speakers, balanced male and female. It contains around 8.2 hours of read speech, 53,000 word tokens based on 2,070 unique word types. This corpus features glottograms, fundamental frequencies, positions, and video recordings of speakers located at certain positions or walking along trajectories provided by the Kinects’ skeleton tracker. Breaking News: The whole corpus will be available from summer/autumn 2017. Right now we are updating the corpus’s metadata, which will soon be available as a Git repository. Downloadable Preview: A preview is already...
The TUG-EEC-Channels Database V. 1.1
Introduction: The TUG-EEC-Channels database consists of a collection of recordings of voice radio transmissions, which were generated during flights with a general aviation aircraft. Maximum length sequences (MLS) were transmitted over the voice channel of an amplitude modulation (AM) aeronautical VHF radio and the received signals were recorded. The measurements cover a wide range of typical flight situations as well as static back-to-back calibrations. Measurement System and Experiments: For detailed information about the applied measurement system and the conducted measurements please consult: K. Hofbauer, H. Hering, and G. Kubin, “A measurement system and the TUG-EEC-Channels database for the aeronautical voice radio,...
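For readers unfamiliar with maximum length sequences, the following Python sketch generates one period of an MLS with a linear feedback shift register and checks its circular autocorrelation. The register length of 4 and the feedback taps are illustrative assumptions and are not the parameters used for the database. The function names are likewise hypothetical.

```python
def mls(taps, nbits):
    """One period (2**nbits - 1 bits) of a maximum length sequence.

    A Fibonacci LFSR is used: the feedback bit is the XOR of the register
    positions in `taps` (1-based). The taps must correspond to a primitive
    polynomial, otherwise the period is shorter than 2**nbits - 1.
    """
    state = [1] * nbits                    # any nonzero start state works
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])              # output the last register cell
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]    # shift and insert feedback bit
    return out


def circular_autocorr(bits, lag):
    """Circular autocorrelation of the sequence mapped to +1/-1 symbols."""
    s = [1 - 2 * b for b in bits]
    n = len(s)
    return sum(s[i] * s[(i + lag) % n] for i in range(n))


sequence = mls(taps=(4, 3), nbits=4)       # 15-bit example sequence
peak = circular_autocorr(sequence, 0)
off_peak = [circular_autocorr(sequence, lag) for lag in range(1, len(sequence))]
```

The autocorrelation equals the sequence length at lag zero and -1 at every other lag, so it is close to an impulse. This property is what makes MLS signals convenient probes for estimating a channel response by correlating the received signal with the transmitted one.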
Training Maximum-Likelihood Bayesian Network SVM
This Matlab package implements the algorithm proposed in Robert Peharz, Sebastian Tschiatschek and Franz Pernkopf, “The Most Generative Maximum Margin Bayesian Networks”, ICML 2013.
UWB Indoor Channel Experimental Data
The MeasureMINT (MINT stands for multipath-assisted indoor navigation and tracking) database contains position-resolved ultra-wideband channel measurements for several different indoor environments. These measurement campaigns were conducted to allow for an evaluation of the MINT indoor localization scheme, but may be of use for any research topic dealing with indoor radio propagation. Measurements were obtained either with a vector network analyzer, hence they are available as frequency domain channel transfer functions, or with an M-sequence radar device, hence time-domain signals are available directly. Measurement Documentation: Documentation of all the measurement campaigns and scenarios is available here. Frequency Domain Measurements (Vector...
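The relation between the two measurement types can be sketched in Python as follows: a channel transfer function sampled on a uniform frequency grid is turned into a time-domain impulse response by an inverse discrete Fourier transform. The data below is synthetic (a hypothetical two-path channel) and the function names are assumptions for this example. Real network analyzer data covers a limited band and would typically need windowing and calibration, which are omitted here.

```python
import cmath


def idft(spectrum):
    """Naive inverse discrete Fourier transform (O(N^2), for illustration)."""
    n_points = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_points)
            for k in range(n_points)) / n_points
        for n in range(n_points)
    ]


def multipath_transfer_function(n_points, delays, gains):
    """Synthetic transfer function of a channel with discrete paths.

    Each path contributes gain * exp(-j*2*pi*k*delay/N) at frequency bin k,
    where `delay` is given in samples.
    """
    return [
        sum(g * cmath.exp(-2j * cmath.pi * k * d / n_points)
            for d, g in zip(delays, gains))
        for k in range(n_points)
    ]


# Hypothetical two-path channel: direct path at delay 2, reflection at delay 5.
N = 16
H = multipath_transfer_function(N, delays=[2, 5], gains=[1.0, 0.5])
h = idft(H)
magnitudes = [abs(v) for v in h]
```

In the resulting impulse response, the magnitudes are 1.0 at sample 2 and 0.5 at sample 5, and numerically zero elsewhere, which recovers the path delays and gains that were put into the synthetic transfer function.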
wav2scape: An easy to use tool for analyzing speech data based on self-supervised representations
Overview: wav2scape is a comprehensive tool for analyzing acoustic similarities and distances between speech categories using state-of-the-art self-supervised speech representations. Built on the wav2vec 2.0 framework [1] and utilizing the multilingual XLSR-53 model trained on 56,000 hours of speech data [2], wav2scape enables researchers to explore natural groupings and patterns in speech data across multiple dimensions. Its methodology is directly informed by recent research from our lab [3]. Main Purpose and Applications: The primary purpose of wav2scape is to process audio recordings and generate similarity matrices based on the frequency usage of shared discrete speech representations. The tool is highly...
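The idea of comparing categories by how often they use shared discrete units can be sketched in Python as below. This is not wav2scape's code. The unit indices are made up, the function names are hypothetical, and cosine similarity is an assumed choice, since the summary above does not say which measure the tool applies.

```python
import math
from collections import Counter


def usage_frequencies(indices, codebook_size):
    """Relative frequency with which each discrete unit occurs (non-empty input)."""
    counts = Counter(indices)
    total = len(indices)
    return [counts.get(k, 0) / total for k in range(codebook_size)]


def cosine_similarity(p, q):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


def similarity_matrix(categories, codebook_size):
    """Pairwise similarities between categories, in sorted name order."""
    names = sorted(categories)
    freqs = {n: usage_frequencies(categories[n], codebook_size) for n in names}
    matrix = [[cosine_similarity(freqs[a], freqs[b]) for b in names]
              for a in names]
    return names, matrix


# Made-up unit indices per speech category (codebook of 4 units).
categories = {
    "A": [0, 0, 1, 2],
    "B": [0, 1, 1, 2],
    "C": [3, 3, 3, 3],
}
names, S = similarity_matrix(categories, codebook_size=4)
```

Categories A and B share most of their units and come out as similar, while C uses a unit that neither of them uses and has zero similarity to both.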