GRASS: The Graz corpus of Read And Spontaneous Speech
- Sat, Mar 01, 2014
Research in both linguistics and speech technology requires large speech corpora that are recorded at sufficiently high quality and transcribed at least at the orthographic level, so that further annotation layers (e.g., phonetic, morphological, syntactic and/or prosodic) can be built on top of the transcription. Because the speech material available for Austrian German was very limited, we have created the GRASS corpus, the first corpus of read and conversational Austrian German. GRASS contains phonetically balanced sentences, commands elicited by pictures, key words, telephone numbers and one hour of free conversations, produced by 38 speakers who each come from one of the major cities of eastern Austria (Graz, Linz, Salzburg, Vienna). The super-wideband recordings make it possible to simulate different acoustic environments by filtering the speech material with measured room impulse responses. Orthographic transcriptions were created manually and include annotations of breathing, hesitations and laughter. More information can be found in our paper.
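Filtering speech with a room impulse response amounts to convolving the dry recording with that response. The following is a minimal sketch of the idea, not the processing chain used for GRASS: the function name `simulate_room`, the 32 kHz sampling rate, the sine tone standing in for speech and the synthetic decaying-noise impulse response are all assumptions made for illustration. With real data, the two arrays would be a corpus recording and a measured impulse response at the same sampling rate.

```python
import numpy as np


def simulate_room(speech, rir):
    """Convolve a dry signal with a room impulse response (RIR).

    Both inputs are 1-D arrays assumed to share one sampling rate.
    The output has len(speech) + len(rir) - 1 samples and is scaled
    down only if its peak would exceed 1.0 (to avoid clipping).
    """
    speech = np.asarray(speech, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    wet = np.convolve(speech, rir, mode="full")
    peak = np.max(np.abs(wet))
    if peak > 1.0:
        wet = wet / peak
    return wet


# Hypothetical stand-ins for a recording and a measured RIR.
fs = 32000                                   # assumed sampling rate in Hz
t = np.arange(int(0.25 * fs)) / fs           # 0.25 s of signal
speech = 0.5 * np.sin(2 * np.pi * 220 * t)   # sine tone instead of speech

rng = np.random.default_rng(0)
n_rir = int(0.1 * fs)                        # 0.1 s impulse response
decay = np.exp(-np.arange(n_rir) / (0.02 * fs))
rir = rng.standard_normal(n_rir) * decay     # exponentially decaying noise
rir /= np.sqrt(np.sum(rir ** 2))             # normalise to unit energy

reverberant = simulate_room(speech, rir)
```

Swapping in a different impulse response changes the simulated room while the underlying speech stays the same. For long recordings, an FFT-based convolution would be considerably faster than the direct method used here.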