GRASS: the Graz corpus of Read And Spontaneous Speech
- Acronym: GRASS
- Type: Database
- Sampling frequency: 48 kHz
- Condition: Clean
- Segmentation method: Manual
- Segmentation level: Utterance
- Language: German (de)
- Laryngograph: Yes
- Number of speakers: 38
- Number of channels: 5
We present the first large-scale speech database for Austrian German:
- 38 speakers, male and female, from different social and regional backgrounds
- read speech: 2744 utterances, 19510 words
- read and elicited commands: 1710 utterances, 3853 words
- spontaneous conversations: 48960 utterances, 276000 words
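As a quick consistency check on the subset figures, the following Python sketch totals the three subsets and computes the average utterance length in words for each. The `subsets` dictionary and the `summarize` helper are hypothetical names introduced only for illustration; the counts themselves are copied from the corpus description.

```python
# Per-subset (utterances, words) counts as listed in the GRASS description.
subsets = {
    "read speech": (2744, 19510),
    "read and elicited commands": (1710, 3853),
    "spontaneous conversations": (48960, 276000),
}


def summarize(subsets):
    """Return total utterances, total words, and mean words per utterance per subset."""
    total_utts = sum(u for u, _ in subsets.values())
    total_words = sum(w for _, w in subsets.values())
    means = {name: w / u for name, (u, w) in subsets.items()}
    return total_utts, total_words, means


if __name__ == "__main__":
    utts, words, means = summarize(subsets)
    print(f"total: {utts} utterances, {words} words")
    for name, mean in means.items():
        print(f"{name}: {mean:.2f} words per utterance")
```

With these counts the corpus totals 53414 utterances and 299363 words. Read utterances average about 7.1 words, commands about 2.3 words, and conversational utterances about 5.6 words, and the spontaneous conversations account for roughly 92% of all words.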
GRASS is designed both for linguistic and phonetic studies and for the development of an ASR system:
- high-quality super-wideband recordings
- simulation of different acoustic environments
- detailed orthographic transcriptions
- further (semi-)automatic annotation layers
- sufficient read speech and commands for ASR and dialogue systems
- sufficient spontaneous speech for pronunciation modeling in ASR
Corpus Availability
From September 2014 onwards, GRASS is available free of charge to universities and research institutes, together with tools for automatic segmentation.
Credits
GRASS team:
- Barbara Schuppler
- Martin Hagmüller
- Juan A. Morales-Cordovilla
- Hannes Pessentheiner
Support:
- Pictures: Andreas Läßer
- Recording Studio Assistance: Ludwig Mohr
Funding:
The work of Barbara Schuppler was funded by a Hertha-Firnberg grant (T572-N23) from the Austrian Science Fund (FWF). The work of the other authors was partly funded by the European project DIRHA (FP7-ICT-2011-7-288121) and by the K-Project ASD, which is funded within the COMET programme (Competence Centers for Excellent Technologies) by BMVIT, BMWFJ, the Styrian Business Promotion Agency (SFG), the Province of Styria (Government of Styria), and the Technology Agency of the City of Vienna (ZIT). The COMET programme is conducted by the Austrian Research Promotion Agency (FFG).