Signal Processing and Speech Communication Laboratory

GRASS: the Graz corpus of Read And Spontaneous Speech

Acronym: GRASS
Type: Database
Contact:
Research Areas:
Sampling frequency: 48 kHz
Condition: Clean
Segmentation method: Manual
Segmentation level: Utterance
Language: de
Laryngograph: true
Number of speakers: 38
Number of channels: 5

We present the first large-scale speech database for Austrian German:

  • 38 speakers, male and female, from different social and regional backgrounds
  • read speech
    2 744 utterances, 19 510 words
  • read and elicited commands
    1 710 utterances, 3 853 words
  • spontaneous conversations
    48 960 utterances, 276 000 words
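
To put the three speech styles in proportion, the figures above can be turned into mean utterance lengths and word shares. The following is a minimal sketch that uses only the counts listed on this page; the names `CORPUS`, `words_per_utterance`, and `summarize` are illustrative and not part of any GRASS tooling, and the spontaneous word count may be a rounded figure.

```python
# Counts copied from the list above: (utterances, words) per speech style.
CORPUS = {
    "read speech": (2_744, 19_510),
    "read and elicited commands": (1_710, 3_853),
    "spontaneous conversations": (48_960, 276_000),
}


def words_per_utterance(utterances: int, words: int) -> float:
    """Return the mean number of words per utterance."""
    return words / utterances


def summarize(corpus):
    """Return total utterances, total words, and per-style statistics."""
    total_utterances = sum(utts for utts, _ in corpus.values())
    total_words = sum(words for _, words in corpus.values())
    rows = {}
    for style, (utts, words) in corpus.items():
        rows[style] = {
            "words_per_utterance": words_per_utterance(utts, words),
            "share_of_words": words / total_words,
        }
    return total_utterances, total_words, rows


if __name__ == "__main__":
    total_utterances, total_words, rows = summarize(CORPUS)
    print(f"total: {total_utterances} utterances, {total_words} words")
    for style, stats in rows.items():
        print(
            f"{style}: {stats['words_per_utterance']:.2f} words/utterance, "
            f"{stats['share_of_words']:.1%} of all words"
        )
```

By these counts, spontaneous conversations account for the large majority of the words in the corpus, while commands are the shortest utterances on average.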

GRASS is designed for linguistic and phonetic studies and for the development of automatic speech recognition (ASR) systems:

  • high-quality super-wideband recordings

    simulation of different acoustic environments

  • detailed orthographic transcriptions

    further (semi-)automatic annotation layers

  • sufficient read speech and commands

    for ASR and dialogue systems

  • sufficient spontaneous speech

    pronunciation modeling for ASR