GRASS: The Graz Corpus of Read and Spontaneous Speech

Publication Type: Conference Paper
Year of Publication: 2014
Authors: Schuppler, B., Hagmüller, M., Morales-Cordovilla, J. A., & Pessentheiner, H.
Conference Name: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Publisher: European Language Resources Association (ELRA)
Conference Location: Reykjavik, Iceland
ISBN Number: 978-2-9517408-8-4

This paper describes the preparation, the speakers, the recordings, and the creation of the orthographic transcriptions of the first large-scale speech database for Austrian German. The corpus contains approximately 1900 minutes of read and spontaneous speech produced by 38 speakers and consists of three components. First, the Conversation Speech (CS) component contains free conversations, each one hour long, between friends, colleagues, couples, or family members. Second, the Commands Component (CC) contains commands and keywords that were either read or elicited by pictures. Third, the Read Speech (RS) component contains phonetically balanced sentences and digits. The speech of all components was recorded at super-wideband quality in a soundproof recording studio with head-mounted microphones, large-diaphragm microphones, a laryngograph, and a video camera. The orthographic transcriptions, which were created and subsequently corrected manually, contain approximately 290 000 word tokens from 15 000 different word types.

Citation Key: 2825
