Signal Processing and Speech Communication Laboratory

GRASS - Symbols for Orthographic Transcriptions

This article is part of GRASS: the Graz corpus of Read And Spontaneous Speech.

Symbols used for the orthographic transcriptions and their assigned lexica:

ADABA  Lexicon of Austrian German
ERG    Lexicon with additional German words
DIAL   Lexicon with dialect words
PART   List of small particles
FSP    Foreign words
MWEX   Multi-word expressions

Lexical Item                                  Example                    Lexicon
Standard Austrian German words                ich gehe von zu Hause weg  ERG
Dialect words                                 <  * DIAL > Kretzn         DIAL
High-frequency multi-word expressions         ja geh bitte               MWEX
Spelling of letters                           $G $K $K
Abbreviations, letters not spoken separately  UNI                        ERG
Proper names of people, places, etc.          Sankt Michael              ERG
Numbers not written with digits               #einhundertdreizehn        ERG
Neologisms, invented by the speaker           Genussvermeider            ERG
Foreign words                                                            FSP

Hesitations and disfluencies                      Example                               Lexicon
Repetition: word (group) produced more than once  und dann hat \+ hat \+ er
                                                  + \und dann \+ + \und dann \+ hat er
Slip of the tongue                                kervehrt\v                            PART
Broken word                                       gebra\                                PART
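
The word-level markers in the examples above can be recognised mechanically. The following is a minimal sketch, not part of any GRASS tooling: the function name `classify_token` and the label strings are hypothetical, and the patterns are inferred only from the examples shown on this page (`$G`, `#einhundertdreizehn`, `kervehrt\v`, `gebra\`), so they may not cover every case in the corpus.

```python
import re

# Patterns inferred from the examples on this page (an assumption, not an
# official GRASS specification). [^\W\d_] matches any Unicode letter.
LETTER = r"[^\W\d_]"
PATTERNS = [
    ("spelled_letter", re.compile(rf"\${LETTER}")),       # e.g. $G
    ("number",         re.compile(rf"#{LETTER}+")),       # e.g. #einhundertdreizehn
    ("slip_of_tongue", re.compile(rf"{LETTER}+\\v")),     # e.g. kervehrt\v
    ("broken_word",    re.compile(rf"{LETTER}+\\")),      # e.g. gebra\
    ("word",           re.compile(rf"{LETTER}+")),        # e.g. gehe
]

def classify_token(token: str) -> str:
    """Return a coarse label for one whitespace-separated token."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "other"  # markers such as \+ or anything not covered above

if __name__ == "__main__":
    for tok in ["$G", "#einhundertdreizehn", "kervehrt\\v", "gebra\\", "gehe"]:
        print(tok, "->", classify_token(tok))
```

Matching whole tokens with `fullmatch` keeps the markers from being confused with each other: a trailing backslash only counts as a broken word when it directly follows letters.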

Other types of speech and non-speech              Example                              Lexicon
Imitation of accent or other person               und\i was\i hast\i du\i
Imitation of an animal, vehicle, etc.             tschu \L tschu \L                    PART
Whispering of an utterance                        er hat eh \F schon \F wissen \F
Non-speech produced by the speakers’ vocal folds  <laughter>, <singing>
                                                  <sigh>, <cough>, <smack>
                                                  <breathingIN>, <breathingOUT>
Non-speech noise while producing a word           <laughter>und <laughter>dann hat er
Non-speech other than that mentioned above        <noise>
Overlapping speech of two speakers                \\ja, hm, ja das \\
                                                  \\<laughter>\\
Artifacts in the recordings                       <# artefact>
Other noises not covered by the symbols above     <# noise>
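
Because non-speech events are written in angle brackets, they can be separated from the spoken words with a single regular expression. The sketch below is illustrative only: the function name `split_non_speech` is hypothetical, and it assumes that every `<...>` span is a non-speech tag, as in the examples above.

```python
import re

# Assumption: anything between "<" and ">" is a non-speech tag,
# e.g. <laughter>, <breathingIN>, <# noise>.
TAG = re.compile(r"<([^<>]+)>")

def split_non_speech(utterance: str) -> tuple[list[str], str]:
    """Return (tag names, remaining text) for one transcribed utterance."""
    tags = TAG.findall(utterance)
    # Replace each tag with a space, then normalise whitespace.
    text = " ".join(TAG.sub(" ", utterance).split())
    return tags, text

if __name__ == "__main__":
    print(split_non_speech("<laughter>und <laughter>dann hat er"))
```

For the example from the table, this yields the tag list `['laughter', 'laughter']` and the text `und dann hat er`. Other markers such as `\\` for overlapping speech are left untouched in the returned text.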

Further Reading on GRASS