Signal Processing and Speech Communication Laboratory

GRASS - Symbols for Orthographic Transcriptions

This article is part of GRASS: the Graz corpus of Read And Spontaneous Speech.

Symbols used for the orthographic transcriptions and their assigned lexica:

ADABA: Lexicon of Austrian German
ERG: Lexicon with additional German words
DIAL: Lexicon with dialect words
ForeignWords: Foreign words
BrokenWords: Broken words
SpellingAlphabet: Spelled letters
MultiWordExpressions: Multi-word expressions

Lexical item | Example | Lexicon
Standard Austrian German words | ich gehe von zu Hause weg | ERG
Dialect words | <*DIAL>Kretzn | DIAL
High-frequency multi-word expressions with special pronunciation | wenn_du | MultiWordExpressions
Spelling of letters | $G $K $K | SpellingAlphabet
Proper names of people, places, etc. | Sankt Michael | ERG
Numbers not written with digits | #einhundertdreizehn | ERG
Neologisms invented by the speaker | Genussvermeider | ERG
Foreign words | | ForeignWords
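
The marker conventions in the table above can be sketched as a small lookup. This is only an illustration, not part of the GRASS tooling: the function name lexicon_for is hypothetical, the rules are read off the example column alone, and no marker for foreign words is handled because none is shown above. Unmarked tokens (standard words, proper names, neologisms) cannot be told apart by their surface form, so the sketch returns None for them.

```python
def lexicon_for(token):
    """Guess the lexicon of one transcription token from its surface marker.

    Hypothetical helper: the rules mirror only the examples in the table
    (<*DIAL>Kretzn, $G, #einhundertdreizehn, wenn_du).
    """
    if token.startswith("<*DIAL>"):
        return "DIAL"                  # dialect word
    if token.startswith("$"):
        return "SpellingAlphabet"      # spelled letter
    if token.startswith("#"):
        return "ERG"                   # number written out in words
    if "_" in token:
        return "MultiWordExpressions"  # words joined by an underscore
    return None                        # unmarked: needs a lexicon lookup


for token in ["<*DIAL>Kretzn", "wenn_du", "$G", "#einhundertdreizehn", "gehe"]:
    print(token, "->", lexicon_for(token))
```

Returning None rather than a default lexicon keeps the sketch from guessing where the table gives no surface cue.
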

Hesitations and disfluencies | Example | Lexicon
Repetition: word (group) produced more than once | und dann (+ hat + hat +) er |
 | (+ und dann + und dann +) hat er |
Slip of the tongue | <&s>kervehrt |
Misbuilt grammar | du <&m>kriegt |
Broken word | <&b>gebra | BrokenWords

Other types of speech and non-speech | Example
Imitation of an accent or of another person | <&i>und <&i>was <&i>hast <&i>du
Onomatopoeia | <&o>tschu <&o>tschu
Whispered words | er hat eh <&w>schon <&w>wissen
Non-speech produced by the speakers’ vocal folds | <laughter>, <singing>, <sigh>, <cough>, <smack>, <breathingIN>, <breathingOUT>
Laughed words | <&L>und <&L>dann <&L>hat <&L>er
Other non-speech not listed above | <noise>
Overlapping speech of two speakers | [ ja, hm, ja das ]
Artifacts in the recordings | <#artefact>
Other noises not covered by the symbols above | <#noise>
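
As a rough illustration of how these symbols might be processed, the sketch below classifies single tokens and strips the annotation from a transcribed line. It assumes whitespace-separated tokens and covers only the <&x>word markers and the stand-alone tags listed above. The names classify_token and plain_words are hypothetical, and the repetition brackets (+ ... +) and overlap brackets [ ... ] are deliberately left unhandled.

```python
import re

# Word-level markers of the form <&x>word, taken from the tables above.
WORD_MARKERS = {
    "s": "slip of the tongue",
    "m": "misbuilt grammar",
    "b": "broken word",
    "i": "imitation",
    "o": "onomatopoeia",
    "w": "whispered",
    "L": "laughed",
}

# Stand-alone tags for non-speech events, taken from the tables above.
EVENT_TAGS = {
    "<laughter>", "<singing>", "<sigh>", "<cough>", "<smack>",
    "<breathingIN>", "<breathingOUT>", "<noise>", "<#artefact>", "<#noise>",
}

MARKED_WORD = re.compile(r"^<&([A-Za-z])>(\S+)$")


def classify_token(token):
    """Return (category, word) for one whitespace-separated token."""
    if token in EVENT_TAGS:
        return ("event", None)
    match = MARKED_WORD.match(token)
    if match and match.group(1) in WORD_MARKERS:
        return (WORD_MARKERS[match.group(1)], match.group(2))
    return ("plain", token)


def plain_words(line):
    """Keep the spoken words of a line; drop markers and event tags."""
    words = []
    for token in line.split():
        _category, word = classify_token(token)
        if word is not None:
            words.append(word)
    return words


print(classify_token("<&b>gebra"))
print(plain_words("er hat eh <&w>schon <&w>wissen <cough>"))
```

The marker letter is matched case-sensitively because the table uses an uppercase L for laughed words and lowercase letters for the other markers.
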

Further Reading on GRASS