Signal Processing and Speech Communication Laboratory

GRASS - Orthographic Transcription

This article is part of GRASS: the Graz corpus of Read And Spontaneous Speech.

Transcription Protocol

  • Praat TextGrids with separate tiers, segmented into short chunks of at most 6 s
  • hesitations, repetitions and other disfluencies
  • laughter, breathing, lip smacks, singing, etc.
  • foreign words, proper names and dialect words
  • overlapping talk

A complete set of the symbols used to create the orthographic transcriptions is documented separately.
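Two of the formal constraints above, the 6 s maximum chunk length and the use of separate tiers, lend themselves to automatic checking. The Python sketch below is not part of the GRASS tooling. It assumes a hypothetical in-memory representation of the transcribed intervals (the `Chunk` class, with placeholder tier names such as `speaker_A`) instead of parsing Praat TextGrid files. It flags chunks longer than 6 s and lists pairs of chunks on different tiers that overlap in time. This is one way overlapping talk becomes visible if each speaker has a tier of their own, which is an assumption about how the tiers are organised.

```python
from dataclasses import dataclass
from itertools import combinations

# Upper bound taken from the protocol ("short chunks of max. 6s").
MAX_CHUNK_SECONDS = 6.0


@dataclass(frozen=True)
class Chunk:
    """One transcribed interval (hypothetical representation, not a Praat API)."""
    tier: str     # tier name; one tier per speaker is an assumption
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # orthographic transcription of the chunk


def overlong_chunks(chunks, max_len=MAX_CHUNK_SECONDS):
    """Return the chunks whose duration exceeds max_len seconds."""
    return [c for c in chunks if c.end - c.start > max_len]


def cross_tier_overlaps(chunks):
    """Return pairs of chunks on different tiers whose time spans overlap,
    i.e. candidate stretches of overlapping talk. Chunks that merely touch
    at a boundary are not counted as overlapping."""
    pairs = []
    for a, b in combinations(chunks, 2):
        if a.tier != b.tier and a.start < b.end and b.start < a.end:
            pairs.append((a, b))
    return pairs


# Hypothetical example intervals with placeholder texts.
EXAMPLE = [
    Chunk("speaker_A", 0.0, 4.2, "chunk A1"),
    Chunk("speaker_A", 4.2, 11.0, "chunk A2"),  # 6.8 s: too long
    Chunk("speaker_B", 3.8, 5.0, "chunk B1"),   # overlaps A1 and A2
]

if __name__ == "__main__":
    print([c.text for c in overlong_chunks(EXAMPLE)])
    print([(a.text, b.text) for a, b in cross_tier_overlaps(EXAMPLE)])
```

With the example intervals, `chunk A2` is reported as overlong, and `chunk B1` overlaps both chunks on the other tier. A chunk flagged this way would be re-segmented by a transcriber, ideally at a pause, rather than cut mechanically at 6 s.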

Transcription Procedure

  1. Six transcribers participated in a training workshop.
  2. They then each transcribed one conversation.
  3. In a second workshop, they corrected each other's transcriptions.
  4. The remaining conversations were transcribed.
  5. Each transcription was corrected by a transcriber other than the one who had created it.

Throughout the transcription process, the transcribers continued to extend a transcription protocol and a lexicon (covering the spelling of non-standard words, particles and non-lexical items), both of which they shared with one another online.

Further Reading on GRASS