Signal Processing and Speech Communication Laboratory

Austrian SpeechDat

Acronym: SpeechDat(AT)
Type: Database
Sampling frequency: 8 kHz
Condition: Clean
Segmentation level: Utterance
Language: de
Number of speakers: 2000
Number of utterances: 56000
Number of channels: 1

This is a telephone speech database for Austrian German. It consists of two databases of one thousand calls each, one recorded over the fixed and one over the mobile telephone network. Speakers were chosen to ensure a representative distribution over accent regions, sex, and age groups. The database complies with the guidelines of the SpeechDat project.

The SpeechDat(AT) FixedDB-1000 database contains the recordings of 1,000 Austrian speakers (544 males, 456 females) recorded over the Austrian fixed telephone network.
The following age distribution has been obtained: 15 speakers are under 16, 444 are between 16 and 30, 328 are between 31 and 45, 184 are between 46 and 60, and 29 speakers are over 60.

The SpeechDat(AT) MobileDB-1000 database contains the recordings of 1,000 Austrian speakers (543 males, 457 females) recorded over the Austrian mobile telephone network.
The following age distribution has been obtained: 18 speakers are under 16, 550 are between 16 and 30, 262 are between 31 and 45, 157 are between 46 and 60, and 13 speakers are over 60.
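
The speaker counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates the figures given in the text; the names `AGE_BANDS`, `SPEAKERS`, and `age_shares` are illustrative and not part of the database distribution.

```python
# Speaker counts as quoted in the description above.
AGE_BANDS = ["under 16", "16-30", "31-45", "46-60", "over 60"]
SPEAKERS = {
    "fixed": {"male": 544, "female": 456, "ages": [15, 444, 328, 184, 29]},
    "mobile": {"male": 543, "female": 457, "ages": [18, 550, 262, 157, 13]},
}


def age_shares(network):
    """Return the percentage of speakers in each age band for one network."""
    ages = SPEAKERS[network]["ages"]
    total = sum(ages)
    return {band: 100 * count / total for band, count in zip(AGE_BANDS, ages)}


for name, counts in SPEAKERS.items():
    # Both breakdowns (sex and age) should account for all 1,000 speakers.
    assert counts["male"] + counts["female"] == 1000
    assert sum(counts["ages"]) == 1000
    print(name, age_shares(name))
```

Both breakdowns add up to 1,000 speakers per network. The 16 to 30 band is the largest in each case, and it is larger in the mobile database than in the fixed one.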

Speech samples are stored as uncompressed 8-bit A-law samples at a sampling rate of 8 kHz. Each prompted utterance is stored in a separate file, and each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
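
As an illustration of the storage format, the following minimal sketch expands A-law bytes to 16-bit linear PCM and wraps the result in a WAV container. It assumes that a signal file is a headerless stream of A-law bytes (one byte per sample, single channel), which should be checked against the database documentation. The function names `alaw_to_pcm16` and `pcm16_to_wav_bytes` are hypothetical helpers, and the expansion follows the standard ITU-T G.711 A-law rule rather than any tool shipped with the database.

```python
import io
import struct
import wave

SAMPLE_RATE_HZ = 8000  # sampling frequency stated in the description


def alaw_to_pcm16(data):
    """Expand G.711 A-law bytes to a list of 16-bit linear PCM values."""
    samples = []
    for byte in data:
        a = byte ^ 0x55                  # undo the even-bit inversion
        magnitude = (a & 0x0F) << 4      # 4-bit mantissa
        segment = (a & 0x70) >> 4        # 3-bit segment (exponent)
        if segment == 0:
            magnitude += 8
        else:
            magnitude += 0x108
            magnitude <<= segment - 1
        # In A-law, a set sign bit marks a positive sample.
        samples.append(magnitude if a & 0x80 else -magnitude)
    return samples


def pcm16_to_wav_bytes(samples, rate=SAMPLE_RATE_HZ):
    """Pack 16-bit samples into an in-memory mono WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # the database has one channel
        wav.setsampwidth(2)   # 16-bit linear PCM
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buffer.getvalue()


# Usage with four hand-picked A-law bytes (not taken from the database).
pcm = alaw_to_pcm16(bytes([0xD5, 0x55, 0xAA, 0x2A]))
wav_data = pcm16_to_wav_bytes(pcm)
duration_s = len(pcm) / SAMPLE_RATE_HZ
```

For a real signal file, the bytes would be read with `open(path, "rb").read()` before being passed to `alaw_to_pcm16`. The decoder is written out by hand because the `audioop` module, which used to provide A-law conversion, has been removed from recent Python versions.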

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

Each speaker uttered the following items:

  • 3 isolated digits
  • 4 connected digits (prompt sheet number: 5 digits; telephone number: 9/11 digits; credit card number: 15/16 digits; PIN code: 6 digits)
  • 1 natural number
  • 2 money amounts (currency amount, mixed size and units)
  • 2 yes/no questions (predominantly “yes”, predominantly “no”)
  • 3 dates (spontaneous date e.g. birthday, prompted date, relative and general date expression)
  • 2 times (spontaneous time of day; prompted time, mixed analogue/digital)
  • 6 application words
  • 1 word spotting phrase using embedded application words
  • 7 directory assistance names (spontaneous names e.g. forenames, city of birth, a name out of a set of 150 SDB full names, most frequent cities, most frequent companies)
  • 3 spellings (spontaneous e.g. forename, directory city name, real/artificial city name)
  • 4 isolated words
  • 12 phonetically rich sentences
  • 7 items of speaker-specific material (speaker gender question, call from fixed or mobile network, speaker region question, today’s date, environment of call, native language, educational level)

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.