Signal Processing and Speech Communication Laboratory

Cross-layer language models for conversational speech (FWF Stand-Alone Project P 32700-N)

2019 — 2023
Austrian Science Fund (Fonds zur Förderung der wissenschaftlichen Forschung, FWF), Austria
  • Dina El Zarka (Department of Linguistics, University of Graz)
  • Bernhard Geiger (Know Center GmbH)
  • Roman Kern (Know Center GmbH)
  • Bogdan Ludusan (Bielefeld University)
  • Benno Stein (Weimar University)
  • Dimitra Vergyri (SRI International)
  • Margaret Zellers (Kiel University)

Over the last decade, conversational speech has received considerable attention from speech scientists. On the one hand, accurate automatic speech recognition (ASR) systems are essential for conversational dialogue systems, as these become more interactional and social rather than solely transactional. On the other hand, linguists study natural conversations because they reveal insights into how speech processing works that controlled experiments alone do not provide. Investigating conversational speech, however, requires not only applying existing methods to new data, but also developing new categories and new modeling techniques and including new knowledge sources.

Hypotheses, research questions and objectives

The three objectives of the project are:

  1. to improve ASR systems for conversational speech,
  2. to increase our knowledge about the production and perception of conversational speech, and
  3. to increase our knowledge and resources for conversational Austrian German (see GRASS).

Approach and methods

On the basis of conversational speech and chat corpora from German and Austrian speakers, we will develop language models that include acoustic and semantic contextual information. These models will be informed by quantitative phonetic corpus studies and tested in ASR and speech perception experiments. In the phonetic corpus studies and the perception experiments, speech technology will be used for automatic annotation, acoustic feature extraction and data analysis. The linguistic knowledge gained will in turn be incorporated into the language models. This approach requires an interdisciplinary team that works closely together.

Level of originality and innovation

Whereas traditional language models are trained on text only, we aim to add acoustic information. More specifically, we propose language models that incorporate information on the phonetic variation of words (i.e., pronunciation variation and prosody) and relate this information to the semantic context of the conversation and to the communicative functions in the conversation. This approach to language modeling is in line with the theoretical model proposed by Hawkins and Smith (2001), in which the perceptual system accesses meaning from speech by using the most salient sensory information from any combination of levels/layers of formal linguistic analysis. We thus speak of cross-layer models.
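
To make the idea of combining a text layer with an acoustic layer more concrete, the following is a minimal toy sketch and not the project's actual models. It assumes a bigram model over words, a single discrete prosodic label per word (the hypothetical labels "high" and "low"), and conditional independence of the label from the preceding word given the current word. The function names, the toy utterances and the add-alpha smoothing are all illustrative assumptions.

```python
from collections import Counter, defaultdict

def train(utterances, alpha=1.0):
    """Collect counts from utterances given as lists of (word, prosody_label) pairs."""
    bigrams = defaultdict(Counter)    # previous word -> counts of the next word (text layer)
    emissions = defaultdict(Counter)  # word -> counts of its prosodic labels (acoustic layer)
    vocab, labels = set(), set()
    for utt in utterances:
        prev = "<s>"  # sentence-start symbol
        for word, label in utt:
            bigrams[prev][word] += 1
            emissions[word][label] += 1
            vocab.add(word)
            labels.add(label)
            prev = word
    return bigrams, emissions, sorted(vocab), sorted(labels), alpha

def smoothed(counter, key, support_size, alpha):
    """Add-alpha smoothed relative frequency of key in counter."""
    total = sum(counter.values())
    return (counter[key] + alpha) / (total + alpha * support_size)

def cross_layer_probs(model, prev, label):
    """P(word | prev, label), proportional to P(word | prev) * P(label | word)."""
    bigrams, emissions, vocab, labels, alpha = model
    scores = {}
    for w in vocab:
        p_text = smoothed(bigrams[prev], w, len(vocab), alpha)
        p_acoustic = smoothed(emissions[w], label, len(labels), alpha)
        scores[w] = p_text * p_acoustic
    z = sum(scores.values())  # renormalise over the vocabulary
    return {w: s / z for w, s in scores.items()}

# Hypothetical toy data: each word carries an invented prosodic label.
data = [
    [("ja", "high"), ("genau", "low")],
    [("ja", "low"), ("aber", "high")],
    [("ja", "low"), ("aber", "high")],
    [("na", "high"), ("genau", "low")],
]
model = train(data)
after_ja_high = cross_layer_probs(model, "ja", "high")
after_ja_low = cross_layer_probs(model, "ja", "low")
print(max(after_ja_high, key=after_ja_high.get))
print(max(after_ja_low, key=after_ja_low.get))
```

In this toy setting the same text context ("ja") leads to different word predictions depending on the prosodic label observed, which is the kind of interaction between layers that the paragraph above describes. A realistic model would use continuous acoustic features, larger contexts and the semantic and functional information mentioned above, none of which this sketch attempts.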