Cross-layer language models for conversational speech (FWF Stand-Alone Project P 32700-N)
- Period: 2019-2023
- Funding: Fonds zur Förderung der wissenschaftlichen Forschung, FWF (Austria)
- Partners:
  - Dina El Zarka (Department of Linguistics, University of Graz)
  - Bernhard Geiger (Know Center GmbH)
  - Roman Kern (Know Center GmbH)
  - Bogdan Ludusan (Bielefeld University)
  - Benno Stein (Weimar University)
  - Dimitra Vergyri (SRI International)
  - Margaret Zellers (Kiel University)
Over the last decade, conversational speech has received considerable attention from speech scientists. On the one hand, accurate automatic speech recognition (ASR) systems are essential for conversational dialogue systems, as these become more interactional and social rather than solely transactional. On the other hand, linguists study natural conversations because they reveal insights into how speech processing works that controlled experiments alone do not provide. Investigating conversational speech, however, requires not only applying existing methods to new data but also developing new categories and new modeling techniques and including new knowledge sources.
Hypotheses, research questions and objectives
The three objectives of the project are:
- to improve ASR systems for conversational speech,
- to increase our knowledge about the production and perception of conversational speech, and
- to increase our knowledge and resources for conversational Austrian German (see GRASS).
Approach and methods
On the basis of conversational speech and chat corpora from German and Austrian speakers, we will develop language models that include acoustic and semantic contextual information. These models will be informed by quantitative phonetic corpus studies and tested in ASR and speech perception experiments. For the phonetic corpus studies and the perception experiments, speech technology will be used for automatic annotation, acoustic feature extraction and data analysis. The linguistic knowledge gained will then be fed back into the language models. This approach requires an interdisciplinary team that works closely together.
Level of originality and innovation
Whereas traditional language models are trained on text only, we aim to add acoustic information. More specifically, we propose language models that incorporate information on the phonetic variation of words (i.e., pronunciation variation and prosody) and relate this information to the semantic context of the conversation and to the communicative functions in the conversation. This approach to language modeling is in line with the theoretical model proposed by Hawkins and Smith (2001), in which the perceptual system accesses meaning from speech by using the most salient sensory information from any combination of levels/layers of formal linguistic analysis. We thus speak of cross-layer models.
Related publications
- Conference paper Mihajlik P., Meng Y., S. M., Linke J., Schuppler B. & Mady K. (2024) On Disfluency and Non-lexical Sound Labeling for End-to-end Automatic Speech Recognition. in 25th Annual Conference of the International Speech Communication Association (pp. 1270-1274).
- Abstract Schuppler B., Kelterer A. & Hagmüller M. (2023) 10 Years of GRASS development: Experiences from annotating a large corpus of conversational Austrian German.
- Conference paper Kerle L., Pucher M. & Schuppler B. (2023) Speaker interpolation based data augmentation for automatic speech recognition. in 20th International Congress on Phonetic Sciences (pp. 3126-3130).
- Conference paper Kelterer A., Zellers M. & Schuppler B. (2023) (Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-prosodic Features. in 24th Annual Conference of the International Speech Communication Association (pp. 4768-4772).
- Conference paper Geiger B. & Schuppler B. (2023) Exploring Graph Theory Methods for the Analysis of Pronunciation Variation in Spontaneous Speech. in 24th Annual Conference of the International Speech Communication Association (pp. 596-600).
- Poster Linke J., Kadar M., Dosinszky G., Mihajlik P., Kubin G. & Schuppler B. (2023) What do self-supervised speech representations encode? An analysis of languages, varieties, speaking styles and speakers.
- Preprint Linke J., Wepner S., Kubin G. & Schuppler B. (2023) Using Kaldi for Automatic Speech Recognition of Conversational Austrian German.
- Review article Gabler P., Geiger B., Schuppler B. & Kern R. (2023) Reconsidering Read and Spontaneous Speech: Causal Perspectives on the Generation of Training Data for Automatic Speech Recognition.
- Conference paper Linke J., Kubin G. & Schuppler B. (2023) Using word-level features for prosodic prominence detection in conversational speech. in 20th International Congress on Phonetic Sciences (pp. 3101).
- Conference paper Kelterer A., Wepner S., Linke J. & Schuppler B. (2023) Points of maximum grammatical control – The prosody of a turn-holding practice. in 20th International Congress on Phonetic Sciences (pp. 3467).
- Conference paper Paierl M., Röck T., Wepner S., Kelterer A. & Schuppler B. (2023) Creapy: A Python-based tool for the detection of creak in conversational speech. in 20th International Congress on Phonetic Sciences (pp. 1716).
- Conference paper Ludusan B. & Schuppler B. (2022) To laugh or not to laugh? The use of laughter to mark discourse structure. in 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 76-82).
- Conference paper Kelterer A., Wepner S., Christian S., Schuppler B. & Zarka D. (2022) Prosodic cues to agreement and disagreement prefaces in Austrian German conversations. in 1st International Conference on Tone and Intonation (pp. 107-111).
- Journal article Ludusan B. & Schuppler B. (2022) An analysis of prosodic boundaries across speaking styles in two varieties of German. in Speech Communication, 141, pp. 93-106.
- Conference paper Wepner S., Schuppler B. & Kubin G. (2022) How prosody affects ASR performance in conversational Austrian German. in Speech Prosody 2022 (pp. 195-199).
- Abstract Kelterer A., Christian S., Wepner S., Linke J. & Zarka D. (2021) Prosodic cues to agreement and disagreement in "ja" and "nein" prefaces in Austrian German conversations.
- Abstract Wepner S. (2021) Adaptation of Automatic Speech Recognition systems to the needs of Austrian German.
- Conference paper Schuppler B. & Kelterer A. (2021) Developing an Annotation System for Communicative Functions for a Cross-Layer ASR System. in DiscAnn 2021: Integrating Perspectives on Discourse Annotation (pp. 14-18).
- Conference paper Linke J., Kelterer A., Dabrowski M., Zarka D. & Schuppler B. (2020) Towards automatic annotation of prosodic prominence levels in Austrian German. in 10th International Conference on Speech Prosody (pp. 1000-1004).