Cross-layer language models for conversational speech (FWF Stand-Alone Project P 32700-N)

Period

2019 — 2024

Funding

Fonds zur Förderung der wissenschaftlichen Forschung, FWF (Austria)

Partners

Dina El Zarka (Department of Linguistics, University of Graz)
Roman Kern (Know Center GmbH)
Bogdan Ludusan (Bielefeld University)
Benno Stein (Weimar University)
Dimitra Vergyri (SRI International)
Margaret Zellers (Kiel University)

Research Areas

Speech Communication

Contact

Barbara Schuppler

Members

In the last decade, conversational speech has received a lot of attention among speech scientists. On the one hand, accurate automatic speech recognition (ASR) systems are essential for conversational dialogue systems, as these become more interactional and social rather than solely transactional. On the other hand, linguists study natural conversations because they reveal insights into how speech processing works that complement those gained from controlled experiments. Investigating conversational speech, however, requires not only applying existing methods to new data, but also developing new categories and new modeling techniques and including new knowledge sources.

Hypotheses, research questions and objectives

The three objectives of the project are:

to improve ASR systems for conversational speech,
to increase our knowledge about the production and perception of conversational speech, and
to increase our knowledge and resources for conversational Austrian German (see GRASS).

Approach and methods

Based on conversational speech and chat corpora from German and Austrian speakers, we will develop language models that include acoustic and semantic contextual information. These models will be informed by quantitative phonetic corpus studies and tested in ASR and speech perception experiments. In the phonetic corpus studies and the perception experiments, speech technology will be used for automatic annotation, acoustic feature extraction and data analysis. The linguistic knowledge gained will in turn be incorporated into the language models. This approach requires an interdisciplinary team that works closely together.

Level of originality and innovation

Whereas traditional language models are trained on text only, we aim to add acoustic information. More specifically, we propose language models that incorporate information on the phonetic variation of words (i.e., pronunciation variation and prosody) and relate this information to the semantic context of the conversation and to the communicative functions in the conversation. This approach to language modeling is in line with the theoretical model proposed by Hawkins and Smith (2001), in which the perceptual system accesses meaning from speech by using the most salient sensory information from any combination of levels/layers of formal linguistic analysis. We thus speak of cross-layer models.
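As a purely illustrative sketch of the idea, and not the models developed in the project, the following toy example contrasts a text-only bigram model with one whose conditioning context also carries a prosodic layer. The corpus, the binary prominence tags and the function names `train` and `prob` are all assumptions made up for this illustration.

```python
from collections import Counter

# Toy corpus: each token is (word, prominence), with prominence in {0, 1}.
# The utterances and the binary tag set are invented for illustration only.
CORPUS = [
    [("ja", 1), ("genau", 0)],
    [("ja", 0), ("aber", 1)],
    [("ja", 1), ("genau", 1)],
    [("ja", 0), ("aber", 0)],
]


def train(corpus, use_prosody):
    """Count (context, word) pairs. The context is the previous word,
    optionally extended with that word's prominence tag."""
    pair_counts, context_counts, vocab = Counter(), Counter(), set()
    for utterance in corpus:
        for (prev_word, prev_prom), (word, _) in zip(utterance, utterance[1:]):
            context = (prev_word, prev_prom) if use_prosody else (prev_word,)
            pair_counts[(context, word)] += 1
            context_counts[context] += 1
            vocab.add(word)
    return pair_counts, context_counts, vocab


def prob(model, context, word):
    """Add-one smoothed estimate of P(word | context)."""
    pair_counts, context_counts, vocab = model
    return (pair_counts[(context, word)] + 1) / (context_counts[context] + len(vocab))


text_only = train(CORPUS, use_prosody=False)
cross_layer = train(CORPUS, use_prosody=True)

# Text-only context: "genau" and "aber" are equally likely after "ja".
p_text = prob(text_only, ("ja",), "genau")
# Context with prosody: a prominent "ja" makes "genau" more likely.
p_prominent = prob(cross_layer, ("ja", 1), "genau")
p_non_prominent = prob(cross_layer, ("ja", 0), "genau")
print(p_text, p_prominent, p_non_prominent)
```

In this invented data, the prosodic layer separates two continuations that the text-only context cannot distinguish. A realistic model would need richer acoustic features, semantic context and far more data.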

Related publications

Preprint Linke J. & Schuppler B. (2025) Prominence-aware automatic speech recognition for conversational speech.
Conference paper Wepner S. & Schuppler B. (2025) (When) Does it Harm to Be Incomplete? Encoding ASR Mistranscriptions of Syntactically Disfluent Structures. in 3rd Graz-Wien Speechworkshop (pp. 23-24).
Conference paper Pasqualini E., Schuppler B., Hagmüller M. & Pernkopf F. (2025) Speech Enhancement of Conversational Speech in Cocktail Party Noise. in 3rd Graz-Wien Speechworkshop (pp. 27-28).
Conference paper Schuppler B. (2025) Cross-fertilization between speech science and technology for the study of conversational speech. in Beszédkutatás Speech Research Conference (pp. 9-14).
Conference paper Paierl M., Hagmüller M. & Schuppler B. (2025) Continuous prediction of backchannel timing for human-robot interaction. in 26th Interspeech Conference 2025 (pp. 3020-3024).
Journal article Paierl M., Kelterer A. & Schuppler B. (2025) Distribution and Timing of Verbal Backchannels in Conversational Speech: A Quantitative Study. in Languages, 10(8).
Conference paper Wepner S., Eckert L., Kubin G. & Schuppler B. (2025) What the Filler? Both ASR Systems and Humans Struggle More With Other Kinds of Disfluencies Than With Filler Particles. in Interspeech 2025 (pp. 2325-2329).
Journal article Kelterer A. & Schuppler B. (2025) Turn-taking annotation for quantitative and qualitative analyses of conversation. in arXiv.org e-Print archive, cs.CL, p. 1.
Conference paper Eckert L., Wepner S. & Schuppler B. (2025) Slicer – A Tool for Efficient Stimuli Extraction from Large Speech Corpora. in 11th Convention of the European Acoustics Association, Euronoise 2025.
Conference paper Linke J., Steger S., Steinwender P., Kubin G., Pernkopf F. & Schuppler B. (2025) Uncertainty prediction for prominence classification with chroma features. in 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 (pp. 1-5).
Journal article Linke J., Geiger B., Kubin G. & Schuppler B. (2025) What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures. in Computer Speech and Language, 90.
Habilitation Schuppler B. (2024) Cross-layer models for conversational speech.
Conference paper Karner M., Linke J., Kroell M., Schuppler B. & Geiger B. (2024) Towards Improving ASR Outputs of Spontaneous Speech with LLMs. in 20th Conference on Natural Language Processing, KONVENS 2024 (pp. 339-348).
Conference paper Dumitru V., Boehm M., Hagmüller M. & Schuppler B. (2024) Version Control for Speech Corpora. in 20th Conference on Natural Language Processing, KONVENS 2024 (pp. 303-308).
Conference paper Mihajlik P., Meng Y., S. M., Linke J., Schuppler B. & Mady K. (2024) On Disfluency and Non-lexical Sound Labeling for End-to-end Automatic Speech Recognition. in 25th Annual Conference of the International Speech Communication Association (pp. 1270-1274).
Journal article Zarka D., Kelterer A., Gubian M. & Schuppler B. (2024) The prosody of theme, rheme and focus in Egyptian Arabic. in Speech Communication, 160.
Editorial Schuppler B., Adda-Decker M., Cucchiarini C. & Muhr R. (2024) An introduction to pluricentric languages in speech science and technology.
Abstract Schuppler B., Kelterer A. & Hagmüller M. (2023) 10 Years of GRASS development: Experiences from annotating a large corpus of conversational Austrian German.
Conference paper Kerle L., Pucher M. & Schuppler B. (2023) Speaker interpolation based data augmentation for automatic speech recognition. in 20th International Congress on Phonetic Sciences (pp. 3126-3130).
Conference paper Kelterer A., Zellers M. & Schuppler B. (2023) (Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-prosodic Features. in 24th Annual Conference of the International Speech Communication Association (pp. 4768-4772).
Conference paper Geiger B. & Schuppler B. (2023) Exploring Graph Theory Methods for the Analysis of Pronunciation Variation in Spontaneous Speech. in 24th Annual Conference of the International Speech Communication Association (pp. 596-600).
Conference paper Linke J., Kadar M., Dosinszky G., Mihajlik P., Kubin G. & Schuppler B. (2023) What do self-supervised speech representations encode? An analysis of languages, varieties, speaking styles and speakers. in 24th Annual Conference of the International Speech Communication Association (pp. 5371-5375).
Preprint Linke J., Wepner S., Kubin G. & Schuppler B. (2023) Using Kaldi for Automatic Speech Recognition of Conversational Austrian German.
Review article Gabler P., Geiger B., Schuppler B. & Kern R. (2023) Reconsidering Read and Spontaneous Speech: Causal Perspectives on the Generation of Training Data for Automatic Speech Recognition.
Conference paper Linke J., Kubin G. & Schuppler B. (2023) Using word-level features for prosodic prominence detection in conversational speech. in 20th International Congress on Phonetic Sciences (pp. 3101).
Conference paper Kelterer A., Wepner S., Linke J. & Schuppler B. (2023) Points of maximum grammatical control – The prosody of a turn-holding practice. in 20th International Congress on Phonetic Sciences (pp. 3467-3471).
Conference paper Paierl M., Röck T., Wepner S., Kelterer A. & Schuppler B. (2023) Creapy: A Python-based tool for the detection of creak in conversational speech. in 20th International Congress on Phonetic Sciences (pp. 1716).
Conference paper Ludusan B. & Schuppler B. (2022) To laugh or not to laugh? The use of laughter to mark discourse structure. in 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 76–82).
Conference paper Kelterer A., Wepner S., Christian S., Schuppler B. & Zarka D. (2022) Prosodic cues to agreement and disagreement prefaces in Austrian German conversations. in 1st International Conference on Tone and Intonation (pp. 107-111).
Journal article Ludusan B. & Schuppler B. (2022) An analysis of prosodic boundaries across speaking styles in two varieties of German. in Speech Communication, 141, pp. 93-106.
Conference paper Wepner S., Schuppler B. & Kubin G. (2022) How prosody affects ASR performance in conversational Austrian German. in Speech Prosody 2022 (pp. 195-199).
Abstract Kelterer A., Christian S., Wepner S., Linke J. & Zarka D. (2021) Prosodic cues to agreement and disagreement in "ja" and "nein" prefaces in Austrian German conversations.
Abstract Wepner S. (2021) Adaptation of Automatic Speech Recognition systems to the needs of Austrian German.
Conference paper Schuppler B. & Kelterer A. (2021) Developing an Annotation System for Communicative Functions for a Cross-Layer ASR System. in DiscAnn 2021: Integrating Perspectives on Discourse Annotation (pp. 14-18).
Conference paper Linke J., Kelterer A., Dabrowski M., Zarka D. & Schuppler B. (2020) Towards automatic annotation of prosodic prominence levels in Austrian German. in 10th International Conference on Speech Prosody (pp. 1000-1004).