Speech communication is an experimental science: practical experience with the natural speech production and perception mechanisms, and with technical speech processing systems, their perceptual quality, and their usability, is a key component in the education of spoken language engineers. This laboratory introduces measurement methods and tools for speech transmission and speech perception (e.g., intelligibility vs. naturalness, mean-opinion-score testing) and for the assessment of speech coders and speech recognizers in clean and disturbed environments. Finally, it addresses the design and configuration of large-scale dialogue systems through a prototypical application development task.

The course consists of 6 units of 4 hours each:
  • Lab 1: Speech signal analysis in the time, frequency, and time-frequency domains
  • Lab 2: Speech synthesis by time-domain concatenation and prosody modification
  • Lab 3: Speech coding
  • Lab 4: Hidden Markov Models
  • Lab 5: Speech recognition using Kaldi, Part I
  • Lab 6: Speech recognition using Kaldi, Part II

How should you prepare?

For each session, you need to prepare by studying the online course material listed below (see References & Handout Papers). Feel free to consult any of the course instructors.


During each lab session, your performance is monitored continuously: the instructors ask questions about the experiments you are running, discuss the results with you, and evaluate the written lab report, which you draft during the session and present in its final version at the end of the session. This documentation should emphasize two aspects:

  • Reproducibility of your experiments. Note in detail which speech material and which hardware and software components you work with, which algorithms and parameters you choose, etc. Add links to any speech data files you have created.
  • Interpretation of your results. State explicitly which conclusions you draw from the results you have documented in tables or figures.
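
One possible way to capture the reproducibility details for a report is to log them in machine-readable form next to each result. The following is a minimal sketch only: the helper name `describe_experiment`, the field names, and the example algorithm and parameter values are illustrative assumptions, not a required lab format.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def describe_experiment(audio_bytes, algorithm, parameters):
    """Collect details needed to rerun an experiment (hypothetical helper)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        # A checksum identifies the exact speech material that was used.
        "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "algorithm": algorithm,
        "parameters": parameters,
    }


# Stand-in bytes; in practice, read the contents of your recorded speech file.
audio_bytes = b"\x00\x01" * 8000

# Example values only (assumed for illustration).
record = describe_experiment(
    audio_bytes,
    algorithm="short-time Fourier transform",
    parameters={"window": "hamming", "frame_ms": 25, "hop_ms": 10},
)
print(json.dumps(record, indent=2))
```

The printed JSON can be pasted into the lab report beside the corresponding table or figure, so that each result is tied to the material and settings that produced it.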

Because of the hands-on focus of the laboratory, a positive grade is only possible if you miss no more than one session (i.e., 4 hours).


As a general introduction to voice communication technology, we recommend the quick tour.

For the description of speech signal properties and several signal analysis techniques, see the online course notes Akustische Phonetik und Sonagramm Lesen from the University of Munich.

For speech synthesis, the best preparation is to read this (login required).

For speech coding, an introduction has been put together by students in our Advanced Signal Processing seminar; other overviews are available from the University of Southampton and Cambridge.

For an introduction to speech recognition, consult Gales & Young (2007) as well as the lecture notes for Speech Communication 2.

