Speech Communication Laboratory
Speech communication is an experimental science in which practical experience with the natural speech production and perception mechanisms, as well as with technical speech processing systems and their perceptual quality and usability, is a key component in the education of spoken language engineers. This laboratory introduces measurement methods and tools for speech transmission and speech perception (e.g., intelligibility vs. naturalness, mean-opinion-score testing) and for the assessment of speech coders and speech recognizers in clean and disturbed environments. Finally, it addresses the design and configuration of large-scale dialogue systems through a prototypical application development task.
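As a rough illustration of the mean-opinion-score testing mentioned above, the following Python sketch averages listener ratings on the usual 1 to 5 scale and attaches an approximate 95% confidence interval. The function name, the example ratings, and the normal approximation are assumptions made for illustration; they are not taken from the course material, and the lab's own tools may compute this differently.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal approximation (assumption); for small listener panels a
    # t-distribution quantile would be the more careful choice.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from five listeners for one test condition
ratings = [4, 5, 3, 4, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether two conditions, such as two coders, actually differ given the number of listeners.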
The course consists of 6 units of 4 hours each:
- Lab 1: Speech signal analysis in the time, frequency, and time-frequency domains
- Lab 2: Speech synthesis by time-domain concatenation and prosody modification
- Lab 3: Speech coding
- Lab 4: Hidden Markov Models
- Lab 5: Speech recognition using Kaldi, Part I
- Lab 6: Speech recognition using Kaldi, Part II
How should you prepare?
For each session, you need to prepare by studying the online course material listed below (see References & Handout Papers). Feel free to consult any of the course instructors.
During each lab session, your performance is assessed continuously: the instructors ask questions about the experiments you are running, discuss the results with you, and evaluate the written lab report, which you produce during the session and present in its final version at the end of it. This documentation should emphasize two aspects:
- Reproducibility of your experiments. Note in detail which speech material, hardware and software components you work with, which algorithms and parameters you choose, etc. Add links to any speech data files you have created.
- Interpretation of your results. Describe in explicit words what conclusions you draw from the results you have documented as tables or figures.
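One possible way to capture the reproducibility details listed above is to store them in a small machine-readable record next to the report. The sketch below is only an illustration in Python: the field names, the file name, the coder name, and the parameter values are all hypothetical, and the labs themselves may rely on other tools (the handouts mention m-files).

```python
import hashlib
import json
import platform
import sys


def experiment_record(lab, speech_files, algorithm, parameters):
    """Collect what a reader would need to rerun one experiment."""
    return {
        "lab": lab,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "algorithm": algorithm,
        "parameters": parameters,
        # SHA-256 of each file's bytes, so the exact speech material
        # can be verified later.
        "speech_files": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in speech_files.items()
        },
    }


record = experiment_record(
    lab="Lab 3: Speech coding",
    # Hypothetical content; for real files use open(path, "rb").read()
    speech_files={"example.wav": b"placeholder bytes"},
    algorithm="hypothetical_coder",
    parameters={"bitrate_kbps": 13, "sampling_rate_hz": 8000},
)
print(json.dumps(record, indent=2))
```

Hashing the speech files means a later reader can confirm that the linked data are the same files the experiment used.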
Because of the hands-on focus of the laboratory, a positive grade is only possible if you don’t miss more than one session (i.e. 4 hours).
As a general introduction to voice communication technology, we recommend the quick tour.
For speech synthesis, the best preparation is to read this (login required).
- Lab 1 – Speech Analysis: Handout paper. Additional files: analysis.zip.
- Lab 2 – Speech Synthesis: Handout paper. m-files: lpc_synth_2018.zip. Demos of commercial text-to-speech systems can be tested at: CereProc, IBM, Acapela.
- Lab 3 – Speech Coding: Handout paper. Additional files: coding.zip. Updated encode_decode.m for Linux. For further investigation of intelligibility evaluation and spectrogram plotting, please see the following: SpeechCommLab.rar.
- Lab 4 – Hidden Markov Models: Handout paper. Additional material: Hidden Markov Model Tutorial + Matlab, Mixtures of Gaussians + Matlab.
- Labs 5 & 6 – Automatic Speech Recognition: tutorial.