Pop-Informed Speech Recognition
- Master/Diploma Thesis
- Announcement date: 01 Jun 2016
In close-talking automatic speech recognition (ASR), the user's speech itself can become a problem when the distance to the microphone is too small. Pressure impulses from plosive speech sounds (e.g., /p/, /t/, /k/) can distort the audio signal, which decreases speech recognition accuracy. Pressure-gradient microphones are especially sensitive to these pop sounds. To reduce the effects of this problem, we identify several points in the signal chain where the situation can be improved. Apart from microphone foam covers and high-pass filtering, very little is known about reducing the impact of pop sounds. For small microphones such as those used in mobile devices, virtually no research results are available.

A pop detection system can tell the speech recognizer which frames are distorted. It has been shown that speech recognition accuracy improves when frames distorted by pop sounds are explicitly marked as unusable instead of being passed to the recognizer as corrupted data. The path-finding algorithm (e.g., Viterbi decoding) can then exclude those frames.
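A pop detector can be as simple as a per-frame check on low-frequency energy. Plosive pops reach the microphone as low-frequency pressure bursts, which is also why high-pass filtering helps. The sketch below is a hypothetical heuristic and not a method taken from this announcement. All of its details are illustrative assumptions:

- the functions `one_pole_lowpass` and `detect_pop_frames`
- the 100 Hz cutoff
- 25 ms frames with a 10 ms hop at 16 kHz
- the 0.5 energy-ratio threshold

```python
import math

def one_pole_lowpass(x, fs, cutoff_hz):
    # First-order IIR low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y

def detect_pop_frames(x, fs, frame_len=400, hop=160, cutoff_hz=100.0,
                      ratio_thresh=0.5, energy_floor=1e-4):
    """Return one boolean per frame: True if the frame looks pop-distorted.

    Hypothetical heuristic (an assumption, not an established method):
    flag a frame when most of its energy lies below cutoff_hz and the
    frame is not near-silent.
    """
    low = one_pole_lowpass(x, fs, cutoff_hz)
    flags = []
    for start in range(0, len(x) - frame_len + 1, hop):
        e_total = sum(s * s for s in x[start:start + frame_len]) / frame_len
        e_low = sum(s * s for s in low[start:start + frame_len]) / frame_len
        # The energy floor check also avoids division by zero on silent frames.
        flags.append(e_total > energy_floor and e_low / e_total > ratio_thresh)
    return flags

if __name__ == "__main__":
    fs = 16000
    # Synthetic signal: 1 s of a 1 kHz tone plus a 30 ms raised-cosine
    # low-frequency burst (a crude stand-in for a pop) centred at 0.5 s.
    x = [0.1 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
    pop_len = 480
    pop_start = 8000 - pop_len // 2
    for n in range(pop_len):
        x[pop_start + n] += 0.8 * 0.5 * (1 - math.cos(2 * math.pi * n / (pop_len - 1)))
    flags = detect_pop_frames(x, fs)
    print("frames flagged as pop:", [i for i, f in enumerate(flags) if f])
```

Real pops overlap with voiced speech that also carries low-frequency energy. Any threshold of this kind would therefore need tuning on recorded data, which is part of what the tasks below cover.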
Tasks:
- Literature survey of algorithms for detecting pop-sound distortions
- Implementation and improvement of promising algorithms
- Pop-informed speech recognizer using the pop detection output
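A pop-informed recognizer can be illustrated with a toy Viterbi decoder over log-probabilities. In this sketch, a frame flagged as unreliable contributes an emission log-likelihood of 0 for every state. This corresponds to integrating out all of that frame's features, as in missing-data ASR. The decoder then bridges the flagged frame using transition probabilities alone. This is an assumption about one possible design, not the thesis solution. The function `viterbi_masked` and the two-state toy model are hypothetical.

```python
import math

def viterbi_masked(log_emis, log_trans, log_init, reliable):
    """Most likely state sequence; frames with reliable[t] == False
    contribute no acoustic evidence (emission log-likelihood 0 for all states)."""
    n_states = len(log_init)

    def emis(t, s):
        # Unreliable (e.g., pop-distorted) frame: every state scores equally.
        return log_emis[t][s] if reliable[t] else 0.0

    delta = [log_init[s] + emis(0, s) for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emis)):
        new_delta, ptrs = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            new_delta.append(delta[best] + log_trans[best][s] + emis(t, s))
            ptrs.append(best)
        delta = new_delta
        backptrs.append(ptrs)

    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    stay, switch = math.log(0.9), math.log(0.1)
    log_trans = [[stay, switch], [switch, stay]]
    log_init = [math.log(0.5), math.log(0.5)]
    # Frames 0, 1, 3, 4 favour state 0; frame 2 is a simulated pop frame
    # whose distorted features strongly favour the wrong state 1.
    log_emis = [[-1.0, -3.0], [-1.0, -3.0], [-20.0, -1.0], [-1.0, -3.0], [-1.0, -3.0]]
    print(viterbi_masked(log_emis, log_trans, log_init, [True] * 5))  # → [0, 0, 1, 0, 0]
    print(viterbi_masked(log_emis, log_trans, log_init,
                         [True, True, False, True, True]))            # → [0, 0, 0, 0, 0]
```

The flagged frame's score is set to zero rather than the frame being deleted. This keeps the number of frames unchanged, so the state durations still line up with the time axis of the signal.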
Requirements:
- Speech Communication 1+2
- Good MATLAB skills
- Experience with the Unix shell and Kaldi is a plus, but not required
Contact: Martin Hagmüller (firstname.lastname@example.org or 0316/873-4377)
Cooke, M., et al., ‘Robust automatic speech recognition with missing and unreliable acoustic data’, Speech Communication, 34, pp. 267–285, 2001.