Signal Processing and Speech Communication Laboratory
Pop Informed Speech Recognition

Status
Open
Type
Master Thesis
Announcement date
01 Jun 2016
Mentors
Research Areas

Short Description

In close-talking automatic speech recognition (ASR), the speech output of the user can become a problem when the distance to the microphone is too small. Pressure impulses from plosive speech sounds (e.g., /p/, /t/, /k/) can distort the audio signal, which decreases speech recognition accuracy. Pressure gradient microphones are especially sensitive to pop sounds. The situation can be improved at several points in the signal chain. Beyond microphone foam covers and high-pass filters, however, very little is known about reducing the impact of pop sounds. For small microphones, as used in mobile devices, virtually no research results are available.

A pop detection system can tell the speech recognizer which frames are distorted. It has been shown that speech recognition accuracy rises when frames distorted by pop sounds are explicitly marked as unusable instead of being fed to the recognizer as if they were valid data. The path-finding algorithm (e.g., Viterbi) can then exclude those frames [1].
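The effect of excluding flagged frames from the path search can be illustrated with a small sketch. It is not the method of [1], only one simple reading of the idea: a frame marked as unreliable contributes the same observation score to every state, so only the transition model decides the path through it. The function name, the two-state toy model and all probabilities are assumptions made for illustration.

```python
import math


def viterbi_with_mask(log_init, log_trans, log_obs, unreliable):
    """Viterbi decoding that ignores the observations of flagged frames.

    log_init[s]     : log prior of state s
    log_trans[r][s] : log transition probability from state r to state s
    log_obs[t][s]   : log likelihood of frame t under state s
    unreliable[t]   : True if frame t was flagged (e.g. by a pop detector)
    """
    n_states = len(log_init)
    n_frames = len(log_obs)

    def obs(t, s):
        # A flagged frame scores 0.0 for every state, so it cannot
        # favour any state over another.
        return 0.0 if unreliable[t] else log_obs[t][s]

    score = [log_init[s] + obs(0, s) for s in range(n_states)]
    backptr = []
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + obs(t, s))
        score = new_score
        backptr.append(ptr)

    # Trace the best path back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example (all numbers hypothetical): two "sticky" states, five frames.
# Frame 2 is corrupted and strongly, but wrongly, favours state 1.
log = math.log
log_init = [log(0.5), log(0.5)]
log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
clean = [log(0.8), log(0.2)]
corrupted = [log(0.001), log(0.999)]
log_obs = [clean, clean, corrupted, clean, clean]

path_plain = viterbi_with_mask(log_init, log_trans, log_obs, [False] * 5)
path_informed = viterbi_with_mask(log_init, log_trans, log_obs,
                                  [False, False, True, False, False])
print(path_plain, path_informed)
```

In this toy setting the uninformed decoder is pulled into the wrong state by the corrupted frame, while the informed decoder stays on the path supported by the clean frames. A real recognizer would apply the same masking inside its own decoder rather than in a stand-alone function like this one.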

Your Tasks

  • Literature survey of algorithms for detecting pop sound distortions
  • Implementation and improvement of promising algorithms
  • Pop informed speech recognizer using pop detection output
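
As a point of departure for the detection task, a very crude baseline can be sketched in a few lines. It rests on the assumption that pop distortion is dominated by low-frequency energy, which is also why high-pass filtering is a common countermeasure. The function name, the 100 Hz cutoff, the 25 ms frame length and the 0.8 threshold are illustrative guesses, not values taken from the literature.

```python
import math


def low_freq_ratio_flags(samples, sample_rate, frame_len=400, cutoff_hz=100.0,
                         ratio_threshold=0.8, energy_floor=1e-6):
    """Flag frames whose energy is dominated by content below cutoff_hz.

    Returns one boolean per non-overlapping frame of frame_len samples.
    """
    # One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    low, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        low.append(y)

    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        total = sum(x * x for x in samples[start:start + frame_len])
        low_energy = sum(v * v for v in low[start:start + frame_len])
        # Flag only frames that are non-silent and mostly low-frequency.
        flags.append(total > energy_floor and low_energy / total > ratio_threshold)
    return flags


# Synthetic test signal (hypothetical): a 1 kHz tone with one slow,
# large pressure pulse added to the fourth 25 ms frame.
sample_rate = 16000
frame_len = 400
signal = [0.1 * math.sin(2 * math.pi * 1000 * n / sample_rate)
          for n in range(7 * frame_len)]
for n in range(frame_len):
    signal[3 * frame_len + n] += 0.5 * math.sin(math.pi * n / frame_len)

flags = low_freq_ratio_flags(signal, sample_rate, frame_len=frame_len)
print(flags)
```

The resulting per-frame flags have the form a pop informed recognizer would need as input. Real speech contains legitimate low-frequency energy as well, so a usable detector will need considerably more than this heuristic.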

Your Profile

  • Speech Communication 1+2
  • Good MATLAB skills
  • Experience with the Unix shell and Kaldi is a plus, but not necessary

Contact

Martin Hagmüller (hagmueller@tugraz.at or 0316/873-4377)

References

[1] Cooke, M., et al., ‘Robust automatic speech recognition with missing and unreliable acoustic data’, Speech Communication, 34, pp. 267–285, 2001.