Pop Informed Speech Recognition

Status

Open

Type

Master Thesis

Announcement date

01 Jun 2016

Mentors

Martin Hagmüller

Research Areas

Speech Communication

Short Description

When automatic speech recognition (ASR) is used in a close-talking situation, the user's own speech can become a problem if the microphone is too close to the mouth. Pressure impulses from plosive speech sounds (e.g., /p/, /t/, /k/) can distort the audio signal and thereby reduce speech recognition accuracy. Pressure-gradient microphones are especially sensitive to these pop sounds.

To reduce the effects of this problem, we identify several points in the signal chain where the situation can be improved. Apart from a foam cover on the microphone and a high-pass filter, little is known about reducing the impact of pop sounds, and for small microphones such as those used in mobile devices virtually no research results are available.

A pop detection system can tell the speech recognizer which frames are distorted, so that they are excluded from the recognizer's path search. It has been shown that explicitly marking frames distorted by pop sounds as unusable, instead of feeding the recognizer corrupted data, increases recognition accuracy: the path-finding algorithm (e.g., Viterbi) can then exclude those frames [1].
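The idea of detecting pop frames and excluding them from the path search can be illustrated with a small sketch. It is a minimal example under stated assumptions, not the method of [1] or of this project. `detect_pops` is a hypothetical heuristic that flags a frame when a one-pole low-pass filter (cutoff assumed at 100 Hz) keeps more than half of the frame's energy. `masked_viterbi` is a plain Viterbi decoder that assigns every state a log-likelihood of 0 on flagged frames. A likelihood of 1 is what marginalising a frame's density over all of its feature values yields, so only the transition probabilities carry the path across the pop. The function names, cutoff, threshold and the synthetic test signals are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def detect_pops(frames: Sequence[Sequence[float]], sample_rate: float,
                cutoff_hz: float = 100.0, ratio_threshold: float = 0.5) -> List[bool]:
    """Flag frames whose energy is dominated by content below cutoff_hz.

    Hypothetical heuristic (not from the literature): a one-pole low-pass
    filter isolates the low-frequency part of each frame; the frame is
    flagged when that part carries more than ratio_threshold of the energy.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    flags = []
    for frame in frames:
        y = 0.0               # low-pass state, reset per frame for simplicity
        low = total = 0.0
        for x in frame:
            y += alpha * (x - y)
            low += y * y
            total += x * x
        flags.append(total > 0.0 and low / total > ratio_threshold)
    return flags


def masked_viterbi(log_init, log_trans, log_emit, reliable):
    """Most likely state sequence; frames with reliable[t] == False are missing.

    A missing frame gets log-likelihood 0 (likelihood 1) for every state,
    i.e. its density is fully marginalised, so it cannot favour any state.
    """
    n_states = len(log_init)

    def emit(t, s):
        return log_emit[t][s] if reliable[t] else 0.0

    score = [log_init[s] + emit(0, s) for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s] + emit(t, s))
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best-scoring final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    fs = 8000
    speech_like = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
    pop_like = [0.8 * math.exp(-n / 400) for n in range(256)]  # slow low-frequency burst
    print(detect_pops([speech_like, pop_like], fs))  # [False, True]

    log = math.log
    log_init = [log(0.5), log(0.5)]
    log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
    log_emit = [[log(0.9), log(0.05)],   # frame 0: clean, favours state 0
                [log(0.01), log(0.99)],  # frame 1: pop-distorted, misleading
                [log(0.9), log(0.05)]]   # frame 2: clean, favours state 0
    print(masked_viterbi(log_init, log_trans, log_emit, [True, True, True]))   # [0, 1, 0]
    print(masked_viterbi(log_init, log_trans, log_emit, [True, False, True]))  # [0, 0, 0]
```

In the toy example the distorted middle frame pulls the unmasked decoder into state 1, while marking it as unreliable lets the surrounding clean frames and the transition model decide. A real system would work on acoustic feature frames inside a full recognizer rather than on hand-made likelihoods.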

Your Tasks

Literature survey for algorithms detecting pop sound distortions
Implementation and improvement of promising algorithms
Pop informed speech recognizer using pop detection output

Your Profile

Speech Communication 1+2
Good MATLAB skills
Experience with the Unix shell and Kaldi is a plus, but not required

Contact

Martin Hagmüller (hagmueller@tugraz.at or 0316/873-4377)

References

[1] Cooke, M., et al., ‘Robust automatic speech recognition with missing and unreliable acoustic data’, Speech Communication, 34, pp. 267–285, 2001.