Reliable Voice Activity Detection Under Adverse Environments

Project Type: Master/Diploma Thesis
Student: Stadtschnitzer Michael
Mentor: Gernot Kubin

 

Voice activity detection (VAD) is the task of distinguishing speech from non-speech in a signal. The task is less trivial than it seems because both speech and background noise are complex. VAD plays an important role in a wide range of applications such as speech enhancement, speech coding, and automatic speech recognition. Besides presenting the theory of VAD, this thesis reviews state-of-the-art and recently proposed VAD methods. Two robust VAD methods are developed within this work. The first is based on a single wavelet subband power distance feature and an adaptive percentile filter. The second uses mel-frequency cepstral coefficients as inputs to an artificial neural network. The robustness of the developed algorithms is evaluated and compared with the standard VAD methods in ITU-T Rec. G.729 Annex B and ETSI ES 202 050 Advanced Front End (VAD for noise estimation and frame dropping), and with a VAD algorithm based on order statistics filters. The results show that the developed methods are very robust to environmental noise and mostly outperform the other VAD methods.
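To make the idea of a percentile-based noise floor concrete, the following is a minimal Python sketch. It is not the method developed in the thesis: it swaps in plain frame log power for the wavelet subband power distance feature, and it uses a fixed percentile over a sliding window instead of an adaptive percentile filter. All function names and parameter values (`frame_len`, `hop`, `window`, `percentile`, `margin_db`) are assumptions chosen for illustration only.

```python
import numpy as np


def frame_log_power(signal, frame_len=256, hop=128):
    """Return the log power (in dB) of each frame of a 1-D signal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] for i in range(n_frames)]
    )
    power = np.mean(frames ** 2, axis=1)
    # Small constant avoids log(0) on silent frames.
    return 10.0 * np.log10(power + 1e-12)


def percentile_vad(log_power, window=50, percentile=10.0, margin_db=6.0):
    """Flag a frame as speech when it exceeds a running noise floor.

    The noise floor is a low percentile of the log power over the most
    recent `window` frames (illustrative values, not from the thesis).
    """
    decisions = np.zeros(len(log_power), dtype=bool)
    for t in range(len(log_power)):
        start = max(0, t - window + 1)
        floor = np.percentile(log_power[start:t + 1], percentile)
        decisions[t] = log_power[t] > floor + margin_db
    return decisions


def demo():
    """Synthetic test signal: low-level noise with a 0.5 s tone burst."""
    rng = np.random.default_rng(0)
    fs = 8000
    signal = 0.01 * rng.standard_normal(2 * fs)
    t = np.arange(fs // 2) / fs
    # The tone stands in for speech; real speech is far less stationary.
    signal[fs: fs + fs // 2] += 0.5 * np.sin(2 * np.pi * 200 * t)
    return percentile_vad(frame_log_power(signal))


if __name__ == "__main__":
    print(demo().astype(int))
```

A low percentile tracks the background level as long as speech does not fill most of the window, which is why the window is chosen longer than the expected speech burst here. A practical detector would also need hangover smoothing and a feature that is more robust than raw energy.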