Semantic Similarity in Automatic Speech Recognition for Meetings

PhD Student 
Michael Pucher
Research Area

 

This thesis investigates the application of language models based on semantic similarity to Automatic Speech Recognition for meetings. We consider two kinds of models: data-driven models based on Latent Semantic Analysis and knowledge-driven models based on WordNet. The Latent Semantic Analysis-based models are trained for several background domains. All background models reduce perplexity compared to the n-gram baseline models, and some also significantly improve speech recognition for meetings. A new method for interpolating multiple models is introduced, and the relation to cache-based models is investigated. The semantics of the models are examined through a synonymity task. The WordNet-based models are defined for different word-word similarities that use information encoded in the WordNet graph and corpus information. These models significantly improve over random baseline models on the task of word prediction, and the chosen part-of-speech context is essential for their performance. On the task of speech recognition for meetings, however, the WordNet-based models achieve no improvement over the n-gram baseline models.

 

This thesis is supervised by Gernot Kubin.