AUTOMATIC SPEECH RECOGNITION FOR DYSARTHRIC SPEAKERS

Project Type: Master/Diploma Thesis
Student: Rexeis Susanne

Dysarthria is a speech impairment caused by neuromuscular damage of various causes, which can also lead to reduced dexterity or paralysis of other body parts, e.g. the limbs. For these patients, speech technology as an interface to an environmental control system or to a computer can be a valuable assistance in everyday life. However, due to the various speaker-dependent disturbances typical of dysarthric speech, the performance of standard automatic speech recognition (ASR) systems is limited. This work investigates different approaches to improving the performance of speech recognizers for German-speaking males suffering from moderate to severe dysarthria. The speech data was recorded in cooperation with the Simon project.

Evaluations on a small-vocabulary connected-digits task showed that speaker-independent (SI) acoustic models adapted to dysarthric speech using maximum likelihood linear regression (MLLR) can achieve better results than speaker-dependent (SD) acoustic models for a patient suffering from severe dysarthria. For two out of the five dysarthric speakers, word recognition rates of over 90% were achieved using MLLR adaptation. On a task with a larger vocabulary of 69 command words, however, only a maximum recognition rate of 70% was achieved with acoustic adaptation.

In the utterances of the dysarthric speakers, mispronunciations of certain phonemes were identified. Two data-driven approaches to adapting the pronunciation dictionaries of the recognition systems to dysarthric speech were proposed and evaluated: phonological rules and finite state transducer (FST) networks. The pronunciation errors of the speakers were modeled based on the evaluation of the speech recognizers on a rhyme test. Lexical adaptation with phonological rules achieved promising results in the rhyme-test evaluation. In contrast, the improvement of the recognition rate in the command-word task was barely measurable, as a large number of new confusions occurred after adaptation. Two methods of pruning the generated pronunciation variants based on their probability did not succeed in lowering the number of confusions. Lexical adaptation with FSTs failed to improve results on either the rhyme test or the command-word task. The number of new recognition errors after adaptation was again very high, although a score measuring the confusability of the newly generated variants was used for pruning. The information extracted from the phone confusions of the rhyme test appears to be too sparse to score the confusability of the new pronunciations correctly in this approach.
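
To illustrate the acoustic-adaptation step, the following sketch shows how an estimated MLLR transform is applied to the Gaussian means of an acoustic model. The regression matrix and the means below are toy values, not data from the thesis; estimating W from adaptation data via the usual EM statistics is omitted here.

```python
# Minimal sketch of MLLR mean adaptation (toy data): each Gaussian
# mean mu is replaced by A @ mu + b, where W = [b | A] is the
# regression matrix estimated on the adaptation data.
import numpy as np

def apply_mllr(means, W):
    """Apply one MLLR regression class to a set of Gaussian means.

    means: (n_gaussians, dim) array of original means
    W:     (dim, dim + 1) regression matrix [b | A]
    """
    b, A = W[:, 0], W[:, 1:]
    return means @ A.T + b  # mu_hat = A @ mu + b for every mean

# Toy example: 3 Gaussians in a 2-dimensional feature space.
means = np.array([[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]])
W = np.hstack([np.ones((2, 1)), np.eye(2)])  # A = I, b = (1, 1)
print(apply_mllr(means, W))
```

In practice a single transform is shared by many Gaussians (or a small tree of regression classes), which is what lets MLLR adapt an SI model from only a few minutes of dysarthric speech.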
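
The phone confusions that drive both lexical-adaptation approaches can be estimated from the rhyme-test evaluation. A minimal sketch, assuming each rhyme-test item contrasts a single reference phone with the phone of the word the recognizer actually chose; the pairs below are invented:

```python
# Hypothetical sketch: estimating phone confusion probabilities from
# rhyme-test results as relative frequencies per reference phone.
from collections import Counter, defaultdict

# (reference phone, recognized phone) pairs; invented example data.
results = [("s", "s"), ("s", "t"), ("s", "t"),
           ("r", "r"), ("r", "l"), ("r", "r")]

counts = defaultdict(Counter)
for ref, hyp in results:
    counts[ref][hyp] += 1

# Normalize the counts per reference phone into probabilities.
confusion = {ref: {hyp: n / sum(c.values()) for hyp, n in c.items()}
             for ref, c in counts.items()}
print(confusion)  # e.g. {'s': {'s': 0.33, 't': 0.67}, 'r': {...}}
```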
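
Given such confusion probabilities, lexical adaptation with phonological rules can be sketched as rewriting each canonical pronunciation and keeping only variants above a probability threshold, in the spirit of the probability-based pruning the thesis describes. The rules, phone strings, and threshold here are illustrative:

```python
# Hypothetical sketch: expanding a pronunciation dictionary with
# substitution rules derived from observed phone confusions.
from itertools import product

# rule: canonical phone -> list of (realized phone, probability)
RULES = {"s": [("s", 0.7), ("t", 0.3)],
         "r": [("r", 0.6), ("l", 0.4)]}

def expand(pron, min_prob=0.2):
    """Yield (variant, probability) pairs for one pronunciation."""
    options = [RULES.get(p, [(p, 1.0)]) for p in pron]
    for combo in product(*options):
        prob = 1.0
        variant = []
        for phone, p in combo:
            variant.append(phone)
            prob *= p
        if prob >= min_prob:  # prune unlikely variants
            yield variant, prob

for variant, prob in expand(["r", "o", "s", "e"]):
    print(" ".join(variant), round(prob, 2))  # "r o s e 0.42" etc.
```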
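
For the FST-generated variants, pruning relied on a confusability score. One plausible form, sketched here under the assumption that it behaves like a weighted edit distance with substitution costs taken from the rhyme-test confusions, flags a new variant as risky when it lies too close to another word's pronunciation. All costs below are made up:

```python
# Hedged sketch of a confusability score between two phone strings:
# a weighted Levenshtein distance in which likely confusions
# (high rhyme-test probability) are cheap substitutions.
import math

CONFUSION = {("s", "t"): 0.3, ("t", "s"): 0.3, ("r", "l"): 0.4}

def sub_cost(a, b):
    if a == b:
        return 0.0
    p = CONFUSION.get((a, b), 0.01)  # small floor for unseen pairs
    return -math.log(p)              # likely confusions cost little

def confusability(x, y, indel=2.0):
    """Low values mean x and y are easily confused, so a variant
    scoring low against another word's pronunciation is pruned."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + indel,
                          d[i][j - 1] + indel,
                          d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]))
    return d[n][m]

print(confusability(["r", "o", "s"], ["l", "o", "t"]))  # ~2.12
```

As the abstract notes, a score of this kind only works as well as the confusion statistics behind it, and the rhyme test yields too few phone pairs to estimate them reliably.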