Signal Processing and Speech Communication Laboratory

Real-Time Automatic Recognition of Spoken Digits on an Embedded System using Deep Recurrent Neural Networks

Status
Finished
Type
Master Thesis
Announcement date
01 Oct 2016
Student
Fridtjof Sterna
Mentors
Research Areas

Abstract

Automatic speech recognition has come a long way from systems capable of recognizing only isolated utterances of a single speaker to current systems that can recognize spontaneous speech with very large vocabularies. Present-day automatic speech recognition (ASR) architectures are typically based on recurrent neural network (RNN) models, which have gained vast popularity over the past decade. Most commercial ASR systems rely on low-power front-end recording devices connected over the internet to high-performance back-ends that carry out the actual recognition task. With the advent of more powerful and low-cost embedded computing platforms, it has become possible to implement a full ASR stack, covering the recording, pre-processing and classification of speech in noisy environments, on a single embedded device. In this thesis, a proof-of-concept (PoC) implementation of such a stack is presented, which encompasses a recording front-end, noise suppression by adaptive beamforming, feature extraction and classification of the audio. The classification is carried out by a sequence-to-sequence model consisting of an RNN encoder and decoder. Experiments are performed to evaluate the efficiency of the devised system under different conditions, and their results are presented and discussed.
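
To illustrate the kind of sequence-to-sequence architecture the abstract refers to, the following is a minimal sketch of an RNN encoder-decoder that maps a sequence of acoustic feature frames (e.g. MFCCs) to a sequence of digit tokens. It assumes PyTorch; the layer sizes, feature dimension and token vocabulary are illustrative placeholders and do not reflect the configuration used in the thesis.

```python
import torch
import torch.nn as nn

N_FEATURES = 13      # e.g. 13 MFCC coefficients per frame (assumed)
HIDDEN = 128         # hidden state size (assumed)
N_TOKENS = 12        # 10 digits + start- and end-of-sequence tokens (assumed)
SOS, EOS = 10, 11

class Seq2SeqDigits(nn.Module):
    """Hypothetical RNN encoder-decoder for spoken-digit sequences."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(N_FEATURES, HIDDEN, batch_first=True)
        self.embed = nn.Embedding(N_TOKENS, HIDDEN)
        self.decoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, N_TOKENS)

    def forward(self, features, max_len=8):
        # features: (batch, frames, N_FEATURES) acoustic feature sequence
        _, state = self.encoder(features)       # final state summarizes the utterance
        token = torch.full((features.size(0), 1), SOS, dtype=torch.long)
        logits = []
        for _ in range(max_len):                # greedy decoding, one token per step
            emb = self.embed(token)             # (batch, 1, HIDDEN)
            dec_out, state = self.decoder(emb, state)
            step_logits = self.out(dec_out)     # (batch, 1, N_TOKENS)
            logits.append(step_logits)
            token = step_logits.argmax(dim=-1)  # feed back the predicted token
        return torch.cat(logits, dim=1)         # (batch, max_len, N_TOKENS)

if __name__ == "__main__":
    model = Seq2SeqDigits()
    dummy = torch.randn(2, 100, N_FEATURES)     # 2 utterances, 100 frames each
    print(model(dummy).shape)                   # torch.Size([2, 8, 12])
```

In such a setup the encoder compresses the variable-length feature sequence into a fixed-size state, and the decoder unrolls digit predictions from that state until an end-of-sequence token (or a length limit) is reached; this is a generic sketch of the technique, not the implementation developed in the thesis.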