Real-Time Automatic Recognition of Spoken Digits on an Embedded System using Deep Recurrent Neural Networks

Status

Finished

Type

Master Thesis

Announcement date

01 Oct 2016

Student

Fridtjof Sterna

Mentors

Franz Pernkopf

Research Areas

Intelligent Systems

Abstract

Automatic speech recognition (ASR) has come a long way, from systems that could recognize only isolated utterances of a single speaker to current systems that recognize spontaneous speech with very large vocabularies. Present-day ASR architectures are typically based on recurrent neural network (RNN) models, which have gained vast popularity over the past decade. Most commercial ASR systems rely on low-power front-end recording devices connected over the internet to high-performance back-ends that carry out the actual recognition task. With the advent of more powerful, low-cost embedded computing platforms, it has become possible to implement a full ASR stack on an embedded device, covering the recording, pre-processing and classification of speech in noisy environments. This thesis presents a proof-of-concept (PoC) implementation of such a stack, which encompasses a recording front-end, noise suppression by adaptive beamforming, feature extraction and classification of the audio. The classification is carried out by a sequence-to-sequence model consisting of an RNN encoder and decoder. Experiments are performed to evaluate the efficiency of the devised system under different conditions, and the results are presented and discussed.
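To make the pre-processing stages named in the abstract more concrete, the following is a minimal sketch in Python with NumPy, not the thesis implementation. It makes several assumptions that the abstract does not confirm: a fixed delay-and-sum beamformer stands in for the adaptive beamformer, the per-microphone delays are taken as known integers, the features are plain log-power spectra, and the frame sizes assume a 16 kHz sample rate. The function names `delay_and_sum` and `log_spectral_features` are hypothetical.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Fixed (non-adaptive) delay-and-sum beamformer.

    channels: array of shape (n_mics, n_samples)
    delays:   per-microphone integer delay in samples (assumed known here;
              an adaptive beamformer would estimate its weights from the data)
    """
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for ch, d in zip(channels, delays):
        # Advance the channel by d samples so all channels line up;
        # the last d samples are left as zeros.
        aligned = np.zeros(n_samples)
        aligned[: n_samples - d] = ch[d:]
        out += aligned
    return out / n_mics


def log_spectral_features(signal, frame_len=400, hop=160, eps=1e-10):
    """Split a signal into windowed frames and return log-power spectra.

    frame_len=400 and hop=160 correspond to 25 ms frames with a 10 ms hop
    at an assumed 16 kHz sample rate.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # eps avoids log(0) for silent frames
    return np.log(power + eps)
```

In a pipeline of this shape, the output of `log_spectral_features` would be a (frames × frequency bins) matrix that an RNN encoder could consume one frame per time step; the sequence-to-sequence classifier itself is omitted here.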