Signal Processing and Speech Communication Laboratory

Segmental Conditional Random Fields for Phone Recognition

Status
Finished
Type
Master/Diploma Thesis
Announcement date
01 Oct 2015
Student
Christopher Walles
Mentors
Research Areas

Abstract

Automatic speech recognition (ASR) is a broad field of research. Applications of ASR include voice user interfaces, like those nowadays found in smartphones, automatic speech-to-text transcription, or dialogue systems for people with impairments. An important sub-task of ASR is phone recognition. A phone recognition system detects the phones in a given speech signal. Speech data is segmental in nature, i.e., each phone is represented by a variable number of input data frames. Usually, there are more input data frames than output labels. This thesis deals with segmental conditional random fields (SCRFs) to tackle the task of phone recognition. SCRFs are the segmental generalization of conditional random fields. The latter are the discriminative counterparts to hidden Markov models (HMMs). This thesis gives an overview of the formulas and algorithms required to apply the segmental conditional random field in practice. In particular, we present an efficient dynamic programming algorithm that allows for training an SCRF on unsegmented data. To make the SCRF more powerful, we equip it with a neural-network-style hidden layer. In the experiments, we apply segmental conditional random fields to the task of phone recognition and present results on the TIMIT database. We show that, in practice, the algorithm for training an SCRF on unsegmented data achieves the same results as training the model on the manually segmented data from TIMIT. The presented configuration of the hidden-layer SCRF, trained with backpropagation, outperforms other published SCRF approaches and achieves a phone recognition accuracy of 75.07% on the TIMIT core test set.
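
To illustrate what training on unsegmented data involves, the following is a minimal sketch of the standard segmental (semi-Markov) forward recursions, not the algorithm from the thesis itself. It assumes a hypothetical segment score function `seg_score(start, end, label)`, ignores label transition scores for brevity, and limits segments to `max_len` frames. One recursion sums over every segmentation and labeling (the normalizer), the other sums only over segmentations consistent with a given phone sequence, so no manual segment boundaries are needed.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    values = list(values)
    if not values:
        return -math.inf
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(seg_score, T, num_labels, max_len):
    """Log-sum of scores over all segmentations and labelings of T frames."""
    # alpha[t]: log-sum over all labeled segmentations of frames [0, t)
    alpha = [-math.inf] * (T + 1)
    alpha[0] = 0.0
    for t in range(1, T + 1):
        terms = []
        for d in range(1, min(max_len, t) + 1):  # d = length of last segment
            for y in range(num_labels):
                terms.append(alpha[t - d] + seg_score(t - d, t, y))
        alpha[t] = logsumexp(terms)
    return alpha[T]


def log_numerator(seg_score, T, labels, max_len):
    """Log-sum over all segmentations consistent with a label sequence."""
    K = len(labels)
    # beta[k][t]: the first k labels exactly cover frames [0, t)
    beta = [[-math.inf] * (T + 1) for _ in range(K + 1)]
    beta[0][0] = 0.0
    for k in range(1, K + 1):
        for t in range(k, T + 1):
            terms = [
                beta[k - 1][t - d] + seg_score(t - d, t, labels[k - 1])
                for d in range(1, min(max_len, t) + 1)
            ]
            beta[k][t] = logsumexp(terms)
    return beta[K][T]


def zero_score(start, end, label):
    """Hypothetical placeholder score; a real model would use learned features."""
    return 0.0
```

Under these assumptions, the conditional log-likelihood of a phone sequence is `log_numerator(...) - log_partition(...)`. With `zero_score`, each recursion simply counts the paths it sums over, which makes the sketch easy to check by hand.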