Speech Segmentation of Audio Books

Status: Finished

Type: Master Project

Announcement date: 01 Oct 2011

Student: Philipp Salletmayr

Mentors: Harald Romsdorfer

Research Areas: Speech Communication

An important task in speech processing is the segmentation of speech utterances into the appropriate sequence of phones. This segmentation is traditionally performed with a phoneme-based forced alignment algorithm. However, segmenting long speech utterances, so-called monologues, is in general non-trivial, cf. [1].

Audio books offer a rich source of high-quality speech material with accompanying text. Unfortunately, this speech material comes as a set of very long speech files. Recently, different approaches to the segmentation of such monologues have been proposed, e.g. in [2].

This thesis aims to investigate an approach to phone segmentation of long speech monologues that combines, for example, a grapheme-based forced alignment procedure for an initial sentence- and/or word-level segmentation with a subsequent phone-based forced alignment procedure for the final phone-level segmentation.
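The two-stage idea can be illustrated with a minimal Python sketch. It is not the method developed in the thesis: scalar "frames" and scalar target values stand in for real acoustic features and acoustic models, the coarse word templates only play the role of the grapheme-based first pass, and all names (`forced_align`, `two_stage_align`, `FRAMES`, `WORDS`) are hypothetical. The sketch only shows the structure: a coarse monotonic alignment first cuts the long recording into word-sized chunks, and a fine alignment is then run inside each chunk.

```python
def forced_align(cost):
    """Monotonic left-to-right alignment by dynamic programming.

    cost[t][u] is the mismatch between frame t and unit u.
    Every unit receives at least one frame. Returns one
    (start, end) frame range per unit, with end exclusive.
    """
    n_frames, n_units = len(cost), len(cost[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")
    inf = float("inf")
    best = [[inf] * n_units for _ in range(n_frames)]
    entered = [[False] * n_units for _ in range(n_frames)]
    best[0][0] = cost[0][0]
    for t in range(1, n_frames):
        for u in range(min(t + 1, n_units)):
            stay = best[t - 1][u]
            advance = best[t - 1][u - 1] if u > 0 else inf
            if advance < stay:
                best[t][u] = advance + cost[t][u]
                entered[t][u] = True  # unit u starts at frame t
            else:
                best[t][u] = stay + cost[t][u]
    # Backtrace from the last frame of the last unit.
    bounds = [None] * n_units
    u, end = n_units - 1, n_frames
    for t in range(n_frames - 1, 0, -1):
        if entered[t][u]:
            bounds[u] = (t, end)
            end = t
            u -= 1
    bounds[0] = (0, end)
    return bounds


def two_stage_align(frames, words):
    """words: list of (word, [(phone, target_value), ...])."""
    # Stage 1 (coarse): one crude template per word, here the mean
    # of its phone targets, used to find word boundaries.
    templates = [sum(v for _, v in phones) / len(phones) for _, phones in words]
    word_cost = [[abs(f - c) for c in templates] for f in frames]
    word_bounds = forced_align(word_cost)

    # Stage 2 (fine): phone-level alignment inside each word chunk.
    segments = []
    for (word, phones), (w_start, w_end) in zip(words, word_bounds):
        chunk = frames[w_start:w_end]
        phone_cost = [[abs(f - v) for _, v in phones] for f in chunk]
        for (phone, _), (p_start, p_end) in zip(phones, forced_align(phone_cost)):
            segments.append((word, phone, w_start + p_start, w_start + p_end))
    return segments


# Toy data: ten scalar frames and two words with two phones each.
FRAMES = [0.1, 0.0, 1.1, 0.9, 1.0, 3.0, 3.1, 4.0, 4.1, 3.9]
WORDS = [("A", [("a1", 0.0), ("a2", 1.0)]),
         ("B", [("b1", 3.0), ("b2", 4.0)])]

for segment in two_stage_align(FRAMES, WORDS):
    print(segment)
```

On this toy input the coarse pass places the word boundary at frame 5, and the fine pass then splits each word into its two phones. The practical appeal of such a design is that the expensive fine alignment only ever sees short chunks, so its cost no longer depends on the full length of the recording.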

References:

[1] P. J. Moreno and C. Alberti: A factor automaton approach for the forced alignment of long speech recordings. In Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, pages 4869–4872, Taipei, Taiwan, 2009.

[2] K. Prahallad: Automatic Building of Synthetic Voices from Audio Books. PhD Thesis, CMU, Pittsburgh, 2010.