Speech Segmentation of Audio Books
- Status: Finished
- Type: Master Project
- Announcement date: 01 Oct 2011
- Student: Philipp Salletmayr
- Mentors: Harald Romsdorfer
- Research Areas
An important task in speech processing is the segmentation of speech utterances into the corresponding sequence of phones. This segmentation is traditionally carried out with a phoneme-based forced alignment algorithm. However, segmenting long speech utterances, so-called monologues, is in general a non-trivial problem, cf. [1].
Audio books offer a rich resource of high-quality speech material with accompanying text. Unfortunately, the speech material of an audio book comes as a set of very long speech files. Recently, different approaches to the segmentation of monologues have been proposed, e.g. in [2].
This thesis aims to investigate an approach to the phone segmentation of long speech monologues. One option is a two-stage procedure: a grapheme-based forced alignment first yields a sentence- and/or word-level segmentation, and a phone-based forced alignment then produces the final phone-level segmentation.
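The coarse-then-fine idea can be illustrated with a toy sketch. It is not the method of the thesis: the one-dimensional "frames", the per-phone template values, the use of a word template (the mean of its phone templates) as a stand-in for a grapheme-based model, and the names `forced_align` and `two_pass_align` are all assumptions made for illustration. The sketch only shows how a monotonic alignment over large units can restrict the search range of a second alignment over phones.

```python
import math

def forced_align(cost):
    """Monotonic alignment of T frames to K ordered units.

    cost[t][k] is the cost of assigning frame t to unit k. Every unit gets
    at least one frame. Returns one (start, end) frame range per unit,
    with end exclusive.
    """
    T, K = len(cost), len(cost[0])
    if K > T:
        raise ValueError("need at least one frame per unit")
    D = [[math.inf] * K for _ in range(T)]   # accumulated cost
    back = [[0] * K for _ in range(T)]       # 1 = unit k starts at frame t
    D[0][0] = cost[0][0]
    for t in range(1, T):
        for k in range(min(t + 1, K)):
            stay = D[t - 1][k]
            advance = D[t - 1][k - 1] if k > 0 else math.inf
            if advance < stay:
                D[t][k] = advance + cost[t][k]
                back[t][k] = 1
            else:
                D[t][k] = stay + cost[t][k]
    # Backtrack from the last frame of the last unit to find unit starts.
    starts = [0] * K
    k = K - 1
    for t in range(T - 1, 0, -1):
        if back[t][k] == 1:
            starts[k] = t
            k -= 1
    return [(starts[i], starts[i + 1] if i + 1 < K else T) for i in range(K)]

def two_pass_align(frames, words, templates):
    """First align words (coarse), then align phones inside each word."""
    # Pass 1: hypothetical coarse model, one template per word.
    word_templates = [sum(templates[p] for p in w) / len(w) for w in words]
    word_cost = [[abs(f - wt) for wt in word_templates] for f in frames]
    word_bounds = forced_align(word_cost)
    # Pass 2: phone-level alignment restricted to each word's frame range.
    segments = []
    for phones, (w_start, w_end) in zip(words, word_bounds):
        sub = frames[w_start:w_end]
        phone_cost = [[abs(f - templates[p]) for p in phones] for f in sub]
        for p, (s, e) in zip(phones, forced_align(phone_cost)):
            segments.append((p, w_start + s, w_start + e))
    return segments

# Toy data (assumed): two words, the first with phones a, b; the second with c.
templates = {"a": 0.0, "b": 1.0, "c": 5.0}
words = [["a", "b"], ["c"]]
frames = [0.1, 0.0, 0.9, 1.1, 5.2, 4.9]
segments = two_pass_align(frames, words, templates)
print(segments)
```

In a real system the absolute-difference costs would be replaced by acoustic model scores, and the first pass would have to cope with recordings far too long for a single alignment, which is the difficulty discussed in [1].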
References:
[1] P. J. Moreno and C. Alberti: A factor automaton approach for the forced alignment of long speech recordings. In Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, pages 4869–4872, Taipei, Taiwan, 2009.
[2] K. Prahallad: Automatic Building of Synthetic Voices from Audio Books. PhD Thesis, CMU, Pittsburgh, 2010.