Jointly Modeling Source Separation and Speech Recognition
- Status: Open
- Type: Master Thesis
- Announcement date: 05 Oct 2015
- Mentors:
- Research Areas:
Short Description
Speech recognition under realistic conditions remains an unsolved problem after decades of research, yet the smartphone market, for example, demands working solutions with small resource footprints. Usually, phone recognition and source separation models (e.g., separating the speech and noise signals) are trained independently and applied in sequence. This work should bring both aspects together, either through phone-aware source separation or through joint training of a source separation and phone recognition model. Recently, the source separation problem has been formulated as a structured prediction problem (like classification, but on sequences) [1] using Linear-chain Conditional Random Fields (LC-CRFs) [2]. LC-CRFs have been extended at our lab; your task will be to use or extend these models.
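For orientation, the standard LC-CRF formulation from [2] (notation chosen here for illustration, not taken from the announcement) models the conditional probability of a label sequence $y = (y_1, \dots, y_T)$ given an observation sequence $x$ as

$$
p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
$$

where the $f_k$ are feature functions, the $\lambda_k$ are learned weights, and the partition function $Z(x)$ sums the same exponential over all possible label sequences. In the setting of [1], the labels can, for example, encode time-frequency mask values per frame, so that decoding the most probable label sequence yields a separation mask.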
Your Tasks
- Prepare a data set in Matlab
- Implement or extend these models in Java (there is an existing implementation; see the sketch after this list)
- Analyze the implemented systems in terms of accuracy and computational performance
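As a starting point, the following is a minimal Java sketch of the log-domain forward pass used for LC-CRF inference, i.e. computing the log partition function log Z(x) from the formula above. The class name, method signature, and score-array layout are illustrative assumptions and not the lab's existing implementation:

```java
// Illustrative sketch only: log-domain forward pass for a linear-chain CRF.
// The score-array layout below is a hypothetical convention, not the lab's code.
public final class LcCrfForward {

    /**
     * @param logScores logScores[t][i][j] = sum_k lambda_k * f_k(y_{t-1}=i, y_t=j, x, t),
     *                  i.e. the combined transition and emission log-score at position t.
     *                  For t = 0 only logScores[0][0][j] is read (no predecessor label).
     * @return log Z(x), the log partition function over all label sequences.
     */
    public static double logPartition(double[][][] logScores) {
        int T = logScores.length;          // sequence length
        int S = logScores[0][0].length;    // number of labels

        double[] alpha = new double[S];    // alpha[j] = log-sum of all paths ending in label j
        for (int j = 0; j < S; j++) {
            alpha[j] = logScores[0][0][j]; // initial scores at position 0
        }

        double[] next = new double[S];
        for (int t = 1; t < T; t++) {
            for (int j = 0; j < S; j++) {
                // next[j] = log sum_i exp(alpha[i] + score(i -> j, t)), via log-sum-exp
                double max = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < S; i++) {
                    max = Math.max(max, alpha[i] + logScores[t][i][j]);
                }
                double sum = 0.0;
                for (int i = 0; i < S; i++) {
                    sum += Math.exp(alpha[i] + logScores[t][i][j] - max);
                }
                next[j] = max + Math.log(sum);
            }
            System.arraycopy(next, 0, alpha, 0, S);
        }

        // Final log-sum-exp over all end labels gives log Z(x).
        double max = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < S; j++) max = Math.max(max, alpha[j]);
        double sum = 0.0;
        for (int j = 0; j < S; j++) sum += Math.exp(alpha[j] - max);
        return max + Math.log(sum);
    }
}
```

Working in the log domain with the log-sum-exp trick, as above, is the usual way to keep the forward recursion numerically stable for long sequences; the same recursion underlies gradient computation during training.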
Your Profile
- Very good theoretical and mathematical background (mandatory)
- Good knowledge in machine learning
- Very good knowledge and experience in Java programming (mandatory)
Additional Information
As this work combines theoretical and experimental aspects of non-standard methods, a very good mathematical and programming background is mandatory. The thesis project is planned for a duration of six months, starting immediately, and offers a good chance of publication.
Contact
Martin Ratajczak (martin.ratajczak@tugraz.at or +43 (316) 873 - 4379)
References
[1] Y. Wang and D. Wang, “Cocktail party processing via structured prediction,” in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 224–232.
[2] J. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in International Conference on Machine Learning (ICML), 2001, pp. 282–289.