Jointly Model Source Separation and Speech Recognition

Project Type: Master/Diploma Thesis
Project Status: Open

Short Description  

Speech recognition under realistic conditions remains an unsolved problem after decades of research, yet the smartphone market, for example, demands working solutions with small resource footprints. Usually, phone recognition and source separation models (e.g. separation of a speech and a noise signal) are trained independently and applied in sequence. This work should bring both aspects together, either by phone-aware source separation or by joint training of a source separation and phone recognition model. Recently, the source separation problem has been formulated as a structured prediction problem (like classification, but over sequences) [1] using Linear-chain Conditional Random Fields (LC-CRFs) [2]. LC-CRFs have been extended at our lab. Your task will be to use or to extend these models.
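To give a feel for the kind of model involved, here is a rough illustrative sketch (not the lab's implementation) of the core of LC-CRF inference: the forward algorithm computing the log partition function from emission and transition scores. The class name, score matrices, and toy numbers are made up for illustration only.

```java
import java.util.Arrays;

/** Toy linear-chain CRF forward pass; illustrative only, not the lab's code. */
public class LcCrfForward {

    /** Numerically stable log(sum(exp(xs))). */
    static double logSumExp(double[] xs) {
        double m = Arrays.stream(xs).max().getAsDouble();
        double s = 0.0;
        for (double x : xs) s += Math.exp(x - m);
        return m + Math.log(s);
    }

    /**
     * Log partition function log Z via the forward algorithm.
     * emit[t][s]  : emission score of state s at time t
     * trans[p][s] : transition score from state p to state s
     */
    static double logPartition(double[][] emit, double[][] trans) {
        int T = emit.length, S = emit[0].length;
        double[] alpha = emit[0].clone();
        for (int t = 1; t < T; t++) {
            double[] next = new double[S];
            for (int s = 0; s < S; s++) {
                double[] terms = new double[S];
                for (int p = 0; p < S; p++) terms[p] = alpha[p] + trans[p][s];
                next[s] = logSumExp(terms) + emit[t][s];
            }
            alpha = next;
        }
        return logSumExp(alpha);
    }

    /** Brute-force check: enumerate all S^T state sequences. */
    static double bruteForceLogZ(double[][] emit, double[][] trans) {
        int T = emit.length, S = emit[0].length;
        int total = (int) Math.pow(S, T);
        double[] scores = new double[total];
        for (int i = 0; i < total; i++) {
            int code = i, prev = -1;
            double score = 0.0;
            for (int t = 0; t < T; t++) {
                int s = code % S; code /= S;
                score += emit[t][s];
                if (prev >= 0) score += trans[prev][s];
                prev = s;
            }
            scores[i] = score;
        }
        return logSumExp(scores);
    }

    public static void main(String[] args) {
        // Toy example: 3 time steps, 2 states (e.g. speech- vs. noise-dominated).
        double[][] emit  = {{0.5, -0.2}, {1.0, 0.3}, {-0.4, 0.8}};
        double[][] trans = {{0.2, -0.1}, {-0.3, 0.4}};
        double fwd = logPartition(emit, trans);
        double brute = bruteForceLogZ(emit, trans);
        System.out.printf("forward logZ = %.6f, brute-force logZ = %.6f%n", fwd, brute);
        if (Math.abs(fwd - brute) > 1e-9) throw new AssertionError("mismatch");
    }
}
```

The same dynamic-programming structure underlies gradient computation during CRF training, which is why an efficient Java implementation of it matters for this project.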

Your Tasks 

  • Prepare a data set in Matlab 
  • Implement or extend these models in Java (there is an existing implementation) 
  • Analyze the implemented systems in terms of accuracy and computational performance

Your Profile 

  • Very good theoretical and mathematical background (mandatory) 
  • Good knowledge in machine learning 
  • Very good knowledge and experience in Java programming (mandatory)

Additional Information

As this work combines theoretical and experimental aspects of non-standard methods, a very good mathematical and programming background is mandatory. This thesis project is planned for a duration of 6 months, starting immediately. It offers good prospects for publication.

Contact

Martin Ratajczak (martin.ratajczak@tugraz.at or +43 (316) 873 - 4379)

References

[1] Y. Wang and D. Wang, “Cocktail party processing via structured prediction,” in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 224–232.

[2] J. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in International Conference on Machine Learning (ICML), 2001, pp. 282–289.