Signal Processing and Speech Communication Laboratory

Model Self-Adaptation during Operation Using Mixture Data

Status
Finished
Type
Master Thesis
Announcement date
17 Apr 2012
Student
Ludwig Mohr

Short Description

Recently, we developed a probabilistic multipitch tracking approach based on factorial hidden Markov models (FHMMs) and speaker interaction models [1,2]. FHMMs for multipitch tracking (or single-channel source separation) require clean, source-specific data for model training, i.e. data without interfering sources and with known speaker identity.
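In an FHMM the hidden state factorises into independent per-speaker chains, so the joint transition probability is the product of the per-chain transitions. As a minimal illustration (the helper name is ours, not from the thesis), the joint transition matrix of two chains is the Kronecker product of their individual transition matrices:

```python
import numpy as np

def factorial_transition(A1, A2):
    """Joint transition matrix of two independent Markov chains.

    In an FHMM, p(s_t | s_{t-1}) = p(s_t^1 | s_{t-1}^1) * p(s_t^2 | s_{t-1}^2),
    which for the flattened joint state space is the Kronecker product
    of the per-chain transition matrices. Illustrative sketch only.
    """
    return np.kron(A1, A2)
```

Each row of the result is again a valid probability distribution over the joint state space, which is what makes exact or approximate inference over the combined chain possible.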

The aim of this thesis is to overcome this limitation. In particular, we aim to develop an EM-like algorithm that starts from a universal (speaker-independent) model and then iteratively adapts the source models using the currently available mixture data (e.g. an utterance of two concurrent speakers). The basic principle works as follows:

• E-step: Infer the pitch trajectories, given the current FHMM parameters (i.e. classical multipitch tracking).

• M-step: Adapt all single speaker models using maximum likelihood linear regression (MLLR) [3].
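The M-step above can be sketched for the simplest case: a single global MLLR mean transform with identity covariances, where maximising the EM auxiliary function reduces to weighted least squares of the observations against the extended means. This is a minimal illustration under those assumptions, not the thesis code; the function names are ours:

```python
import numpy as np

def mllr_mean_transform(obs, means, gamma):
    """Estimate a global MLLR mean transform W = [A b].

    obs   : (T, d) observed feature frames
    means : (S, d) current Gaussian means, one per state
    gamma : (T, S) state posteriors from the E-step

    With identity covariances, the MLLR solution is
    W = K G^{-1}, accumulated from the sufficient statistics below.
    """
    T, d = obs.shape
    S = means.shape[0]
    xi = np.hstack([means, np.ones((S, 1))])   # (S, d+1) extended means [mu; 1]
    G = np.zeros((d + 1, d + 1))               # sum_{t,s} gamma_ts * xi_s xi_s^T
    K = np.zeros((d, d + 1))                   # sum_{t,s} gamma_ts * o_t  xi_s^T
    for t in range(T):
        for s in range(S):
            g = gamma[t, s]
            G += g * np.outer(xi[s], xi[s])
            K += g * np.outer(obs[t], xi[s])
    return K @ np.linalg.inv(G)                # (d, d+1), W = [A b]

def adapt_means(means, W):
    """Apply the transform to every state mean: mu' = A mu + b."""
    xi = np.hstack([means, np.ones((means.shape[0], 1))])
    return xi @ W.T
```

In the full algorithm, `gamma` would come from the multipitch E-step on the mixture, one transform would be estimated per speaker model, and the adapted means would feed the next tracking iteration.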

Your Profile/Requirements

  • The candidate should be interested in machine learning, applied mathematics/statistics, Matlab programming, and algorithms. Interested candidates are encouraged to ask for further information. Additionally, proposing your own project in one of the above-mentioned fields is possible.

Contact:

Franz Pernkopf (pernkopf@tugraz.at or 0316/873 4436)

References

[1] M. Wohlmayr, “Probabilistic Model-Based Multiple Pitch Tracking of Speech”, PhD Thesis, Graz University of Technology, 2012.

[2] M. Wohlmayr, M. Stark, and F. Pernkopf, “A Probabilistic Interaction Model for Multipitch Tracking With Factorial Hidden Markov Models”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 4, pp. 799-810, 2011.

[3] M.J.F. Gales and P.C. Woodland, “Mean and Variance Adaptation within the MLLR Framework”, Computer Speech & Language, Vol. 10, pp. 249-264, 1996.