In a natural acoustic environment, multiple sources are usually active at the same time. The task of source separation is to estimate the individual source signals from this complex mixture. The particular challenge of single-channel source separation (SCSS) is to recover more than one source from a single observation. Broadly, SCSS approaches can be divided into methods that try to mimic the human auditory system and model-based methods, which find a probabilistic representation of the individual sources and employ this prior knowledge for inference. This thesis presents several strategies for the separation of two speech utterances mixed into a single channel and is structured in four parts.

The first part reviews factorial models in model-based SCSS and introduces the soft-binary mask for signal reconstruction. This mask shows improved performance compared to the soft and the binary masks in automatic speech recognition (ASR) experiments.

The second part addresses the computational complexity of factorial models, which limits their application for online processing. We introduce the fast beam search and the iterated conditional modes (ICM) approximation techniques. They reduce the computational complexity of factorial models by up to two orders of magnitude while maintaining the separation performance. Moreover, there is strong evidence that the ICM algorithm breaks the factorial structure entirely; consequently, the complexity becomes linear rather than factorial in the number of hidden states.

The third part deals with arbitrary mixing levels in factorial models by explicitly modeling the gain of each speech segment, which results in a shape-gain model. Several strategies for the parallel estimation of gain and shape are successfully evaluated.

Finally, the last part integrates the speech production model into the model-based systems.
This results in a source-filter representation, where the source signal can be linked to the excitation signal of the vocal folds and the filter accounts for the vocal-tract shaping. Our final separation algorithm combines the shape-gain with the source-filter model, reflecting the complete standard speech production model. All presented algorithms are compared to state-of-the-art methods and evaluated in terms of both the target-to-masker ratio and the word error rate of an ASR system, showing improvements beyond the state of the art.
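The soft-binary mask from the first part can be illustrated with a small sketch. The abstract does not give its exact formulation, so the steepness-controlled sigmoid below is an assumption about how such an interpolation could look: with `alpha = 1` it reduces exactly to the familiar soft (ratio) mask `s1 / (s1 + s2)`, and as `alpha` grows it approaches the binary mask.

```python
import numpy as np

def soft_binary_mask(s1_pow, s2_pow, alpha=2.0, eps=1e-12):
    """Sketch of a soft-binary mask as a sigmoid of the log power ratio.

    s1_pow, s2_pow: estimated power spectra of the two speakers
    alpha: steepness parameter (hypothetical) -- alpha = 1 yields the
           soft mask s1 / (s1 + s2); large alpha approaches the binary
           mask, which is 1 wherever speaker 1 dominates.
    """
    log_ratio = np.log(s1_pow + eps) - np.log(s2_pow + eps)
    return 1.0 / (1.0 + np.exp(-alpha * log_ratio))
```

For signal reconstruction, such a mask would be applied bin-wise to the mixture spectrogram (e.g. `est1 = mask * mix_stft`), so the steepness parameter trades off the soft mask's smoothness against the binary mask's stronger masker suppression.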
