Signal Processing and Speech Communication Laboratory
homephd theses › Maximum Margin Bayesian Networks

Maximum Margin Bayesian Networks

Sebastian Tschiatschek
Franz Pernkopf
Research Areas

    We consider Bayesian networks (BNs) with discriminatively optimized parameters and structures, i.e. BNs that are optimized to maximize a kind of probabilistic margin. These maximum margin Bayesian networks (MM BNs) are inspired by support vector machines (SVMs) that aim to separate samples from different classes by a large margin in some feature space. MM BNs achieve classification performance on par with BNs optimized according to other discriminative criteria, e.g. maximum conditional likelihood. Furthermore, in several applications, they achieve classification performance comparable to that of both, linear and kernelized, SVMs. In the literature, two definitions of MM BNs with respect to their parameters are available. We analyze these definitions in terms of asymptotic consistency, extend these definitions by a generative regularizer and analyze properties of MM BNs with respect to reduced-precision implementations.

    We start by analyzing the asymptotic consistency of MM BNs. Our analysis reveals a deficiency of MM BNs according to the definitions available in the literature. This deficiency renders MM BNs inconsistent in the multiclass case under certain conditions. However, this deficiency is not eminent in the binary-class case. In experiments, we demonstrate that MM BNs are nevertheless able to compensate for model-mismatch, i.e. the true data distribution and the distributions representable by the learned models do not match.

    By slightly altering the definitions existing in the literature, we extend MM BNs to include a generative norm-regularizer. This regularizer consists of normalized frequency counts and is balanced against a margin term. In this novel formulation, MM BNs can be interpreted as linear SVMs with a special regularizer or, alternatively, as BNs that are optimized generatively as well as discriminatively. This interpretation allows one to naturally deal with missing features scenarios, simply by marginalization of the missing features, and to perform semi-supervised learning — the unlabeled data influence the generative regularizer. State-of-the-art performance of the novel MM BN formulation is achieved in a large set of experiments and efficient algorithms for parameter learning in large scale scenarios are presented.

    Furthermore, we consider reduced-precision implementations of BNs. We especially focus on BNs optimized for a large margin, either in terms of their structure or their parameters. In preliminary experiments, we investigate the classification performance of BNs with parameters rounded to some specified precision. These experiments extend results from the literature and reveal that BNs used for classification are well suited for reduced-precision implementations. We continue by deriving several types of classification performance bounds for BNs. These bounds can be used to analyze worst-case classification performance upon parameter rounding. In experiments, these bounds are evaluated. Furthermore, BNs optimized for a large margin (in terms of their parameters and their structure) are compared to generatively optimized BNs in terms of robustness to parameter quantization and in terms of absolute classification performance. We extend our reduced-precision considerations by proposing an alternative to determining reduced-precision parameters for MM BNs by rounding. Therefore, we slightly modify our formulation of MM BNs and propose algorithms for maximizing this modified criterion over the search space of reduced-precision parameters. In several experiments, we demonstrate that parameters learned in this way yield better classification performance than parameters obtained by rounding. Finally, we end our investigations by considering parameter learning using reduced-precision computations only. Therefore, we propose specialized algorithms for generative and discriminative parameter learning and demonstrate their effectiveness in experiments.