Analysis of Message Passing Algorithms and Free Energy Approximations in Probabilistic Graphical Models
- Status
- Finished
- Student
- Harald Leisenberger
- Mentor
- Franz Pernkopf, Christian Knoll
- Research Areas
Probabilistic graphical models are a powerful concept for representing complex relations between components in systems with uncertain behavior. Stemming originally from statistical mechanics, their applications stretch across various fields such as computer vision, speech recognition, social network analysis, and many more. Often they allow for a compact formulation of practical challenges in terms of fundamental computational problems, that are summarized under the term probabilistic inference. Unfortunately, these problems (e.g., the computation of marginal probabilities and the partition function) are computationally intractable so that we need to approximate the solution.
In this thesis we consider two different categories of deterministic approximation methods: message passing algorithms and (variational) free energy approximations. Specifically, we focus on the most important representatives from either class, which are loopy belief propagation on the one hand, and the Bethe free energy approximation on the other hand. Although sharing few obvious similarities, both methods are connected by a common theoretical background that allows for a joint analysis: the existence of a bijective mapping between fixed points of loopy belief propagation and stationary points of the Bethe free energy. Due to their efficiency, both methods are among the most investigated approaches to performing approximate inference; however, their varying performance and lack of accuracy guarantees make them unreliable for certain practical applications. A profound characterization of their properties with respect to convergence behavior and approximation quality is therefore highly desirable.
We make several theoretical contributions towards understanding the behavior of both methods. First, we analyze the convergence properties of loopy belief propagation, by focusing on the aspect of message initialization. We use the theory of Lyapunov functions to estimate the region of convergence of fixed points, and derive a sufficient condition for initial messages to converge to a specific fixed point. We also introduce the concept of robustness of fixed points (i.e., their stability under slight variations of the model parameters), and show how we can use this framework to measure their robustness. Second, we consider graphical model approximation as a ’preprocessing step’ for approximate inference. Specifically, we analyze the effect of edge removal on the behavior of loopy belief propagation and the Bethe approximation, by decomposing the Bethe free energy into a sum of edge-specific sub-functions. We prove that such an approach should not be applied if one aims to approximate the partition function, but show in our experiments that the effect on estimating marginals is often a positive one. Third, we analyze the reliability of the Bethe approximation; i.e., we ask under which conditions we can expect its local minima to be accurate estimates of the quantities of interest. For that purpose, we derive sufficient conditions for the convexity of the Bethe free energy on a specific submanifold of its domain, the Bethe box, and demonstrate by experiments that its solutions are accurate whenever these conditions are satisfied. We also compare our results to similar statements from the literature, and propose a quasi-Newton algorithm (BETHE-MIN) for efficient minimization of the Bethe free energy in practice.
After the theoretical analysis, we draw practical conclusions from our gained insights, by combining techniques of graphical model approximation with sufficient conditions for the reliability of the Bethe approximation. We discuss two approaches for generalizing the Bethe free energy and analyze how the governing parameters of these functions should ideally be chosen. Subsequently, we propose two algorithms for approximate inference, that automatically adapt their parameters to a specific model: ADAPT-zeta and ADAPT-c. We discuss benefits and potential drawbacks of either approach, and demonstrate by experiments when they are superior to alternative message passing algorithms and free energy approximations, without increasing the computational complexity. This includes the difficult problem of estimating the partition function in mixed Ising models where ADAPT-c outperforms other approaches by many orders of magnitude.