and the Bayesian Gaussian equivalent (BGe) score, the Wishart posterior density of the network associated with a uniform prior over both the space of the network structures and of the parameters of the local distributions (Geiger and Heckerman, 1994).
2.2.5 Parameter Learning
Once the structure of the network has been learned from the data, the task of estimating and updating the parameters of the global distribution is greatly simplified by the application of the Markov property.
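Concretely, the Markov property allows the global distribution to be written as the product of the local distributions (writing Π_{X_i} for the parents of X_i in the network),

    Pr(X) = ∏_{i=1}^{p} Pr(X_i | Π_{X_i}),

so each local distribution can be estimated on its own rather than jointly with all the others.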
Local distributions in practice involve only a small number of variables. Furthermore, their dimension usually does not scale with the size of X and is often assumed to be bounded by a constant when computing the computational complexity of algorithms. This in turn alleviates the curse of dimensionality, because each local distribution has a comparatively small number of parameters to estimate from the sample and because estimates are more accurate due to the better ratio between the size of the parameter space and the sample size. There are two main approaches to the estimation of those parameters in the literature: one based on maximum likelihood estimation and the other based on Bayesian estimation.
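As a minimal sketch of the two approaches for a single discrete local distribution (the parent–child pair, the counts, and the uniform Dirichlet prior below are invented for illustration), maximum likelihood simply normalizes the observed counts, while the Bayesian posterior mean adds prior counts before normalizing:

    import numpy as np

    # Hypothetical data: child variable X (2 levels) with one parent P (2 levels).
    # counts[j, k] = number of observations with P = j and X = k.
    counts = np.array([[30.0, 10.0],
                       [ 5.0, 15.0]])

    # Maximum likelihood estimate of Pr(X | P): normalize each row of counts.
    mle = counts / counts.sum(axis=1, keepdims=True)

    # Bayesian estimate with a uniform Dirichlet prior (imaginary sample size iss):
    # spread iss evenly over the cells, then normalize the augmented counts.
    iss = 4.0
    prior = iss / counts.size
    bayes = (counts + prior) / (counts + prior).sum(axis=1, keepdims=True)

    print("MLE:\n", mle)
    print("Bayesian (posterior mean):\n", bayes)

With large samples the two estimates essentially coincide; with sparse counts the prior keeps the Bayesian estimate away from the boundary of the parameter space, which is one reason it is often preferred for small samples.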
The number of parameters needed to uniquely identify the global distribution, which is the sum of the number of parameters of the local distributions, is also reduced because the conditional independence relationships encoded in the network structure fix large parts of the parameter space. For example, in Gaussian Bayesian networks, partial correlation coefficients involving (conditionally) independent variables are equal to zero by definition, and joint frequencies factorize into marginal ones in multinomial distributions.
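A quick way to see the size of this reduction is to count parameters: for p binary variables, the saturated joint distribution needs 2^p − 1 free parameters, while a sparse structure needs far fewer. The sketch below uses a simple chain X1 → X2 → ... → Xp, chosen only for illustration:

    # Parameter counts for p binary variables: saturated joint distribution versus
    # a chain-structured Bayesian network X1 -> X2 -> ... -> Xp (illustrative only).
    def saturated_parameters(p: int) -> int:
        # one probability per joint configuration, minus one for normalization
        return 2 ** p - 1

    def chain_parameters(p: int) -> int:
        # X1 needs 1 parameter; each other Xi needs one per configuration of its parent
        return 1 + 2 * (p - 1)

    for p in (5, 10, 20):
        print(p, saturated_parameters(p), chain_parameters(p))

For p = 20 this amounts to 1,048,575 parameters for the saturated distribution against 39 for the chain.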
However, parameter estimation is still problematic in many situations. For example, it is increasingly common to have sample sizes much smaller than the number of variables included in the model. This is typical of high-throughput biological data sets, such as microarrays, that have a few tens or hundreds of observations and thousands of genes. In this setting, which is called “small n, large p,” estimates have a high variability unless particular care is taken in both structure and parameter learning (Castelo and Roverato, 2006; Schäfer and Strimmer, 2005; Hastie et al., 2009).
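One common remedy in the “small n, large p” setting is to shrink the sample covariance matrix toward a simpler target before deriving partial correlations. The sketch below uses the Ledoit–Wolf shrinkage estimator from scikit-learn as a stand-in for the shrinkage approaches cited above (it is a related estimator, not the exact method of those references), on simulated data:

    import numpy as np
    from sklearn.covariance import LedoitWolf

    rng = np.random.default_rng(0)
    n, p = 20, 200                       # far fewer observations than variables
    X = rng.standard_normal((n, p))      # simulated data, for illustration only

    # The sample covariance matrix is rank-deficient when n < p ...
    sample_cov = np.cov(X, rowvar=False)
    print("rank of sample covariance:", np.linalg.matrix_rank(sample_cov))

    # ... while the shrunk estimate is well conditioned and invertible, which is
    # what partial correlations (and hence Gaussian Bayesian networks) require.
    lw = LedoitWolf().fit(X)
    print("shrinkage intensity:", lw.shrinkage_)
    print("rank of shrunk covariance:", np.linalg.matrix_rank(lw.covariance_))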
2.2.6 Discretization
A simple way to learn Bayesian networks from mixed data is to convert all continu-
ous variables to discrete ones and then to apply the techniques described in the pre-
vious sections. This approach, which is called discretization or binning , completely
sidesteps the problem of defining a probabilistic model for the data. Discretization
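A minimal sketch of one such conversion, quantile-based binning (the simulated variable and the choice of three intervals are arbitrary), is:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    x = pd.Series(rng.normal(size=100), name="expression")  # simulated continuous variable

    # Quantile discretization: three intervals containing roughly equal numbers of observations.
    binned = pd.qcut(x, q=3, labels=["low", "medium", "high"])
    print(binned.value_counts())

The number of intervals and the choice between equal-width and equal-frequency intervals determine how much information is lost in the conversion.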