and the Bayesian Gaussian equivalent (BGe) score, the Wishart posterior density of the network associated with a uniform prior over both the space of the network structures and of the parameters of the local distributions (Geiger and Heckerman, 1994).
2.2.5 Parameter Learning
Once the structure of the network has been learned from the data, the task of estimating and updating the parameters of the global distribution is greatly simplified by the application of the Markov property.
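Concretely, the Markov property allows the global distribution to be written as the product of the local distributions (writing Π_{X_i} for the parents of X_i in the network),

    Pr(X) = ∏_{i=1}^{p} Pr(X_i | Π_{X_i}),

so each local distribution can be estimated on its own rather than jointly with all the others.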
Local distributions in practice involve only a small number of variables. Furthermore, their dimension usually does not scale with the size of X and is often assumed to be bounded by a constant when computing the computational complexity of algorithms. This in turn alleviates the curse of dimensionality, because each local distribution has a comparatively small number of parameters to estimate from the sample and because estimates are more accurate due to the better ratio between the size of the parameter space and the sample size. There are two main approaches to the estimation of those parameters in the literature: one based on maximum likelihood estimation and the other based on Bayesian estimation.
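As a minimal sketch of the two approaches for a single discrete local distribution (the parent–child pair, the counts, and the uniform Dirichlet prior below are invented for illustration), maximum likelihood simply normalizes the observed counts, while the Bayesian posterior mean adds prior counts before normalizing:

    import numpy as np

    # Hypothetical data: child variable X (2 levels) with one parent P (2 levels).
    # counts[j, k] = number of observations with P = j and X = k.
    counts = np.array([[30.0, 10.0],
                       [ 5.0, 15.0]])

    # Maximum likelihood estimate of Pr(X | P): normalize each row of counts.
    mle = counts / counts.sum(axis=1, keepdims=True)

    # Bayesian estimate with a uniform Dirichlet prior (imaginary sample size iss):
    # spread iss evenly over the cells, then normalize the augmented counts.
    iss = 4.0
    prior = iss / counts.size
    bayes = (counts + prior) / (counts + prior).sum(axis=1, keepdims=True)

    print("MLE:\n", mle)
    print("Bayesian (posterior mean):\n", bayes)

With large samples the two estimates essentially coincide; with sparse counts the prior keeps the Bayesian estimate away from the boundary of the parameter space, which is one reason it is often preferred for small samples.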
The number of parameters needed to uniquely identify the global distribution, which is the sum of the number of parameters of the local distributions, is also reduced because the conditional independence relationships encoded in the network structure fix large parts of the parameter space. For example, in Gaussian Bayesian networks, partial correlation coefficients involving (conditionally) independent variables are equal to zero by definition, and joint frequencies factorize into marginal ones in multinomial distributions.
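A quick way to see the size of this reduction is to count parameters: for p binary variables, the saturated joint distribution needs 2^p − 1 free parameters, while a sparse structure needs far fewer. The sketch below uses a simple chain X1 → X2 → ... → Xp, chosen only for illustration:

    # Parameter counts for p binary variables: saturated joint distribution versus
    # a chain-structured Bayesian network X1 -> X2 -> ... -> Xp (illustrative only).
    def saturated_parameters(p: int) -> int:
        # one probability per joint configuration, minus one for normalization
        return 2 ** p - 1

    def chain_parameters(p: int) -> int:
        # X1 needs 1 parameter; each other Xi needs one per configuration of its parent
        return 1 + 2 * (p - 1)

    for p in (5, 10, 20):
        print(p, saturated_parameters(p), chain_parameters(p))

For p = 20 this amounts to 1,048,575 parameters for the saturated distribution against 39 for the chain.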
However, parameter estimation is still problematic in many situations. For example, it is increasingly common to have sample sizes much smaller than the number of variables included in the model. This is typical of high-throughput biological data sets, such as microarrays, that have a few tens or hundreds of observations and thousands of genes. In this setting, which is called “small n, large p,” estimates have a high variability unless particular care is taken in both structure and parameter learning (Castelo and Roverato, 2006; Schäfer and Strimmer, 2005; Hastie et al., 2009).
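One common remedy in the “small n, large p” setting is to shrink the sample covariance matrix toward a simpler target before deriving partial correlations. The sketch below uses the Ledoit–Wolf shrinkage estimator from scikit-learn as a stand-in for the shrinkage approaches cited above (it is a related estimator, not the exact method of those references), on simulated data:

    import numpy as np
    from sklearn.covariance import LedoitWolf

    rng = np.random.default_rng(0)
    n, p = 20, 200                       # far fewer observations than variables
    X = rng.standard_normal((n, p))      # simulated data, for illustration only

    # The sample covariance matrix is rank-deficient when n < p ...
    sample_cov = np.cov(X, rowvar=False)
    print("rank of sample covariance:", np.linalg.matrix_rank(sample_cov))

    # ... while the shrunk estimate is well conditioned and invertible, which is
    # what partial correlations (and hence Gaussian Bayesian networks) require.
    lw = LedoitWolf().fit(X)
    print("shrinkage intensity:", lw.shrinkage_)
    print("rank of shrunk covariance:", np.linalg.matrix_rank(lw.covariance_))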
2.2.6 Discretization
A simple way to learn Bayesian networks from mixed data is to convert all continu-
ous variables to discrete ones and then to apply the techniques described in the pre-
vious sections. This approach, which is called discretization or binning , completely
sidesteps the problem of defining a probabilistic model for the data. Discretization
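A minimal sketch of one such conversion, quantile-based binning (the simulated variable and the choice of three intervals are arbitrary), is:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    x = pd.Series(rng.normal(size=100), name="expression")  # simulated continuous variable

    # Quantile discretization: three intervals containing roughly equal numbers of observations.
    binned = pd.qcut(x, q=3, labels=["low", "medium", "high"])
    print(binned.value_counts())

The number of intervals and the choice between equal-width and equal-frequency intervals determine how much information is lost in the conversion.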