may also be applied to deal with continuous data when one or more variables exhibit
severe departures from normality (skewness, heavy tails, etc.).
The intervals the variables will be discretized into can be chosen in one of the
following ways:
• Using prior knowledge on the data. The boundaries of the intervals are defined,
for each variable, to correspond to significantly different real-world scenarios,
such as the concentration of a particular pollutant (absent, dangerous, lethal) or
age classes (child, adult, elderly).
• Using heuristics before learning the structure of the network. Some examples are
the Sturges, Freedman-Diaconis, and Scott rules (Venables and Ripley, 2002); see
the sketch after this list.
• Choosing the number of intervals and their boundaries to balance accuracy and
information loss (Kohavi and Sahami, 1996), again one variable at a time and
before the network structure has been learned. A similar approach considering
pairs of variables is presented in Hartemink (2001).
• Performing learning and discretization iteratively until no further improvement
is made (Friedman and Goldszmidt, 1996).
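To make the second and third strategies concrete, here is a minimal sketch in R. The
nclass.Sturges(), nclass.FD(), and nclass.scott() functions in base R implement the
three heuristics named above, and the discretize() function in bnlearn implements
Hartemink's pairwise approach. The variables x and y, and the numbers of breaks,
are illustrative choices rather than prescriptions.

    # a simulated continuous variable with heavy tails.
    x <- rt(1000, df = 3)

    # number of intervals suggested by each heuristic.
    nclass.Sturges(x)
    nclass.FD(x)
    nclass.scott(x)

    # discretize into equal-width intervals using the Sturges rule.
    x.disc <- cut(x, breaks = nclass.Sturges(x))

    # Hartemink's pairwise approach (bnlearn): start from many
    # quantile-based intervals and collapse them down to 3 levels
    # while preserving the mutual information between variables.
    library(bnlearn)
    data <- data.frame(x = x, y = 0.5 * x + rnorm(1000))
    data.disc <- discretize(data, method = "hartemink", breaks = 3,
                            ibreaks = 20, idisc = "quantile")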
These strategies represent different trade-offs between the accuracy of the discrete
representation of the original data and the computational efficiency of the
transformation.
2.3 Static Bayesian Networks Modeling with R
In this section, we demonstrate structure learning, parameter learning, and manip-
ulation of a static Bayesian network in the R environment. Several of the packages
introduced in Sect. 2.3.1 will be covered to provide an overview of the possibilities
offered by R. All code will be illustrated using a very simple data set and explained
step by step to develop a thorough understanding of Bayesian network learning.
2.3.1 Popular R Packages for Bayesian Network Modeling
There are several packages on CRAN dealing with Bayesian networks. They can
be divided into two categories: those that deal with structure learning and those
that focus only on parameter learning and inference (Table 2.1).
Packages bnlearn (Scutari, 2010, 2012), deal (Bøttcher and Dethlefsen, 2003),
pcalg (Kalisch et al., 2012), and catnet (Balov and Salzman, 2012) fall into the
first category. bnlearn offers a wide variety of structure learning algorithms (span-
ning all three classes covered in this chapter, with the tests and scores covered
in Sect. 2.2.4), parameter learning approaches (maximum likelihood for discrete
and continuous data, Bayesian estimation for discrete data), and inference
techniques.
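As a preview of the workflow demonstrated in the rest of this section, the sketch
below learns a network with bnlearn from the learning.test data set shipped with
the package; the choice of the hill-climbing algorithm here is just one of the many
alternatives the package offers.

    library(bnlearn)

    # a small discrete data set included in bnlearn.
    data(learning.test)

    # score-based structure learning via hill-climbing (BIC by default).
    dag <- hc(learning.test)

    # maximum likelihood estimates of the conditional probability tables.
    fitted.mle <- bn.fit(dag, data = learning.test, method = "mle")

    # Bayesian posterior estimates, with imaginary sample size 10.
    fitted.bayes <- bn.fit(dag, data = learning.test, method = "bayes",
                           iss = 10)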