attributes that affect the system, such as experimental conditions, temporal indicators, and exogenous cellular conditions. As a result, we can simultaneously model the biological mechanisms we are interested in and the external conditions influencing them in a single, comprehensive network.
2.5.1 Model Averaging
Consider, for example, the protein signaling data studied in Sachs et al. (2005). The data consist of simultaneous measurements of 11 phosphorylated proteins and phospholipids derived from thousands of individual primary immune system cells, subjected to both general and specific molecular interventions. The former ensure that the relevant signaling pathways are active, while the latter make causal inference possible by elucidating arc directions through stimulatory cues and inhibitory interventions.
The analysis performed in Sachs et al. (2005) can be summarized as follows:
1. Outliers were removed and the data were discretized using the approach described in Hartemink (2001), because the distributional assumptions required by Gaussian Bayesian networks were unlikely to hold.
2. Structure learning was repeated several times. In this way, a larger number of network structures was explored in an effort to reduce the impact of locally optimal (but globally suboptimal) networks on learning and subsequent inference.
3. The networks learned in the previous step were averaged to produce a more robust model. This practice, known as model averaging (Claeskens and Hjort, 2008), is known to result in better predictive performance than choosing a single, high-scoring network. The averaged network structure was created using the arcs present in at least 85% of the networks. This proportion measures the strength of each arc and provides the means to establish its significance given a threshold (85% in this case).
4. The validity of the averaged network was evaluated using connections well established in the literature as a reference.
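The arc-strength computation in step 3 can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the 20 arc sets are hypothetical toy networks over nodes A to D (not results from Sachs et al.), each arc is treated as directed, and strength is simply the fraction of networks containing it. It is not bnlearn's implementation; bnlearn provides its own functions for this purpose, such as custom.strength() and averaged.network().

```r
# Toy model averaging: arc strength = fraction of learned networks that
# contain an arc; keep arcs whose strength reaches the threshold.
# The 20 arc sets below are hypothetical, not results from Sachs et al.
arcs1 = rbind(c("A", "B"), c("B", "C"), c("C", "D"))
arcs2 = rbind(c("A", "B"), c("B", "C"), c("A", "D"))
nets = c(rep(list(arcs1), 18), rep(list(arcs2), 2))

# Encode each directed arc as a "from -> to" string
arc_key = function(arcs) paste(arcs[, 1], arcs[, 2], sep = " -> ")

# Count each arc at most once per network, then divide by the number of networks
keys = unlist(lapply(nets, function(arcs) unique(arc_key(arcs))))
strength = table(keys) / length(nets)

# Averaged network: arcs present in at least 85% of the networks
threshold = 0.85
averaged = names(strength)[strength >= threshold]
print(strength)
print(averaged)   # "A -> B" "B -> C" "C -> D"
```

Here C -> D appears in 18 of 20 networks (strength 0.9) and survives the 85% threshold, while A -> D appears in only 2 (strength 0.1) and is dropped.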
All these steps can be performed using the bnlearn package, and some of the other packages covered in Sect. 2.3.1 can also be used if their missing functionality is implemented separately. For the moment, we will consider only the data manipulated with general interventions (i.e., the observational data); we will investigate the complete data set (i.e., both the observational and the interventional data) in Sect. 2.5.3.
First of all, we will discretize the data with the discretize function, which implements some common discretization methods, including Hartemink's.
> library(bnlearn)
> sachs = read.table("sachs.data.txt", header = TRUE)
> dsachs = discretize(sachs, method = "hartemink",
+   breaks = 3, ibreaks = 60, idisc = "quantile")
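In the call above, Hartemink's method works in two stages: each variable is first cut into ibreaks = 60 intervals using the idisc = "quantile" method, and these intervals are then merged down to breaks = 3 levels in a way that tries to preserve the pairwise mutual information between variables. The sketch below reproduces only the first stage in base R. It assumes a simulated variable (rexp() data as a hypothetical stand-in for one protein's measurements) and is not bnlearn's own code.

```r
# First stage of Hartemink's discretization only (idisc = "quantile",
# ibreaks = 60): cut a continuous variable at its empirical quantiles.
# The simulated data are a hypothetical stand-in for one protein.
set.seed(42)
x = rexp(1000)

ibreaks = 60
cuts = quantile(x, probs = seq(0, 1, length.out = ibreaks + 1), names = FALSE)

# Integer interval codes 1..ibreaks; include.lowest keeps the minimum value
xd = cut(x, breaks = cuts, include.lowest = TRUE, labels = FALSE)

counts = table(xd)
length(counts)  # 60 intervals
range(counts)   # roughly 1000 / 60 observations in each interval
```

The second, mutual-information-driven stage, omitted here, is what reduces these 60 equal-frequency intervals to the 3 levels of each variable in dsachs.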