Table 10.1. Paraclique Parameter Variation

Glom       Number of     Clique   Paraclique   Edge      Lowest Edge
Factor     Paracliques   Size     Size         Density   Density
|C| − 1    32             99.4    104.8        99.8%     99.5%
|C| − 2    30             99.9    118.8        99.0%     97.9%
|C| − 3    28            101.6    137.4        97.8%     96.0%
|C| − 4    27            101.4    151.4        96.4%     92.3%
|C| − 5    24            106.1    173.8        94.9%     90.3%
|C| − 6    24            104.7    186.8        92.9%     86.7%
|C| − 7    22            108.5    205.7        91.4%     83.1%
|C| − 8    21            110.2    221.1        90.0%     80.0%
|C| − 9    21            109.3    231.1        88.6%     77.9%
|C| − 10   19            114.7    250.5        87.7%     76.6%

of genes (in our case, Affymetrix probe sets) against a manually curated biological
interaction database. Probe sets known to the database are mapped to genes,
which are then termed focus genes. Other probe sets are ignored. Focus genes are
analyzed to determine how they are connected to one another based on evidence
from the biomedical literature. Based on this analysis, one or more molecular
networks are produced. Each typically consists of a mixture of focus genes, sprinkled
with additional database genes and gene products that are needed to connect the
focus genes and complete the network. We term a focus gene that is placed in such
a network a focus gene utilized. In general, one cannot expect that all focus genes
will become members of a network. The database may have very little information
about a focus gene's connectivity. Alternatively, a focus gene may be only distantly
related to other focus genes. Due to technical constraints, IPA imposes a limit on
network size, which is currently set to 35 nodes. As a result, lists with large
numbers of focus genes often create multiple networks. Fortunately, these can often
be fused together into a single common network using commands that are available
on the Ingenuity website and that are designed for this purpose. The more
closely connected a group of focus genes is biologically, the more likely it is that
the database can connect them all into a network. Thus, an important metric is
the percent focus genes utilized. This number alone can be misleading, however,
because we must bear in mind that IPA may spread the genes across more than
one network. A group of 40 focus genes, for example, would be considered more
closely related if they could be connected in two networks than if four networks
were needed to connect them all. We will therefore also calculate and examine focus
genes utilized per network, a metric that normalizes for this effect.
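
The two metrics can be illustrated with a minimal sketch. It assumes that percent focus genes utilized is the share of focus genes appearing in at least one network, and that focus genes utilized per network is the number of utilized focus genes divided by the number of networks. The text does not spell out the exact formulas, so these definitions, the function names, and the gene labels are all hypothetical and are not part of IPA.

```python
def utilized_focus_genes(focus_genes, networks):
    """Return the focus genes that appear in at least one network.

    `networks` is a list of sets of gene names; each set may also hold
    non-focus "linker" genes added to complete the network.
    """
    focus = set(focus_genes)
    in_any_network = set().union(*networks)
    return focus & in_any_network


def percent_focus_genes_utilized(focus_genes, networks):
    """Assumed definition: 100 * (utilized focus genes) / (all focus genes)."""
    focus = set(focus_genes)
    if not focus:
        raise ValueError("no focus genes supplied")
    return 100.0 * len(utilized_focus_genes(focus, networks)) / len(focus)


def focus_genes_utilized_per_network(focus_genes, networks):
    """Assumed definition: (utilized focus genes) / (number of networks)."""
    if not networks:
        return 0.0
    return len(utilized_focus_genes(focus_genes, networks)) / len(networks)


# The 40-gene example from the text, with made-up gene labels.
focus = [f"g{i}" for i in range(40)]
two_networks = [set(focus[:20]) | {"linker1"}, set(focus[20:])]
four_networks = [set(focus[i:i + 10]) for i in range(0, 40, 10)]

for label, nets in (("two networks", two_networks), ("four networks", four_networks)):
    pct = percent_focus_genes_utilized(focus, nets)
    per_net = focus_genes_utilized_per_network(focus, nets)
    print(f"{label}: {pct:.1f}% utilized, {per_net:.1f} focus genes per network")
```

Under these assumptions both groupings use every focus gene, so the percentage cannot tell them apart, while the per-network figure can: it is twice as large when two networks suffice.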
As a control, we also tested K-means clustering, a traditional and highly popular
algorithm. We invoked it via the R programming language, with the “kmeans”
function from the “amap” package [15]. Input values were log-transformed. Pear-