in a profile effectively doubles the resolution of the profiles. More sophisticated metrics
for comparing phylogenetic profiles have also been explored, including the hypergeometric
distribution [5] and mutual information [59]. Refer to Protocol 9.1 for information on how
to construct a phylogenetic profile using BLAST. The study in [60] examined in depth how
to select appropriate reference organisms for building effective phylogenetic profiles, and
suggested that reference organisms should (i) be sufficiently distant in evolution, (ii) cover
the Bacteria, Archaea and Eukarya domains [61] and (iii) be evenly distributed at the
fifth level of the evolutionary tree. Refer to Protocol 9.2 for instructions on creating
phylogenetic profiles.
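As a minimal illustration of two of the metrics mentioned above, the sketch below computes the mutual information between two binary profiles and a one-sided hypergeometric p-value for how often two genes co-occur across reference organisms. These are common textbook formulations chosen for illustration; the exact variants used in [5] and [59] (log base, pseudocounts, non-binary profiles) may differ. The function names are hypothetical, and the example profiles are the G1, G2 and Gn rows of Figure 9.3.

```python
from collections import Counter
from math import comb, log2


def mutual_information(x, y):
    """Mutual information, in bits, between two binary phylogenetic profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same reference organisms")
    n = len(x)
    joint = Counter(zip(x, y))       # counts of (x_i, y_i) pairs
    px, py = Counter(x), Counter(y)  # marginal counts of 0s and 1s
    mi = 0.0
    for (a, b), count in joint.items():  # only observed pairs, so no log(0)
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def hypergeometric_pvalue(x, y):
    """One-sided p-value: the chance that two genes, present in sum(x) and sum(y)
    of the n organisms, co-occur in at least as many organisms as observed."""
    n, a, b = len(x), sum(x), sum(y)
    k = sum(1 for xi, yi in zip(x, y) if xi and yi)  # observed co-occurrences
    tail = sum(comb(a, i) * comb(n - a, b - i) for i in range(k, min(a, b) + 1))
    return tail / comb(n, b)


if __name__ == "__main__":
    g1 = [1, 1, 1, 1, 0, 0, 0, 0]
    g2 = [1, 1, 1, 1, 0, 0, 0, 0]
    gn = [0, 0, 0, 1, 0, 0, 0, 1]
    print(mutual_information(g1, g2))               # 1.0
    print(mutual_information(g1, gn))               # 0.0
    print(round(hypergeometric_pvalue(g1, g2), 4))  # 0.0143
    print(round(hypergeometric_pvalue(g1, gn), 4))  # 0.7857
```

Identical profiles share one full bit of information and are unlikely to match by chance (p = 1/70), whereas G1 and Gn are statistically independent over these eight organisms.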
9.2.4.2 Phylogeny trees
While the use of a phylogenetic profile provides a simple means for identifying genes
with similar functions, it ignores the evolutionary history of organisms, which can have
a substantive impact on the effectiveness of the profiles in the identification of functional
linkages. For example, genes with profiles similar in the more distant organisms tend to be
more reflective of gene conservation due to evolutionary pressure, while genes with profiles
similar only in the less distant organisms may appear conserved simply because there has
not yet been enough time for mutations to accumulate. To enable a more comprehensive
comparison of the phylogenetic relationships between genes, Vert [62] proposed
comparing the evolutionary trees of genes instead of their phylogenetic profiles. Phylogenetic
profiles reflect only the information associated with the leaves of a phylogenetic tree.
Figure 9.3 illustrates a hypothetical phylogenetic tree and the corresponding phylogenetic
profiles. While the presence or absence of a gene's homologue in extant species is
known, it is unknown for ancestral species. To model this missing information, Vert used a
Bayesian tree to model the probability that each gene existed in ancestral species, based on
an existing phylogenetic tree [62]. A tree kernel is defined to efficiently compute the inner
product of the features representing the Bayesian trees of each pair of genes. The kernel
can then be used in a kernel-based method such as a support vector machine (SVM). Vert
built two SVM classifiers, one using a naive kernel based on the Euclidean distances between
phylogenetic profiles and the other using the tree kernel, and showed that the classifier using
the tree kernel achieved substantially better performance.
explore evolutionary models for functional inference [63, 64].
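The idea of scoring ancestral states and comparing genes through a tree can be sketched as follows. This is a simplified illustration under stated assumptions, not Vert's exact formulation [62]: it assumes a symmetric two-state gain/loss model with a hypothetical per-edge flip probability `flip`, a hypothetical root prior `prior`, and a hypothetical balanced topology over S1 to S8 (the actual tree of Figure 9.3 is not assumed). One post-order pass computes (i) the posterior probability that the root ancestor carries the gene and (ii) a kernel K(p, q) = sum over hidden ancestral states h of P(h) P(p | h) P(q | h). Because this is a weighted inner product of the nonnegative features P(p | h), it is positive semidefinite.

```python
from math import sqrt

# Hypothetical balanced topology over S1..S8; Figure 9.3's actual tree is not assumed.
TREE = ((("S1", "S2"), ("S3", "S4")), (("S5", "S6"), ("S7", "S8")))
SPECIES = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]


def transition(flip):
    """T[parent_state][child_state] for an assumed symmetric gain/loss model."""
    return [[1 - flip, flip], [flip, 1 - flip]]


def upward(node, T, leaf_factor):
    """Post-order pass. Returns [L(0), L(1)]: given this node's state (0 = absent,
    1 = present), the product of leaf factors below it, summed over the hidden
    states of all internal descendants."""
    partial = [1.0, 1.0]
    for child in node:
        if isinstance(child, str):  # leaf: an observed species
            msg = [leaf_factor(child, s) for s in (0, 1)]
        else:  # internal child: marginalize over its hidden state t
            below = upward(child, T, leaf_factor)
            msg = [sum(T[s][t] * below[t] for t in (0, 1)) for s in (0, 1)]
        partial = [partial[s] * msg[s] for s in (0, 1)]
    return partial


def root_presence(profile, tree=TREE, flip=0.1, prior=0.5):
    """Posterior probability that the root ancestor carries the gene."""
    T = transition(flip)
    x = dict(zip(SPECIES, profile))
    lik = upward(tree, T, lambda sp, s: T[s][x[sp]])
    joint0, joint1 = (1 - prior) * lik[0], prior * lik[1]
    return joint1 / (joint0 + joint1)


def tree_kernel(p, q, tree=TREE, flip=0.1, prior=0.5):
    """K(p, q) = sum over hidden ancestral states h of P(h) P(p | h) P(q | h)."""
    T = transition(flip)
    x, y = dict(zip(SPECIES, p)), dict(zip(SPECIES, q))
    lik = upward(tree, T, lambda sp, s: T[s][x[sp]] * T[s][y[sp]])
    return (1 - prior) * lik[0] + prior * lik[1]


def normalized_tree_kernel(p, q, **kw):
    """Cosine-normalized kernel, so every profile has similarity 1 with itself."""
    return tree_kernel(p, q, **kw) / sqrt(tree_kernel(p, p, **kw) * tree_kernel(q, q, **kw))
```

With these assumed settings, a gene present in all eight species gets a root posterior of about 0.98, while the G1 profile of Figure 9.3 gets 0.5 because the two halves of the tree disagree symmetrically. A Gram matrix of `normalized_tree_kernel` values could be passed to any SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.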
[Figure 9.3: phylogenetic tree over species S1 to S8]

         S1   S2   S3   S4   S5   S6   S7   S8
  G1      1    1    1    1    0    0    0    0
  G2      1    1    1    1    0    0    0    0
  ...    ...  ...  ...  ...  ...  ...  ...  ...
  Gn      0    0    0    1    0    0    0    1
Figure 9.3 A hypothetical phylogenetic tree for eight species and the corresponding profiles
for n genes.