Fig. 1. The two groups of fossil primate species, A and B, and their respective phylogenetic
trees used in our bioinformatics examples. The rightmost panel shows the uniform-weight
tree that can be calculated from a subset of 5 species from A. Original data
from [9]. Tree representations drawn with T-Rex [6].
The third panel of Fig. 1 shows the tree obtained assuming that all features
have the same weight. This uniform-weight tree is only possible for a subset of 5
species from the A set, because the full set of 8 species from A cannot be organized into
a uniform-weight tree with the 26 features that we used. However, even with
only 5 species, the tree produced with uniform weights is incorrect, grouping
T. propliopithecid with P. haeckeli. To reproduce the correct trees for A, B,
and even the 5-species subset of A, it is necessary to allow feature weights
to vary considerably. Whereas with a uniform weight distribution each feature
contributes 3.85% of the total, the correct tree for A requires weights to vary
between 0.84% and 5.11%. The tree for the B group is even more demanding,
with four features needing to be above 14% of relative weight each, while eleven
have a weight of 0%, meaning they must be completely discarded for the tree to
be possible. These are not the unique solutions to the problem of building each tree, but
they are the solutions found by minimizing the range of relative weights assigned
to features, and are thus the solutions closest to the uniform-weight metric in
this sense.
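The arithmetic behind these percentages can be checked with a short Python sketch. The helper names uniform_weights and weight_range are illustrative assumptions, not part of the original method. The sketch only reproduces the uniform share for 26 features and the spread between the smallest and largest weights quoted for the A tree; it does not perform the minimization itself.

```python
def uniform_weights(n_features):
    """Relative weights (in percent) when every feature counts equally."""
    return [100.0 / n_features] * n_features


def weight_range(weights):
    """Spread between the largest and smallest relative weight.

    This is the quantity the text describes as being minimized;
    a uniform weighting has a range of zero.
    """
    return max(weights) - min(weights)


uniform = uniform_weights(26)
print(f"{uniform[0]:.2f}%")          # 3.85%
print(weight_range(uniform))         # 0.0

# Extremes quoted in the text for the correct A tree (percent).
print(round(weight_range([0.84, 5.11]), 2))  # 4.27
```

Under this measure, the A tree departs from uniformity by a range of 4.27 percentage points, while the B tree, with weights from 0% to above 14%, departs by more than 14.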
If species are sufficiently related, it is possible to extrapolate from the weights
of one to the other. For example, the weights used to form the correct tree for the
subset of 5 species of A have a correlation of 0.73 to the weights in the correct
tree of the full A set, and many are quite similar, as shown in Fig. 2.
If the groups are too different, this extrapolation is not possible. When comparing
the B and A groups, which were chosen from different regions of the
larger tree containing all 117 fossil species reported in [9], the correlation of feature
weights is merely 0.3, and the scores differ markedly between the two trees.
However, if only two species from B are added to A, for a ten-species tree, this
tree can be correctly computed with only small differences in feature weights,
and a correlation of 0.72. Fig. 3 shows this comparison.
Though these are still somewhat preliminary results, they show two important
aspects of this problem. First, it is often impossible to compute the correct phylogenetic
trees if one has incomplete data and assumes uniform weights across all
features. In fossil analysis, rich datasets like those in [9] are the exception rather
 