These parameters constrain all trees to respect the distance constraints
imposed by the tree metric, with weights that fall within the preset bounds.
This requires posting O(n^3) "less than" constraints, which are easily
maintained by a linear constraint solver. Under the constraint programming
paradigm, most trees are eliminated in the early stages of their top-down
recursive generation, as soon as these constraints become impossible to satisfy.
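The following is a minimal sketch of how such "less than" constraints might be enumerated and checked for one candidate tree. It assumes, as an illustration only, that the distance between two entities is a weighted sum of absolute feature differences and that the tree metric requires any two leaves under a common subtree to be closer to each other than either is to a leaf outside that subtree. The nested-tuple tree encoding and the names `leaves`, `triple_constraints`, `distance` and `satisfies` are hypothetical, not taken from the original system, which posts the constraints to a solver rather than testing fixed weights.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels of a tree encoded as nested tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def triple_constraints(tree):
    """Yield (i, j, k) meaning: d(i, j) < d(i, k) and d(i, j) < d(j, k).

    At each internal node, every pair of leaves inside one child must be
    closer together than either is to a leaf in a sibling subtree.
    A binary tree with n leaves yields one entry per leaf triple, O(n^3).
    """
    if not isinstance(tree, tuple):
        return
    child_leaves = [leaves(child) for child in tree]
    for idx, inside in enumerate(child_leaves):
        outside = [leaf
                   for other_idx, other in enumerate(child_leaves)
                   if other_idx != idx
                   for leaf in other]
        for i, j in combinations(inside, 2):
            for k in outside:
                yield (i, j, k)
    for child in tree:
        yield from triple_constraints(child)


def distance(a, b, weights):
    """Assumed distance: weighted sum of absolute feature differences."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def satisfies(tree, features, weights):
    """Check whether fixed weights satisfy every constraint of the tree."""
    for i, j, k in triple_constraints(tree):
        d_ij = distance(features[i], features[j], weights)
        if not (d_ij < distance(features[i], features[k], weights)
                and d_ij < distance(features[j], features[k], weights)):
            return False
    return True


# Toy data (invented for illustration): four entities, two features each.
features = {"a": [0, 0], "b": [1, 0], "c": [0, 5], "d": [1, 6]}
tree = (("a", "b"), ("c", "d"))

print(satisfies(tree, features, [1, 1]))  # weights consistent with the tree
print(satisfies(tree, features, [1, 0]))  # ignoring feature 2 breaks the tree
```

Because each inequality is linear in the weights under this assumed distance, the same triples could be handed to a linear constraint solver instead of being tested against a fixed weight vector.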
Furthermore, given the declarative nature of our approach, the tree topology
may be completely or partially specified, in which case the system is used to test
whether there are attribute weights consistent with the given phylogenetic relations.
Finally, for each tree that satisfies the constraints, a specific set of weight values
is calculated that minimizes the range of all weights.
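One way to write this final step is as a small linear program. The formulation below is a sketch under stated assumptions: the tree constraints are taken to be linear in the weights, and the symbols l, u, w_min, w_max and the coefficient vectors a_c are illustrative names that do not appear in the original.

```latex
\begin{aligned}
\min_{w,\,l,\,u}\quad & u - l \\
\text{s.t.}\quad & l \le w_f \le u, && f = 1,\dots,m, \\
& w_{\min} \le l, \quad u \le w_{\max}, \\
& a_c^{\top} w < 0, && c = 1,\dots,C .
\end{aligned}
```

Here w_f is the weight of feature f, the pair (l, u) brackets all weights so that u - l is their range, and each row a_c encodes one "less than" constraint of the tree. In practice a solver would typically replace each strict inequality with a non-strict one plus a small margin.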
4 Examples from Biology and Linguistics
To illustrate our approach, we give two application examples as proof-of-concept
exercises. One is a study of the evolutionary relationships among Eocene primate
fossils, using two subsets of eight and nine fossil specimens from [9], characterized
by a selection of 26 features that had numerical values and were all defined for
these 17 fossil species. The other example is the phylogenetic classification of
languages, using a set of six Latin languages, two Germanic languages and Basque,
characterizing each language with a set of twenty phonetic and grammatical
features.
4.1 Biological Examples
For these case studies we chose two different applications. In one case, we as-
sume that the correct phylogenetic tree is known from independent sources, but
we wish to assess the suitability of a set of features for grouping our entities
according to that tree. A real-life scenario where this may occur is in the clas-
sification of fossil specimens. For a set of fossils, the evolutionary relations may
be clear from geological and chronological considerations or because the fossil
specimens are particularly well preserved and rich in detail. However, in another
set of specimens the reverse may happen, and it may be necessary to assess whether
the features that can be observed in this second group are sufficient for computing
the phylogenetic tree and, if so, what their relative weights should be.
To illustrate this scenario, we considered a small subset of the primate fossil
data available from [9]. Fig. 1 shows the phylogenetic relations between species
in two parts of the published phylogenetic tree, which we assume to correctly
represent their phylogenetic relations. In the original paper, the authors used
a set of 360 fossil traits to classify a total of 117 species. For our prototype
implementation, this data set would be too complex, not only due to its size but
also because, for each species, it is often the case that a significant fraction of the
360 features are missing, since fossil quality varies significantly between different
specimens. So we selected these smaller subsets, A and B, of eight and nine
species, respectively, and used only 26 features for which the data were complete
in all 17 species, and which were identical in all of these species.