Classification Tree Generation Constrained with Variable Weights - Foundations on Natural and Artificial Computation


These parameters constrain all trees to respect the distance constraints
imposed by the tree metric, with weights that fall within the preset bounds.
This requires posting O(n^3) "less than" constraints, which are easily
maintained by a linear constraint solver. Under the constraint programming
paradigm, most trees are eliminated in the early stages of their top-down
recursive generation, as soon as these constraints become impossible to
satisfy.
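The origin of the cubic number of "less than" constraints can be illustrated with a minimal sketch. It assumes, since the text above does not state the exact form, that each triple of leaves contributes two inequalities (the pair joined lowest in the tree must be closer than the other two pairs) and that the distance is a weighted city-block distance over the features. The nested-tuple tree encoding and the function names are hypothetical.

```python
from itertools import combinations

def leaves(tree):
    """Return the leaf labels of a tree encoded as nested tuples."""
    if isinstance(tree, tuple):
        return [leaf for child in tree for leaf in leaves(child)]
    return [tree]

def clade_sizes(tree, table=None):
    """Map each leaf pair to the size of the smallest clade containing both."""
    if table is None:
        table = {}
    if isinstance(tree, tuple):
        groups = [leaves(child) for child in tree]
        size = sum(len(g) for g in groups)
        for g1, g2 in combinations(groups, 2):
            for a in g1:
                for b in g2:
                    table[frozenset((a, b))] = size
        for child in tree:
            clade_sizes(child, table)
    return table

def triple_constraints(tree):
    """List (close_pair, far_pair) items, each meaning d(close) < d(far)."""
    sizes = clade_sizes(tree)
    constraints = []
    for triple in combinations(sorted(leaves(tree)), 3):
        pairs = sorted((frozenset(p) for p in combinations(triple, 2)),
                       key=lambda p: sizes[p])
        close, far1, far2 = pairs
        # Only a strictly smaller clade marks a pair as the closest one.
        if sizes[close] < sizes[far1]:
            constraints.append((close, far1))
            constraints.append((close, far2))
    return constraints

def weighted_distance(x, y, weights):
    """Weighted city-block distance (assumed form of the metric)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def satisfies(tree, features, weights):
    """Check whether the given weights respect every triple constraint."""
    def dist(pair):
        a, b = tuple(pair)
        return weighted_distance(features[a], features[b], weights)
    return all(dist(close) < dist(far)
               for close, far in triple_constraints(tree))

# Toy data, invented for illustration only.
tree = (("A", "B"), "C")
features = {"A": [0, 0], "B": [1, 0], "C": [0, 5]}
print(len(triple_constraints(tree)))
print(satisfies(tree, features, [1.0, 1.0]))
print(satisfies(tree, features, [1.0, 0.1]))
```

With n leaves there are n(n-1)(n-2)/6 triples and at most two inequalities per triple, which is consistent with the O(n^3) bound. A constraint solver would post these inequalities over weight variables rather than test fixed weights as done here.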

Furthermore, given the declarative nature of our approach, the tree topology
may be completely or partially specified, in which case the system is used to
test whether there are attribute weights consistent with the given
phylogenetic relations. Finally, for each tree that satisfies the
constraints, a specific set of weight values is calculated that minimizes the
range of all weights.
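As a rough illustration of this last step, the sketch below searches a small grid of candidate weights and keeps the feasible vector with the smallest range. It is a brute-force stand-in, not the linear constraint solver described above. The reading of "range" as the largest weight minus the smallest, the weighted city-block distance, the grid, and all names are assumptions made for the example.

```python
from itertools import product

def weighted_distance(x, y, weights):
    """Weighted city-block distance (an assumed form of the metric)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def min_range_weights(features, constraints, grid):
    """Return (range, weights) with the smallest max-min spread, or None.

    constraints: list of ((a, b), (c, d)) items meaning d(a, b) < d(c, d).
    grid: candidate values tried for every weight (hypothetical bounds).
    """
    n_features = len(next(iter(features.values())))
    best = None
    for weights in product(grid, repeat=n_features):
        def dist(pair):
            a, b = pair
            return weighted_distance(features[a], features[b], weights)
        if all(dist(close) < dist(far) for close, far in constraints):
            spread = max(weights) - min(weights)
            if best is None or spread < best[0]:
                best = (spread, weights)
    return best

# Toy data, invented for illustration: the tree ((A, B), C) requires
# d(A, B) < d(A, C) and d(A, B) < d(B, C).
features = {"A": [0, 0], "B": [3, 0], "C": [0, 2]}
constraints = [(("A", "B"), ("A", "C")), (("A", "B"), ("B", "C"))]
result = min_range_weights(features, constraints, grid=[1, 2, 3])
print(result)
```

In this toy case equal weights violate the first inequality, so the search has to spread the weights apart; a linear solver would reach an equivalent optimum over continuous weights without enumerating a grid.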

4 Examples from Biology and Linguistics

To illustrate our approach, we give two application examples as
proof-of-concept exercises. One is a study of the evolutionary relationships
in Eocene primate fossils, using two subsets of eight and nine fossil
specimens from [9], characterized by a selection of 26 features that had
numerical values and were all defined for these 17 fossil species. The other
example is the phylogenetic classification of languages, using a set of six
Latin languages, two Germanic languages and Basque, and characterizing each
language with a set of twenty phonetic and grammatical features.

4.1 Biological Examples

For these case studies we chose two different applications. In one case, we
assume that the correct phylogenetic tree is known from independent sources,
but we wish to assess the suitability of a set of features for grouping our
entities according to that tree. A real-life scenario where this may occur is
the classification of fossil specimens. For one set of fossils, the
evolutionary relations may be clear from geological and chronological
considerations, or because the fossil specimens are particularly well
preserved and rich in detail. In another set of specimens, however, the
reverse may happen, and it may be necessary to assess whether the features
that can be observed in this second group are sufficient for computing the
phylogenetic tree and, if so, what their relative weights should be.

To illustrate this scenario, we considered a small subset of the primate
fossil data available from [9]. Fig. 1 shows the phylogenetic relations
between species in two parts of the published phylogenetic tree, which we
assume to correctly represent their phylogenetic relations. In the original
paper, the authors used a set of 360 fossil traits to classify a total of 117
species. For our prototype implementation, this data set would be too
complex, not only because of its size but also because, for each species, a
significant fraction of the 360 features is often missing, since fossil
quality varies significantly between different specimens. We therefore
selected two smaller subsets, A and B, of eight and nine species,
respectively, and used only 26 features for which the data was complete in
all 17 species, and which were identical in all of these species.
