Information Technology Reference
In-Depth Information
The number of such classification trees grows very fast with the number of
objects in the classification, making it crucial to have an ecient method for
eliminating uninteresting options.
In this article we propose a novel methodology for constraining the number of
classification trees through allowing different features to be weighted with some
slack with respect to an average. Our methodology, which can be flexibly applied
to multiple uses both for linguistics and for biology, drastically diminishes the
number of alternatives to be considered, down to an acceptable number, by
adopting a constraint programming approach. It may handle arbitrary sets of
features, and uses an implicit distance measure while allowing for an arbitrary
flexibility in the relative weights of the features. The purpose of this approach
is to work dually, by finding either: a) the trees that are possible given some
bounds on the weights, or b) the bounds on the weights given constraints on
possible trees. This is achieved either by specifying allowed trees or simply the
constraints that stipulate the tree topology and discovering whether there are
sets of weights that make the distances compatible with such topologies.
Section 2 briefly describes the state of the art. Section 3 describes the mo-
tivation for this paper, the main features of Constraint Programming and, in
subsection 3, our phylogenetic tree model; section 4 shows the results obtained
for a biological and a linguistic example, and section 5 discusses our results.
2 Background, State-of-the-Art
Phylogenetic trees aim to represent the evolutionary relationships between enti-
ties, either qualitatively, by grouping together those that are more closely related,
or quantitatively, with each branch length indicating the distance to a common
ancestor, and the path through the branches of the tree the distance between
any two entities.
There are essentially two different ways of calculating phylogenetic trees. Dis-
tance methods rely on distance values between all pairs of entities to organize
in the tree [3]. The tree is then calculated such that the sum of the branches
joining any pair equals their corresponding distance. Commonly used algorithms
to build phylogenetic trees based on distance measures between the nodes (al-
most always the leaves of the tree) are the Unweighted Pair Group Method with
Arithmetic Mean (UPGMA) and the neighbor-joining method [8].
Distance methods tend to be the first choice, as sequence comparisons allow
for distances to be pre-computed. However, for some applications, it is necessary
to use sets of features from which a distance measure cannot be immediately
inferred.
3 Our Proposed Methodology
We use a declarative, CLP, algorithm for building the trees by specifying the
constraints that stem from the evolutionary relationships that the tree must
represent. Each inner node bisects the set of entities that descend from the
 
Search WWH ::




Custom Search