Biomedical Engineering Reference
In-Depth Information
Other Problems
A taxonomy is a hierarchical classification
of a set. Linnaeus established the first definite
hierarchy used to classify living organisms. Each
organism was assigned a kingdom, phylum, class,
order, family, genus, and species. This hierarchy
gave a tree-like structure to taxonomy, the en-
terprise of classifying living creatures. Modern
taxonomy of living organisms has nineteen levels
of classification, extending Linnaeus' original
seven. This can be displayed using a cladogram,
a tree diagram showing the evolutionary relation-
ship among various taxonomic groups (Mayr &
Ashlock, 1991).
Taking our cue from this taxonomy of living
organisms, we construct trees to find and visualize
relationships among members of other groups by
using a similar method on the distinctive charac-
teristics of those group members. Hierarchical
clustering can produce a tree-structured classi-
fication of any set of data given only a similarity
measure on members of the set and some sort of
averaging procedure for members of the set. Here
we extract a set of measurements or taxonomic
characters from our test problems and use hier-
archical clustering to produce a cladogram that
classifies the problems as more and less similar.
Hierarchical clustering starts with the members
of a set, thought of as singleton subsets. It then
joins the closest pair and replaces them with their
average in character space.
The choice of taxonomic characters used is
critical. Imagine, for example, that you are at-
tempting to derive a family tree for a group of
twenty insect species. All have six legs and so
“number of legs” is a useless character, at least
within the group, for classifying the insects. The
size of the insects varies a great deal with the
weather, which determines the amount of food
they can find. Thus, size is a bad character for
classification. The ratio of thorax length to ab-
domen length turns out both to vary within the
group and remain the same in different years. The
color of the larva also varies within the group
and does not depend on the weather. These lat-
Finding Steiner systems is another task examined
(Ashlock, Bryden, & Corns, 2005). Steiner sys-
tems are used in statistical design of experiments.
A k-tuple for a given set V of n objects has every
pair included in one but only one of the k -subsets.
For the set {A, B, C, D, E, F, G}, an example of a
Steiner 3-tuple (triple system) would be: {{A, B,
D}, {B, C, E}, {C, D, F}, {D, E, G}, {A, E, F}, {B,
F, G}, {A, C, G}}. Using difference sets to estab-
lish the length of the chromosome, k -tuples are
adjacent blocks of k integers in the chromosome.
Fitness is evaluated by summing the number of
distinct differences between members of the same
tuple, so that a repeated pair in a tuple does not
contribute to the fitness score. Six instances of
the Steiner problem ( S ( k , n )) were used: S(3,49),
S(3, 55), S(4,37), S(4,49), S(4,61) and S(5,41). For
labeling purposes, triple systems are denoted
STS n , quadruples SQS n and quintuples SQu n .
The final problem was the DNA barcode prob-
lem (Qiu, Guo, Wen, Ashlock, & Schnable, 2003).
DNA barcodes are used as embedded markers
for genetic constructs so they can be identified
when sequencing pooled genetic libraries. They
operate as an error correcting code over the DNA
alphabet {C, G, A, T} able to correct relative to the
edit metric (Gusfield, 1997). Fitness is the size of
the code located by Conway's lexicode algorithm
(Qiu et al., 2003). The algorithm studied finds
six-letter DNA words that are at a mutual distance
of at least three.
TAXONOMy
Early work in comparing different graph structures
in GBEAs showed that different problem types
preferred different families of graphs. Given the
large amount of data compiled in these studies, it
became apparent that in addition to solving similar
problems efficiently, these preferences could be
used to taxonomically classify problems.
Search WWH ::




Custom Search