Biomedical Engineering Reference
In-Depth Information
ter two characters may be useful. One is a real
number, the other is a class variable taking on
the values red, white, orange, or yellow. If we
are performing a numerical taxonomic analysis,
we will need to assign numbers in a somewhat
arbitrary fashion to the colors. The preceding
brief discussion gives only a taste of the dif-
ficulty of choosing good taxonomic characters.
Readers familiar with automatic classification,
decision trees, and related branches of machine
learning will recognize that those fields also face
similar issues choosing decision variables. Any
taxonomic character or decision variable must be
relevant to the decision being made, vary across
the set of objects being classified, and be cleanly
computable for all members of the set of objects
being classified.
GBEAs yield a source of taxonomic characters
that are computable for any evolutionary compu-
tation problem that has a detectable solution or
end point. These characters are numerical. These
characters are objective in the sense that they do
not favor any particular choice of representation
or parameter setting. In outline, these characters
are computed in the following fashion. The time-
to-solution for a problem varies in a complex
manner with the choice of graphical connection
topology. This complexity is the genesis of our
taxonomic characters. The taxonomic characters
used to describe a problem are the normalized
mean solution times for the problem on each of
a variety of graphs. While this presents a set of
objective characters that enable automatic clas-
sification, these are not necessarily the “right”
or only characters. Using results from the 40
problems described, the mean number of mating
events to solution were normalized to yield the
taxonomic characters for the problems. Normal-
ization consisted of subtracting the minimum
average time from each average time and then
dividing through by the maximum among the
resulting reduced times. The taxonomic charac-
ters for each problem are thus numbers in the set
[0, 1] for each graph. In this way a method for
comparing the similarity of the problems was
introduced with the problem's relative hardness
removed. This information can then be used to
construct a cladogram using neighbor-joining
or displayed in two-dimensional space using
nonlinear projection. It should be noted that the
randomized graphs were indistinguishable from
the graphs they were derived from, and so were not
used in the taxonomy process. Also, the random
toroid graphs had similar behaviors, so only one
instance was used for classification, making the
total number of graphs used 15.
Neighbor-Joining Taxonomy
For each the 40 problems examined (P), a real
vector m(P) was constructed using the normalized
mean solution times for the 15 graphs examined.
For each pair of problems P and Q, the Euclidean
distance d(P,Q) between the vectors m(P) and
m(Q) was computed using the formula:
23
[
]
=
(
) (
)
2
d
(
P
,
Q
)
=
m
P
,
G
m
Q
,
G
i
i
i
1
d(P,Q) was interpreted as the distance between
the problems P and Q.
To transform this distance data into a clado-
gram (or tree), clustering was done using UPGMA
(Unweighted Pair Group Method with Arithmetic
mean) (Sneath & Sokal, 1973), a method which is
especially reliable when the distances examined
have a uniform meaning. Given a collection of
taxa and distance dij between taxa i and j, the
method starts with the two taxa x and y that are
the least distant and merges them into a new
unit z representing the average of x and y. For
all taxa other than x and y, a new distance from
z (diz) is then computed using the averages of
their distance from x and y, completely replacing
x and y with z. This procedure is then repeated
for the next set of least distant pairs until the last
two taxa are merged, weighting the computed
Search WWH ::




Custom Search