Graph Based Evolutionary Algorithms - Advancing Artificial Intelligence Through Biological Process Applications

Biomedical Engineering Reference

In-Depth Information

Other Problems

A taxonomy is a hierarchical classification

of a set. Linnaeus established the first definite

hierarchy used to classify living organisms. Each

organism was assigned a kingdom, phylum, class,

order, family, genus, and species. This hierarchy

gave a tree-like structure to taxonomy, the en-

terprise of classifying living creatures. Modern

taxonomy of living organisms has nineteen levels

of classification, extending Linnaeus' original

seven. This can be displayed using a cladogram,

a tree diagram showing the evolutionary relation-

ship among various taxonomic groups (Mayr &

Ashlock, 1991).

Taking our cue from this taxonomy of living

organisms, we construct trees to find and visualize

relationships among members of other groups by

using a similar method on the distinctive charac-

teristics of those group members. Hierarchical

clustering can produce a tree-structured classi-

fication of any set of data given only a similarity

measure on members of the set and some sort of

averaging procedure for members of the set. Here

we extract a set of measurements or taxonomic

characters from our test problems and use hier-

archical clustering to produce a cladogram that

classifies the problems as more and less similar.

Hierarchical clustering starts with the members

of a set, thought of as singleton subsets. It then

joins the closest pair and replaces them with their

average in character space.

The choice of taxonomic characters used is

critical. Imagine, for example, that you are at-

tempting to derive a family tree for a group of

twenty insect species. All have six legs and so

“number of legs” is a useless character, at least

within the group, for classifying the insects. The

size of the insects varies a great deal with the

weather, which determines the amount of food

they can find. Thus, size is a bad character for

classification. The ratio of thorax length to ab-

domen length turns out both to vary within the

group and remain the same in different years. The

color of the larva also varies within the group

and does not depend on the weather. These lat-

Finding Steiner systems is another task examined

(Ashlock, Bryden, & Corns, 2005). Steiner sys-

tems are used in statistical design of experiments.

A k-tuple for a given set V of n objects has every

pair included in one but only one of the k -subsets.

For the set {A, B, C, D, E, F, G}, an example of a

Steiner 3-tuple (triple system) would be: {{A, B,

D}, {B, C, E}, {C, D, F}, {D, E, G}, {A, E, F}, {B,

F, G}, {A, C, G}}. Using difference sets to estab-

lish the length of the chromosome, k -tuples are

adjacent blocks of k integers in the chromosome.

Fitness is evaluated by summing the number of

distinct differences between members of the same

tuple, so that a repeated pair in a tuple does not

contribute to the fitness score. Six instances of

the Steiner problem ( S ( k , n )) were used: S(3,49),

S(3, 55), S(4,37), S(4,49), S(4,61) and S(5,41). For

labeling purposes, triple systems are denoted

STS n , quadruples SQS n and quintuples SQu n .

The final problem was the DNA barcode prob-

lem (Qiu, Guo, Wen, Ashlock, & Schnable, 2003).

DNA barcodes are used as embedded markers

for genetic constructs so they can be identified

when sequencing pooled genetic libraries. They

operate as an error correcting code over the DNA

alphabet {C, G, A, T} able to correct relative to the

edit metric (Gusfield, 1997). Fitness is the size of

the code located by Conway's lexicode algorithm

(Qiu et al., 2003). The algorithm studied finds

six-letter DNA words that are at a mutual distance

of at least three.

TAXONOMy

Early work in comparing different graph structures

in GBEAs showed that different problem types

preferred different families of graphs. Given the

large amount of data compiled in these studies, it

became apparent that in addition to solving similar

problems efficiently, these preferences could be

used to taxonomically classify problems.

Advancing Artificial Intelligence Through Biological Process Applications

Search WWH ::

Custom Search

Home