class label to distinguish them from samples in other classes. A cluster is
a collection of objects that are similar locally. Clusters are usually gener-
ated in order to further classify objects into relatively larger and mean-
ingful categories. Clustering is also called unsupervised classification,
where no predefined classes are assigned.
Given a data set with class labels, data analysis builds classifiers that
act as predictors for future unknown objects. A classification model is
first formed from the available data, and the learned model is then used
to make predictions for new data. In the following case, the data sets
come from a public microarray database, and the samples are used to build
a model that classifies new samples as ALL or AML leukemia.
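The build-then-predict workflow described above can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier, which is chosen here only for brevity and is not the method of the cited study. The function names (`fit_centroids`, `predict`) and the expression values are hypothetical.

```python
from statistics import mean

def fit_centroids(samples, labels):
    """Compute one mean expression vector (centroid) per class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*rows)] for y, rows in by_class.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))

# Toy, made-up expression values for three hypothetical genes.
train_x = [[5.0, 1.0, 0.5], [4.5, 1.2, 0.4], [1.0, 4.8, 0.6], [0.8, 5.1, 0.5]]
train_y = ["ALL", "ALL", "AML", "AML"]

model = fit_centroids(train_x, train_y)   # learn from labeled data
new_label = predict(model, [4.8, 0.9, 0.5])  # classify an unseen sample
```

The model is learned once from the labeled samples and then reused for every new sample, which is the separation between training and prediction that the paragraph describes.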
Acute leukemia subtypes, which appear highly similar in gene expression
data, have been classified by combining a pair of classifiers trained
with mutually exclusive features (Cho and Ryu, 2002). Gene expression
profiles were constructed from 71 patients with acute lymphoblastic
leukemia (ALL) or acute myeloid leukemia (AML), each patient constituting
one sample of the DNA microarray. Each pattern consists of 7129 gene
expression values. Feature selection was employed to obtain the 25
top-ranked genes for the experiment. A case study from theory to practice
is presented in detail in the following sections.
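As a rough illustration of ranking genes for feature selection, the sketch below scores each gene with a signal-to-noise ratio, (mean_A - mean_B) / (sd_A + sd_B), and keeps the k genes with the largest absolute score. The passage does not state which ranking measure was used, so this choice is an assumption. The function names and the data are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(values_a, values_b):
    """Difference of class means divided by the sum of class standard deviations."""
    denom = stdev(values_a) + stdev(values_b)
    return (mean(values_a) - mean(values_b)) / denom if denom > 0 else 0.0

def top_ranked_genes(samples, labels, class_a, class_b, k):
    """Return the indices of the k genes with the largest absolute score."""
    rows_a = [x for x, y in zip(samples, labels) if y == class_a]
    rows_b = [x for x, y in zip(samples, labels) if y == class_b]
    scores = []
    for g in range(len(samples[0])):
        s = signal_to_noise([r[g] for r in rows_a], [r[g] for r in rows_b])
        scores.append((abs(s), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]

# Made-up data: 6 samples, 4 hypothetical genes (the real set has 7129).
samples = [
    [9.0, 2.0, 5.0, 1.0], [8.5, 2.5, 5.2, 3.0], [9.2, 1.8, 4.9, 2.0],  # ALL
    [2.0, 2.2, 5.1, 8.0], [2.4, 2.1, 5.0, 6.0], [1.8, 2.4, 4.8, 7.0],  # AML
]
labels = ["ALL"] * 3 + ["AML"] * 3
selected = top_ranked_genes(samples, labels, "ALL", "AML", k=2)
```

With real data, `k` would be set to 25 to match the number of genes kept in the experiment.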
4.4.1. Genetic algorithms (GAs)
GAs are motivated by the natural evolutionary process. Many classification
techniques in artificial intelligence use GAs as core algorithms. Candidate
solutions to the problem at hand are encoded as chromosomes, also called
individuals. An initial population of individuals is generated at random or
heuristically. The operators in GAs include selection, crossover, and
mutation. To produce a new generation, chromosomes are selected according
to their fitness scores: the selection operator gives preference to better
individuals as parents for the next generation. The crossover and mutation
operators then generate offspring from the parents. In crossover, a site
is chosen at random in the parents and the segments beyond it are
exchanged. The mutation operator randomly alters individual genes to
prevent premature convergence to local optima (Wang and Fu, 2005). The
basic concept in GAs is an effective parallel search of a
high-dimensional problem space.
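The operators described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the configuration used in the case study: chromosomes are bit strings, the fitness is a toy function that counts 1-bits, selection is by tournament, and the best individual is copied unchanged into each new generation (elitism). All names and parameter values are hypothetical.

```python
import random

def fitness(chrom):
    """Toy fitness (assumption): the number of 1-bits in the chromosome."""
    return sum(chrom)

def select(population, rng, k=3):
    """Tournament selection: the fittest of k random individuals wins."""
    return max(rng.sample(population, k), key=fitness)

def crossover(p1, p2, rng):
    """One-point crossover: swap the tails after a randomly chosen site."""
    site = rng.randrange(1, len(p1))
    return p1[:site] + p2[site:], p2[:site] + p1[site:]

def mutate(chrom, rng, rate=0.02):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in chrom]

def run_ga(n_bits=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Initial population generated at random.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        # Elitism (assumption): keep the current best individual unchanged.
        next_gen = [max(population, key=fitness)]
        while len(next_gen) < pop_size:
            c1, c2 = crossover(select(population, rng),
                               select(population, rng), rng)
            next_gen.extend([mutate(c1, rng), mutate(c2, rng)])
        population = next_gen[:pop_size]
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history

best, history = run_ga()
```

Here `history` records the best fitness per generation. Because of elitism it never decreases, while mutation keeps introducing variation that crossover alone cannot produce.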