A Novel Method for DNA Microarray Data Analysis: SDL Global Optimization Method - DNA Microarray Technology and Data Analysis in Cancer Research


class label to distinguish them from samples in other classes. A cluster is a collection of objects that are locally similar to one another. Clusters are usually generated in order to further classify objects into relatively larger and more meaningful categories. Clustering is also called unsupervised classification, because no predefined classes are assigned.
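To make the unsupervised setting concrete, the following is a minimal sketch of k-means, one common clustering algorithm chosen here purely for illustration (the text does not name a specific method). It groups unlabeled points by similarity without using any class labels. The sample values are hypothetical, and the first-k-points initialization is an assumption made so that the result is reproducible.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    # Deterministic initialization (assumption): use the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical 2-D expression values for six unlabeled samples.
samples = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(samples, k=2)
print(labels)
```

No labels are supplied; the algorithm discovers the two groups from the similarity of the values alone.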

Given a data set with class labels, data analysis builds classifiers that act as predictors for future, unknown objects. A classification model is first formed from the available data, and the classes of new objects are then predicted with the learned model. In the following case, the data sets come from a public microarray database, and the samples are used to build a model that can classify new leukemia samples as either ALL or AML.
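The two-step workflow (learn a model from labeled data, then predict the class of a new sample) can be sketched with a nearest-centroid classifier. This is an illustrative choice, not the method used in this chapter, and the three-gene expression values below are hypothetical rather than taken from the microarray database.

```python
import math

def fit_centroids(samples, labels):
    """Learn one mean expression profile (centroid) per class label."""
    centroids = {}
    for cls in set(labels):
        members = [s for s, lab in zip(samples, labels) if lab == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids, sample):
    """Assign a new sample to the class whose centroid is nearest."""
    return min(centroids, key=lambda cls: math.dist(sample, centroids[cls]))

# Hypothetical training data: 3-gene expression profiles with known labels.
train_x = [[2.1, 0.3, 1.0], [1.9, 0.4, 1.2], [0.2, 2.2, 0.9], [0.4, 1.8, 1.1]]
train_y = ["ALL", "ALL", "AML", "AML"]

model = fit_centroids(train_x, train_y)      # step 1: build the model
print(predict(model, [2.0, 0.5, 1.0]))       # step 2: classify a new sample
```

Here the new profile lies closest to the mean ALL profile, so it is labeled ALL.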

Acute leukemia subtypes, which look highly similar in gene expression data, have been classified by combining a pair of classifiers trained on mutually exclusive features (Cho and Ryu, 2002). Gene expression profiles were constructed from 71 patients with either acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML), each patient constituting one sample of the DNA microarray. Each pattern consists of 7129 gene expression values. Feature selection was used to select the 25 top-ranked genes for the experiment. A case study, from theory to practice, is presented in detail in the following sections.
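Ranking genes and keeping the top ones can be sketched as follows. The text does not say which ranking criterion was used; this sketch assumes a signal-to-noise score, (mean_ALL - mean_AML) / (sd_ALL + sd_AML), a common choice for two-class expression data. The function names and the tiny four-gene data set are hypothetical; on the real data, k would be 25 out of 7129 genes.

```python
import statistics

def signal_to_noise(values_a, values_b):
    """Score one gene: difference of class means over sum of class std devs."""
    mean_a, mean_b = statistics.mean(values_a), statistics.mean(values_b)
    sd_a, sd_b = statistics.pstdev(values_a), statistics.pstdev(values_b)
    return (mean_a - mean_b) / (sd_a + sd_b + 1e-12)  # epsilon avoids division by zero

def top_k_genes(samples, labels, k, class_a="ALL", class_b="AML"):
    """Return indices of the k genes with the largest |signal-to-noise| score."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        a = [s[g] for s, lab in zip(samples, labels) if lab == class_a]
        b = [s[g] for s, lab in zip(samples, labels) if lab == class_b]
        scores.append(abs(signal_to_noise(a, b)))
    # Sort gene indices by score, best first, and keep the top k.
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]

# Hypothetical data: 4 samples x 4 genes.
samples = [
    [2.0, 1.0, 5.0, 0.1],
    [2.2, 1.1, 1.0, 0.2],
    [0.1, 1.0, 3.0, 2.0],
    [0.3, 1.2, 2.0, 2.4],
]
labels = ["ALL", "ALL", "AML", "AML"]
print(top_k_genes(samples, labels, k=2))
```

Genes 0 and 3 separate the two classes cleanly, so they rank first; genes 1 and 2 differ little between classes relative to their spread.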

4.4.1. Genetic algorithms (GAs)

GAs are inspired by the process of natural evolution. Many classification techniques in artificial intelligence use GAs as their core algorithms. Candidate solutions to the problem at hand are encoded as chromosomes, also called individuals. An initial population of individuals is generated either at random or heuristically. The operators in GAs include selection, crossover, and mutation. To produce a new generation, chromosomes are selected according to their fitness scores: the selection operator gives preference to better individuals as parents for the next generation. The crossover and mutation operators are then used to generate offspring from the parents. For crossover, a crossover site is chosen at random in the parents, and the parents exchange the segments on either side of it. The mutation operator is used to prevent premature convergence to local optima (Wang and Fu, 2005). The basic idea of GAs is to search the high-dimensional problem space effectively in parallel, by evolving a whole population of candidate solutions rather than a single one.
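The loop just described (random initial population, fitness-based selection, crossover at a random site, and mutation) can be sketched on bit-string chromosomes. This is a minimal illustration under stated assumptions: tournament selection is one fitness-based scheme among several, bit-flip mutation with a fixed rate is our choice, and the "count the 1-bits" fitness function is a hypothetical stand-in for a real objective.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      mutation_rate=0.02, seed=0):
    """Evolve bit-string chromosomes to maximize `fitness`."""
    rng = random.Random(seed)
    # Initial population generated at random.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def select():
        # Tournament selection (assumption): the fitter of two randomly
        # drawn individuals becomes a parent, favoring better individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            mom, dad = select(), select()
            # Single-point crossover at a randomly chosen site.
            site = rng.randint(1, n_bits - 1)
            child = mom[:site] + dad[site:]
            # Bit-flip mutation keeps diversity, guarding against premature convergence.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)

# Hypothetical fitness: number of 1-bits in the chromosome ("OneMax").
best = genetic_algorithm(fitness=sum, n_bits=25)
print(sum(best), best)
```

In a gene-selection setting, each bit could mark whether a gene is used and `fitness` could be a classifier's accuracy on the selected genes; that mapping is an assumption for illustration, not the chapter's method.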
