The Basic GEA in Problem Solving - Gene Expression Programming

two datasets were obtained from PROBEN1, a set of neural network benchmark problems and benchmarking rules (Prechelt 1994). Both the technical report and the datasets are available through anonymous FTP from the Neural Bench archive at Carnegie Mellon University (machine ftp.cs.cmu.edu, directory /afs/cs/project/connect/bench/contrib/prechelt) and from the machine ftp.ira.uka.de in directory /pub/neuron. The file name in both cases is proben1.tar.gz. The last dataset, the iris dataset (Fisher 1936), is perhaps the most famous dataset used in data mining and is also freely available online.

4.2.1 Diagnosis of Breast Cancer

In this diagnosis task the goal is to classify a tumor as either benign (0) or malignant (1) based on nine different cell analyses (the input attributes or terminals): clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses.

The model presented here was obtained using the cancer1 dataset of PROBEN1, in which the binary 1-of-m encoding (each bit represents one of the m possible output classes) was replaced by a 1-bit encoding (“0” for benign and “1” for malignant). The first 350 samples were used for training and the last 174 were used for testing the performance of the model in real use. This means that absolutely no information from the testing set samples or the testing set performance is available during the adaptive process. Thus, the classification error on the testing set is used to evaluate the generalization performance of the evolved models.
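
The data preparation described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used: the function names, the in-memory list layout, and the assumption that the second output column marks the malignant class are all hypothetical and are not taken from PROBEN1 or from the text.

```python
def one_of_m_to_bit(target, malignant_index=1):
    """Collapse a 1-of-m target vector into a single bit.

    Assumption: the column at malignant_index is 1 for malignant samples.
    """
    return 1 if target[malignant_index] == 1 else 0


def split_train_test(samples, n_train=350, n_test=174):
    """Return the first n_train samples and the last n_test samples."""
    if len(samples) < n_train + n_test:
        raise ValueError("not enough samples for the requested split")
    return samples[:n_train], samples[-n_test:]


def classification_error(predicted, actual):
    """Fraction of samples whose predicted class differs from the actual class."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)
```

Keeping the split in its own function makes it easy to check that no testing sample is ever passed to the code that evolves the models.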

For this problem, F = {+, +, -, -, *, *, /, LT, GT, LOE, GOE, ET, NET}. The last six functions are comparison functions of two arguments that return 1 if the condition is true or 0 if it is false; they represent, respectively, less than, greater than, less than or equal to, greater than or equal to, equal to, and not equal to. The set of terminals consisted of the nine attributes used in this problem and was represented by T = {d0, ..., d8}, which correspond, respectively, to clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses.
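
As a rough sketch, the function set can be written in Python as a table of two-argument functions, with each comparison returning 1 or 0. The names FUNCTIONS and apply_function and the sample values in d are hypothetical. This section does not say how division by zero is treated, so the sketch simply uses ordinary Python division.

```python
import operator

# Two-argument functions from the function set F. Each comparison returns
# 1 when the condition holds and 0 otherwise.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,  # raises ZeroDivisionError on a zero divisor
    "LT": lambda a, b: int(a < b),
    "GT": lambda a, b: int(a > b),
    "LOE": lambda a, b: int(a <= b),
    "GOE": lambda a, b: int(a >= b),
    "ET": lambda a, b: int(a == b),
    "NET": lambda a, b: int(a != b),
}


def apply_function(symbol, a, b):
    """Apply the function named by symbol to two arguments."""
    return FUNCTIONS[symbol](a, b)


# Hypothetical sample: d[0]..d[8] hold the nine attribute values.
d = [5, 1, 1, 1, 2, 1, 3, 1, 1]
print(apply_function("GT", d[0], d[4]))   # clump thickness > epithelial cell size
print(apply_function("NET", d[1], d[2]))  # cell size uniformity != cell shape uniformity
```

Because the comparisons return numbers rather than booleans, their results can be fed directly into the arithmetic functions within the same expression.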

In classification problems, where the output (the dependent variable) is often binary, it is important to set criteria to convert predicted values (usually real-valued numbers) into zero or one. This is the 0/1 rounding threshold R that converts the output of an individual program into 1 if the output is
