two datasets were obtained from PROBEN1, a set of neural network benchmark
problems and benchmarking rules (Prechelt 1994). Both the technical report
and the datasets are available through anonymous FTP from the Neural Bench
archive at Carnegie Mellon University (machine ftp.cs.cmu.edu, directory
/afs/cs/project/connect/bench/contrib/prechelt) and from the machine
ftp.ira.uka.de in directory /pub/neuron. The file name in both cases is
proben1.tar.gz. The last dataset, the iris dataset (Fisher 1936), is perhaps
the most famous dataset used in data mining and is also freely available online.
4.2.1 Diagnosis of Breast Cancer
In this diagnosis task the goal is to classify a tumor as either benign (0) or
malignant (1) based on nine different cell analyses (the input attributes or
terminals): clump thickness, uniformity of cell size, uniformity of cell shape,
marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin,
normal nucleoli, and mitoses.
The model presented here was obtained using the cancer1 dataset of
PROBEN1, where the binary 1-of-m encoding, in which each bit represents
one of the m possible output classes, was replaced by a 1-bit encoding (“0”
for benign and “1” for malignant). The first 350 samples were used for
training and the last 174 were used for testing the performance of the model
in real use. This means that absolutely no information from the testing set
samples or the testing set performance is available during the adaptive
process. Thus, the classification error on the testing set will be used to
evaluate the generalization performance of the evolved models.
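
The re-encoding and the split described above can be sketched in Python as
follows. This is only an illustration under stated assumptions, not the code
used to obtain the model: the function names are hypothetical, the samples are
assumed to be already loaded in file order, and the order of the two target
columns (benign first, malignant second) is an assumption that should be
checked against the PROBEN1 documentation.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def one_of_m_to_bit(target: Sequence[float]) -> int:
    """Collapse a 1-of-2 target into a single bit.

    Assumption (not stated in the text): the first column marks benign and
    the second marks malignant, so (1, 0) -> 0 and (0, 1) -> 1.
    """
    if len(target) != 2 or sorted(target) != [0, 1]:
        raise ValueError("expected a 1-of-2 encoded target such as (1, 0)")
    return int(target[1] == 1)


def split_first_last(
    samples: Sequence[T], n_train: int = 350, n_test: int = 174
) -> Tuple[List[T], List[T]]:
    """Return the first n_train samples and the last n_test samples."""
    if n_train + n_test > len(samples):
        raise ValueError("training and testing sets would overlap")
    train = list(samples[:n_train])
    test = list(samples[len(samples) - n_test:])
    return train, test
```

Because the two slices never overlap, nothing from the testing samples can
reach the adaptive process through the training set.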
For this problem, F = {+, +, -, -, *, *, /, LT, GT, LOE, GOE, ET, NET} (the
last six functions are comparison functions of two arguments which return 1
if the condition is true or 0 if false, representing, respectively, less than,
greater than, less than or equal to, greater than or equal to, equal to, and
not equal to); the set of terminals consisted of the nine attributes used in
this problem and was represented by T = {d0, ..., d8}, which correspond,
respectively, to clump thickness, uniformity of cell size, uniformity of cell
shape, marginal adhesion, single epithelial cell size, bare nuclei, bland
chromatin, normal nucleoli, and mitoses.
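
As a minimal sketch, the six comparison functions can be written in Python as
two-argument functions that return 1 or 0. The dictionary name COMPARISONS and
the helper as_bit are hypothetical and chosen only for this illustration. The
arithmetic functions are left out because the text does not say how special
cases such as division by zero are handled.

```python
import operator
from typing import Callable, Dict

Number = float


def as_bit(predicate: Callable[[Number, Number], bool]) -> Callable[[Number, Number], int]:
    """Wrap a boolean predicate so that it returns 1 (true) or 0 (false)."""
    def wrapped(a: Number, b: Number) -> int:
        return 1 if predicate(a, b) else 0
    return wrapped


# Symbols follow the function set F given in the text.
COMPARISONS: Dict[str, Callable[[Number, Number], int]] = {
    "LT": as_bit(operator.lt),   # less than
    "GT": as_bit(operator.gt),   # greater than
    "LOE": as_bit(operator.le),  # less than or equal to
    "GOE": as_bit(operator.ge),  # greater than or equal to
    "ET": as_bit(operator.eq),   # equal to
    "NET": as_bit(operator.ne),  # not equal to
}
```

For example, COMPARISONS["LOE"](d[0], d[5]) evaluates to 1 when clump thickness
is less than or equal to bare nuclei for a sample d, and to 0 otherwise.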
In classification problems where the output (the dependent variable) is
often binary, it is important to set criteria to convert predicted values
(usually real-valued numbers) into zero or one. This is the role of the 0/1
rounding threshold R, which converts the output of an individual program into
1 if the output is