The Basic GEA in Problem Solving - Gene Expression Programming


a testing set and the immediate evaluation of standard statistical parameters such as the confusion matrix, the sensitivity, the specificity, the classification error (the percentage of incorrectly classified samples) and the classification accuracy (the percentage of correctly classified samples).

In one run, a very good solution was found with a fitness of 957.0898, a classification error of 2.571% and a classification accuracy of 97.429% on the training set (the genes are shown separately, a dot separates the different elements, and a space separates the head of each gene from its tail):

*.+.+.d8.d5.d0.NET .d6.d6.d2.d4.d0.d4.d5.d1
+.*.*.d6.d1.*.LOE .d5.d1.d8.d7.d2.d4.d8.d1
*.d1.d3.d0.GT./.* .d3.d3.d8.d4.d7.d4.d8.d6        (4.9a)
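The chromosome can be decoded mechanically: each gene is read in Karva notation, level by level, until every function has received its arguments, and the expressed genes are then added together. The C++ sketch below assumes that every function in this run takes two arguments, that NET, LOE and GT return 1 or 0 for a != b, a <= b and a > b respectively (NET and LOE agree with the generated code (4.9b); the meaning of GT is an assumption), and that the genes are linked by addition, as in (4.9b). The names arity, tokenize, evalGene and classify are illustrative, not part of any tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Arity of a symbol. All functions here are assumed to be binary;
// symbols such as "d8" are terminals (arity 0).
int arity(const std::string& s) {
    if (s == "+" || s == "-" || s == "*" || s == "/" ||
        s == "NET" || s == "LOE" || s == "GT")
        return 2;
    return 0;
}

// Split a gene such as "*.d1.d3.d0.GT./.* .d3..." into symbols; the dots and
// the space between head and tail are both treated as separators.
std::vector<std::string> tokenize(const std::string& gene) {
    std::vector<std::string> out;
    std::string tok;
    for (char c : gene) {
        if (c == '.' || c == ' ') {
            if (!tok.empty()) { out.push_back(tok); tok.clear(); }
        } else {
            tok += c;
        }
    }
    if (!tok.empty()) out.push_back(tok);
    return out;
}

// Express one gene (Karva notation) and evaluate it on the attribute vector d.
double evalGene(const std::vector<std::string>& sym, const std::vector<double>& d) {
    // Breadth-first reading: the arguments of node i start at index first[i].
    std::vector<std::size_t> first(sym.size(), 0);
    std::size_t end = 1;  // one past the last symbol of the open reading frame
    for (std::size_t i = 0; i < end; ++i) {
        assert(i < sym.size());  // a valid gene always has enough tail terminals
        first[i] = end;
        end += static_cast<std::size_t>(arity(sym[i]));
    }
    // Evaluate bottom-up: arguments always sit at larger indices than their parent.
    std::vector<double> val(end, 0.0);
    for (std::size_t i = end; i-- > 0;) {
        const std::string& s = sym[i];
        if (arity(s) == 0) {
            val[i] = d.at(std::stoul(s.substr(1)));  // "d8" -> d[8]
            continue;
        }
        const double a = val[first[i]], b = val[first[i] + 1];
        if (s == "+") val[i] = a + b;
        else if (s == "-") val[i] = a - b;
        else if (s == "*") val[i] = a * b;
        else if (s == "/") val[i] = a / b;  // unprotected; not reached in (4.9a)
        else if (s == "NET") val[i] = (a != b) ? 1.0 : 0.0;
        else if (s == "LOE") val[i] = (a <= b) ? 1.0 : 0.0;
        else if (s == "GT") val[i] = (a > b) ? 1.0 : 0.0;
    }
    return val[0];
}

// Add the expressed genes (linking by addition) and apply the rounding threshold R.
int classify(const std::vector<std::string>& genes, const std::vector<double>& d,
             double threshold = 0.5) {
    double sum = 0.0;
    for (const std::string& g : genes) sum += evalGene(tokenize(g), d);
    return sum >= threshold ? 1 : 0;
}
```

For example, with d = {1, 2, 3, 4, 5, 6, 7, 8, 9} the three genes evaluate to 15, 14 and 8, so classify returns 1. Symbols after each open reading frame, such as GT and / in the third gene, are never read, which is why the third gene reduces to d1 * d3.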

In terms of number of hits, this model correctly classifies 341 of the 350 fitness cases in the training set and 173 of the 174 cases in the testing set. This corresponds to a testing-set classification error of 0.575% and a classification accuracy of 99.425%, even better than the accuracy on the training set, which indicates that model (4.9) generalizes well and is indeed a very good model for diagnosing breast cancer.
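Both percentages follow directly from the hit counts, and they are complementary. A minimal C++ sketch (the function names are illustrative):

```cpp
#include <cassert>

// Classification accuracy: percentage of samples classified correctly.
double accuracyPercent(int hits, int total) {
    assert(total > 0 && hits >= 0 && hits <= total);
    return 100.0 * hits / total;
}

// Classification error: percentage of samples classified incorrectly.
double errorPercent(int hits, int total) {
    assert(total > 0 && hits >= 0 && hits <= total);
    return 100.0 * (total - hits) / total;
}
```

With the counts above, accuracyPercent(341, 350) and errorPercent(341, 350) give about 97.4286 and 2.5714 for the training set, and accuracyPercent(173, 174) and errorPercent(173, 174) give about 99.4253 and 0.5747 for the testing set, that is, 97.429%, 2.571%, 99.425% and 0.575% after rounding to three decimals.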

The confusion matrices obtained for both the training and the testing sets are shown in Figure 4.7. With their help one can easily evaluate parameters such as the sensitivity, the specificity, the positive predictive value (PPV) and the negative predictive value (NPV), all of them important in the medical field. Thus, in the testing set, the sensitivity, evaluated by equation (3.11), is 98.462%; the specificity, evaluated by equation (3.12), is 100%; the PPV, evaluated by equation (3.15), is also 100%; and the NPV, evaluated by equation (3.16), is 99.091%.
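These four statistics are simple ratios of the confusion-matrix counts. The sketch below uses the standard definitions SE = TP/(TP+FN), SP = TN/(TN+FP), PPV = TP/(TP+FP) and NPV = TN/(TN+FN), which equations (3.11), (3.12), (3.15) and (3.16) are assumed to express, and treats the class coded as 1 as positive. The testing-set counts used in the example (TP = 64, FN = 1, FP = 0, TN = 109) are not copied from Figure 4.7; they are the only split of the 174 testing cases that reproduces all four percentages above together with the 173 hits.

```cpp
#include <cassert>

// Counts from a 2x2 confusion matrix (class 1 treated as positive).
struct Confusion {
    int tp;  // true positives
    int fn;  // false negatives
    int fp;  // false positives
    int tn;  // true negatives
};

// num / den expressed as a percentage.
double ratioPercent(int num, int den) {
    assert(den > 0);
    return 100.0 * num / den;
}

double sensitivity(const Confusion& c) { return ratioPercent(c.tp, c.tp + c.fn); }
double specificity(const Confusion& c) { return ratioPercent(c.tn, c.tn + c.fp); }
double ppv(const Confusion& c)         { return ratioPercent(c.tp, c.tp + c.fp); }
double npv(const Confusion& c)         { return ratioPercent(c.tn, c.tn + c.fn); }
```

For Confusion c{64, 1, 0, 109}, sensitivity(c) is 6400/65, about 98.4615; specificity(c) and ppv(c) are exactly 100; and npv(c) is 10900/110, about 99.0909.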

Note that, for the expression of this chromosome to be complete, the 0/1 rounding threshold R = 0.5 must be taken into account. With Gepsoft APS we can automatically convert model (4.9) above into a fully expressed computer program, such as the C++ function below:

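int apsModel(double d[])
{
    const double ROUNDING_THRESHOLD = 0.5;
    double dblTemp = 0.0;

    // Gene 1: (d8 + d5) * (d0 + NET(d6, d6)); d[6] != d[6] is false
    // for every non-NaN value, so the NET term adds 0
    dblTemp = ((d[8]+d[5])*(d[0]+(d[6]!=d[6]?1:0)));
    // Gene 2: (d6 * d1) + (d5 * d1) * LOE(d8, d7)
    dblTemp += ((d[6]*d[1])+((d[5]*d[1])*(d[8]<=d[7]?1:0)));
    // Gene 3: d1 * d3 (the genes are linked by addition)
    dblTemp += (d[1]*d[3]);

    // Apply the 0/1 rounding threshold R = 0.5
    return (dblTemp >= ROUNDING_THRESHOLD ? 1:0);
}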

(4.9b)
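To obtain a confusion matrix such as those in Figure 4.7, the generated function only has to be run over every sample of a data set and its output compared with the expected class. The harness below is a hypothetical sketch: Tally and tallyModel are illustrative names, class 1 is treated as positive, and since the breast cancer data are not reproduced here, any inputs passed to it are up to the reader. apsModel is copied unchanged from (4.9b).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model (4.9b), copied unchanged.
int apsModel(double d[])
{
    const double ROUNDING_THRESHOLD = 0.5;
    double dblTemp = 0.0;
    dblTemp = ((d[8]+d[5])*(d[0]+(d[6]!=d[6]?1:0)));
    dblTemp += ((d[6]*d[1])+((d[5]*d[1])*(d[8]<=d[7]?1:0)));
    dblTemp += (d[1]*d[3]);
    return (dblTemp >= ROUNDING_THRESHOLD ? 1:0);
}

// Confusion-matrix counts, treating class 1 as positive.
struct Tally {
    int tp = 0, fn = 0, fp = 0, tn = 0;
    int hits() const { return tp + tn; }  // correctly classified samples
};

// Run the model on every sample and compare its output with the expected class.
Tally tallyModel(const std::vector<std::vector<double>>& samples,
                 const std::vector<int>& labels) {
    assert(samples.size() == labels.size());
    Tally t;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        std::vector<double> row = samples[i];  // apsModel takes a non-const pointer
        assert(row.size() >= 9);               // the model reads d[0]..d[8]
        const int predicted = apsModel(row.data());
        if (labels[i] == 1) {
            if (predicted == 1) ++t.tp; else ++t.fn;
        } else {
            if (predicted == 1) ++t.fp; else ++t.tn;
        }
    }
    return t;
}
```

The resulting counts can then be turned into the classification accuracy (hits over total) and into the sensitivity, specificity, PPV and NPV discussed above.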
