R-square

The R-square fitness function is based on the standard R-square, which returns the square of the Pearson product moment correlation coefficient. This coefficient is a dimensionless index that ranges from -1 to 1 and reflects the extent of a linear relationship between the predicted values and the target values. When the Pearson correlation coefficient $R_i$ equals 1, there is a perfect positive linear correlation between target $T$ and predicted $P$ values, that is, they vary by the same amount. When $R = -1$, there is a perfect negative linear correlation between $T$ and $P$, that is, they vary in opposite ways (when $T$ increases, $P$ decreases by the same amount). When $R = 0$, there is no correlation between $T$ and $P$. Intermediate values describe partial correlations, and the closer to -1 or 1, the better the model.

The Pearson product moment correlation coefficient $R_i$ of an individual program $i$ is evaluated by the equation:
$$
R_i = \frac{n \sum_{j=1}^{n} T_j P_{(ij)} - \left( \sum_{j=1}^{n} T_j \right) \left( \sum_{j=1}^{n} P_{(ij)} \right)}{\sqrt{\left[ n \sum_{j=1}^{n} T_j^2 - \left( \sum_{j=1}^{n} T_j \right)^2 \right] \left[ n \sum_{j=1}^{n} P_{(ij)}^2 - \left( \sum_{j=1}^{n} P_{(ij)} \right)^2 \right]}} \qquad (3.6)
$$
where $P_{(ij)}$ is the value predicted by the individual program $i$ for fitness case $j$ (out of $n$ fitness cases); and $T_j$ is the target value for fitness case $j$.
The fitness $f_i$ of an individual program $i$ is a function of the squared correlation coefficient (the so-called R-square) and is expressed by the equation:

$$
f_i = 1000 \cdot R_i^2 \qquad (3.7)
$$

and therefore ranges from 0 to 1000, with 1000 corresponding to the ideal.
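As a sketch, Equations 3.6 and 3.7 can be combined into a single evaluation routine. The function name and the guard for a zero denominator are illustrative additions; the equations themselves do not specify behavior when the targets or predictions are constant.

```python
import math

def r_square_fitness(targets, predictions, max_fitness=1000.0):
    """Fitness of one evolved program: max_fitness times the squared
    Pearson product moment correlation between the target values T_j
    and the predicted values P_(ij) (Equations 3.6 and 3.7)."""
    n = len(targets)
    sum_t = sum(targets)
    sum_p = sum(predictions)
    sum_tp = sum(t * p for t, p in zip(targets, predictions))
    sum_t2 = sum(t * t for t in targets)
    sum_p2 = sum(p * p for p in predictions)
    numerator = n * sum_tp - sum_t * sum_p
    denominator = math.sqrt((n * sum_t2 - sum_t ** 2)
                            * (n * sum_p2 - sum_p ** 2))
    if denominator == 0.0:
        # Constant targets or predictions: no linear relationship to measure.
        return 0.0
    r = numerator / denominator
    return max_fitness * r * r

# A perfect negative linear correlation (R = -1) still yields the ideal
# fitness of 1000, since only the square of R matters.
print(r_square_fitness([1, 2, 3, 4], [8, 6, 4, 2]))  # 1000.0
```

Note that squaring the coefficient makes a model with $R = -1$ as fit as one with $R = 1$: both capture the linear relationship exactly, differing only in sign.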
3.2.3 Fitness Functions for Classification and Logic Synthesis

Although very different, classification and logic synthesis share one similarity: their predictables or dependent variables are both binary and, consequently, both these problems can use the same kind of fitness function to evaluate the fitness of the evolved models. However, the vast majority of fitness functions (and the most colorful, I might add) were originally designed for classification problems, where it is usually not enough to just