The graphs in Figs. 7-9 plot the expected occurrence EO(r, m), denoted by relation 4, on the y-axis, while the weight of patterns is plotted on the x-axis as a fraction of n, the number of bits in a pattern.
In Fig. 7, where m = 1, it can be seen that the expected occurrence does not follow a monotonically decreasing function but reaches its peak at a slightly higher weight value. However, as the value of m is increased (Fig. 8, m = 2), the function becomes monotonically decreasing. The gradient becomes steeper as the value of m is increased further (Fig. 9).
The graph of Fig. 9 is plotted for different values of m, keeping n (= 30) constant. It can be seen that the expectation of lower-weight patterns occurring in the zero basin increases manifold.
5 Performance Analysis of MACA Based Classifier
For convenience of performance analysis, the distributions of patterns in the two classes are assumed to be as shown in Fig. 10. Each pair of sets on which the classifiers are run is characterized by one of the curves (a-a, b-b, c-c, d-d). The ordinate of a curve represents the number of pairs of patterns at the specified hamming distance. For example, point A (on the curve a) indicates y pairs of patterns at hamming distance x. The abscissa is plotted in both directions: from left to right for Class I and from right to left for Class II.
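The pairwise hamming-distance distribution underlying these curves can be sketched as follows. This is a minimal illustration on a hypothetical toy pattern set, not the actual data of Fig. 10:

```python
from itertools import combinations
from collections import Counter

def hamming(p, q):
    """Number of bit positions in which two equal-length patterns differ."""
    return sum(a != b for a, b in zip(p, q))

def distance_histogram(patterns):
    """Map each hamming distance x to the number y of pattern pairs at that
    distance -- the quantity plotted as the ordinate of the curves in Fig. 10."""
    return Counter(hamming(p, q) for p, q in combinations(patterns, 2))

# Hypothetical 4-bit pattern set standing in for one class
class_I = ["0000", "0001", "0011"]
hist = distance_histogram(class_I)
# hist[x] gives the number of pairs at hamming distance x
```

Plotting `hist` for each class (one left-to-right, the other right-to-left) reproduces the style of curves described above.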
The curves of Class I and Class II overlap if D_min < d_max. An ideal distribution a-a is represented by the continuous line, without any overlap of the two classes.
In each distribution, various values of n are taken. For each value of n, 2000 patterns are taken per class. Of these, 1000 patterns from each class are used to build the classification model; the remaining 1000 are used to test the prediction accuracy of the model. For each value of n, 10 different pairs of pattern sets are built.
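The train/test split described above can be sketched as follows. The function and the toy 20-pattern set are hypothetical; the text uses 1000/1000 out of 2000 patterns per class:

```python
import random

def split_class(patterns, train_size):
    """Split one class's pattern set into a training half (used to build the
    classification model) and a test half (used to measure prediction accuracy)."""
    shuffled = patterns[:]      # copy so the caller's list is untouched
    random.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]

# Hypothetical toy class of 20 five-bit patterns, split 10/10
pats = [format(i, "05b") for i in range(20)]
train, test = split_class(pats, train_size=10)
```

Repeating this for 10 independent pairs of pattern sets per value of n yields the averaged results reported later.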
Table 1 reports the classification efficiency for the data sets a-a, b-b, and c-c. Column II represents the different values of m (number of attractor basins) for which the GA finds the best possible solution. Columns III to VI represent the classification efficiency of the training and test data sets respectively. The classification efficiency of the training set is the percentage of patterns which can be classified into different attractors, while that of the test data is the percentage of data which can be correctly predicted. The best classification efficiency corresponding to each m in the final generation is taken, averaged over the 10 different pairs of pattern sets built for each value of n.
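The test-set efficiency measure can be sketched as a simple percentage of correct predictions. The attractor-basin labels below are hypothetical:

```python
def efficiency(predicted, actual):
    """Classification efficiency: percentage of patterns whose predicted class
    (attractor-basin label) matches the actual class."""
    assert len(predicted) == len(actual)
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(predicted)

# Hypothetical labels for 5 test patterns: 4 of 5 match
print(efficiency([1, 1, 2, 2, 1], [1, 2, 2, 2, 1]))  # -> 80.0
```

The training-set efficiency is computed the same way, with "predicted" taken as the basin into which the MACA maps each training pattern.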
The following experiments validate the theoretical foundations of the classifier performance reported in earlier sections.
5.1 Expt 1: Study of GA Evolution
The GA starts with various values of m, but the population soon becomes concentrated in a certain zone of values. The genetic algorithm is allowed to evolve for 50 generations. In each case, 80% of the population in the final solution assumes the two or