one can easily see why the authors in [6] reported such steady space coverage
problems for the estimated hypersphere coverage of $c_0 = 80\%$. For the parameter $c_0$, which was originally proposed in [5], one obtains according to [5,6] a sample size of $N = 1/(1 - c_0) = 5$. Evaluating term (7) with a given confidence level of 90 %, one obtains an integration error of greater than 65 %. Inequality (7) can be used to explain the reported steady space coverage problems; however, it does not thoroughly explain the poor classification results described in [6]. These are now explained by means of the results shown in sections 4 and 5.
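To give a feel for the numbers, the following Python sketch (our illustration; it does not reproduce term (7), and the function names are ours) computes the sample size $N = 1/(1 - c_0)$ for $c_0 = 0.8$ and the worst-case standard error of a hit-or-miss Monte Carlo volume estimate based on that many samples.

```python
# Illustrative sketch only -- this is NOT term (7); it merely shows how coarse
# a Monte Carlo volume estimate based on N = 1/(1 - c0) samples must be.
import math

def sample_size(c0: float) -> int:
    """Sample size N = 1/(1 - c0) as used in [5,6]."""
    return round(1.0 / (1.0 - c0))

def mc_standard_error(p: float, n: int) -> float:
    """Standard error of a hit-or-miss Monte Carlo estimate of a volume
    fraction p from n uniform samples (binomial proportion)."""
    return math.sqrt(p * (1.0 - p) / n)

c0 = 0.80
n = sample_size(c0)                    # -> 5
worst = mc_standard_error(0.5, n)      # worst case at p = 0.5
print(f"N = {n}, worst-case standard error ~ {worst:.2f}")   # ~ 0.22
```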
Investigating the 41-dimensional data set [17], one can statistically verify$^7$ that the whole normalized non-anomalous class is concentrated in one place inside the unitary hypercube $U = [0,1]^{41}$. In [18] this characteristic is called the “empty space phenomenon”; it arises in any data set whose size does not grow exponentially with the dimension of the space.
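A hedged sketch of the covariance-based check hinted at in footnote 7 is given below; since the data set [17] itself is not reproduced here, a tightly clustered synthetic stand-in is used, and the variable names are ours.

```python
# Sketch (with synthetic stand-in data) of checking concentration inside [0,1]^d
# via the covariance matrix, as hinted at in footnote 7.
import numpy as np

rng = np.random.default_rng(0)
d = 41
# Synthetic "non-anomalous" class: tightly clustered around one point of [0,1]^41.
X = np.clip(0.4 + 0.02 * rng.standard_normal((10_000, d)), 0.0, 1.0)

cov = np.cov(X, rowvar=False)          # d x d covariance matrix
eigvals = np.linalg.eigvalsh(cov)

print("total variance (trace):", cov.trace())
print("largest std. dev. along any principal axis:", np.sqrt(eigvals.max()))
# Both values are tiny compared to the hypercube's side length 1, i.e. the
# class occupies only a very small region of the unitary hypercube.
```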
In [6] the authors additionally reported that the real-valued negative selection algorithm terminated when (on average) 1.4 detectors were generated. By generating only one detector (hypersphere) with, for example, a radius $r = 3$ and a detector center which does not necessarily lie inside $U$, the volume of that hypersphere amounts to $5.11 \cdot 10^{10}$. The unitary hypercube $U = [0,1]^{41}$ has a total volume of 1; however, most of the volume of a hypercube is concentrated in the large corners, which themselves become very long “spikes”. This can be verified by comparing the distance from the center of the hypercube to one of the corners with the perpendicular distance $a/2$ to one of the faces (see Fig. 3):

$$\frac{\left(\sum_{i=1}^{n}\left(\frac{a}{2}\right)^{2}\right)^{1/2}}{\frac{a}{2}} = \frac{\left(\frac{n a^{2}}{4}\right)^{1/2}}{\frac{a}{2}} = \sqrt{n}, \qquad \text{where } n \text{ is the dimension.} \tag{8}$$

For $n \to \infty$, term (8) goes to $\infty$ and therefore the volume is concentrated in the very long “spikes” of $U$.
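Both quantities used above can be checked numerically. The short Python sketch below (ours, not part of [6]) evaluates the standard $n$-ball volume formula for $n = 41$, $r = 3$ and the corner-to-face distance ratio of term (8).

```python
# Numerical check of the hypersphere volume for n = 41, r = 3 and of term (8).
import math

def hypersphere_volume(n: int, r: float) -> float:
    """Volume of an n-dimensional ball of radius r:
    V = pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

n = 41
print(f"volume of a radius-3 hypersphere in {n} dimensions: "
      f"{hypersphere_volume(n, 3.0):.2e}")            # ~ 5.11e+10

a = 1.0                                    # side length of the unit hypercube
center_to_corner = math.sqrt(n) * a / 2    # distance from center to a corner
center_to_face = a / 2                     # perpendicular distance to a face
print("ratio from term (8):", center_to_corner / center_to_face)   # sqrt(41) ~ 6.4
```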
As a consequence, the hypersphere covers some of those (high-volume) spikes which lie within the volume fraction $V$ of the hypersphere. Hence, the real-valued negative selection algorithm terminates with only a very small number of large-radius detectors (hyperspheres), which cover a limited number of spikes. As a result, a large proportion of the volume of the hypercube does not lie within the hyperspheres; it lies in the remaining (high-volume) spikes, even though the hypersphere volume is far larger than the hypercube volume.
These observations, in combination with the imprecise volume integration of overlapping hyperspheres, result in the poor classification results reported in [6].
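To illustrate what is meant by the volume integration of overlapping hyperspheres, the sketch below (our hedged example, not the estimator used in [6]) approximates the fraction of the unit hypercube covered by a union of overlapping detectors via hit-or-miss Monte Carlo sampling; with few samples this estimate becomes very coarse.

```python
# Hedged sketch: hit-or-miss Monte Carlo estimate of the fraction of [0,1]^d
# covered by a union of overlapping hypersphere detectors.
import numpy as np

def covered_fraction(centers, radii, n_samples=100_000, seed=0):
    """Fraction of the unit hypercube lying inside at least one hypersphere."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n_samples, centers.shape[1]))
    inside = np.zeros(n_samples, dtype=bool)
    for c, r in zip(centers, radii):
        inside |= np.linalg.norm(pts - c, axis=1) <= r
    return inside.mean()

d = 5                                        # small dimension, illustration only
centers = np.array([[0.3] * d, [0.6] * d])   # two overlapping detectors
radii = np.array([0.4, 0.4])
print("estimated covered fraction:", covered_fraction(centers, radii))
```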
From our point of view, real-valued negative selection appears to be a technique that is not well suited for high-dimensional data sets, i.e. data dimensions far higher than 41. A well-established benchmark in the field of pattern classification is, for instance, the problem of handwritten digit recognition, whose problem domain has a dimensionality of 256 [19,20]. We propose this is in part because it makes more sense to formulate a classification model with
$^7$ By means of the covariance matrix.