of holes (generalization regions). More specifically, we will empirically explore
if holes occur in suitable generalization regions when a randomly determined
permutation mask is applied. Finally, we explore empirically whether randomly
determined permutation masks reduce the number of holes.
Stibor et al. [8] have shown in prior experiments that the matching threshold r
is a crucial parameter and is inextricably linked to the input data being
analyzed. However, permutation masks were not considered in [8]. To study the
impact of permutation masks on generalization regions, and to obtain results
comparable to the previously performed experiments [8], we use the same mapping
function and data set. Furthermore, we explore the impact of permutation masks
on an additional data set (see Fig. 4).
5.1 Experimental Settings
The first self data set contains 1000 Gaussian ($\mu = 0.5$, $\sigma = 0.1$) generated points
$p = (x, y) \in [0,1]^2$. Each point $p$ is mapped to a binary string

\[ \underbrace{b_1, b_2, \ldots, b_8}_{b_x},\ \underbrace{b_9, b_{10}, \ldots, b_{16}}_{b_y}, \]

where the first 8 bits encode the integer x-value $i_x := \lceil 255 \cdot x + 0.5 \rceil$ and the last
8 bits the integer y-value $i_y := \lceil 255 \cdot y + 0.5 \rceil$, i.e.

\[ [0,1]^2 \to (i_x, i_y) \in [1,\ldots,256 \times 1,\ldots,256] \to (b_x, b_y) \in U^{\{0,1\}}_{8} \times U^{\{0,1\}}_{8}. \]
This mapping was proposed in [18] and is also used in [8]; it permits a
straightforward visualization of real-valued encoded points in Hamming negative
selection.
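The mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names `encode_coordinate` and `encode_point`, the fixed random seed, the clipping of values that fall outside [0, 1], and the choice to store i - 1 so that the 256 integer values fit into 8 bits are all assumptions that the text does not specify.

```python
import math
import random


def encode_coordinate(value: float) -> str:
    """Map a value in [0, 1] to an 8-bit string using i = ceil(255 * value + 0.5)."""
    i = math.ceil(255 * value + 0.5)   # integer in 1..256 for value in [0, 1]
    i = min(max(i, 1), 256)            # assumption: clip Gaussian samples outside [0, 1]
    return format(i - 1, "08b")        # assumption: encode i - 1 so that 256 fits in 8 bits


def encode_point(x: float, y: float) -> str:
    """Concatenate the 8-bit x-code (b_x) and the 8-bit y-code (b_y)."""
    return encode_coordinate(x) + encode_coordinate(y)


# 1000 Gaussian points with mean 0.5 and standard deviation 0.1 (seed is arbitrary).
rng = random.Random(0)
self_points = [(rng.gauss(0.5, 0.1), rng.gauss(0.5, 0.1)) for _ in range(1000)]
self_strings = [encode_point(x, y) for x, y in self_points]
```

Under these assumptions the corners of the unit square map to the extreme codes, for example `encode_point(0.0, 1.0)` yields `"0000000011111111"`.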
The second data set (termed the banana data set) is depicted in Fig. 4 and is a
commonly used benchmark for anomaly detection problems [19]. The banana data set
is taken from [20] and consists of 5300 points in total. These points are
partitioned into two classes: $C^{+}$, which represents points inside the
"banana shape", and $C^{-}$, which contains points outside the "banana shape".
In this experiment we have taken points from $C^{+}$ only, to simulate one
self-region (similar to Fig. 1). More specifically, we have normalized all
points from $C^{+}$ to the unit square $[0,1]^2$ with the min-max method. We
then sampled 1000 random points from $C^{+}$ and mapped those sampled points to
bit strings of length 16.
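The preprocessing of the second data set can be sketched as follows. The banana data itself is not reproduced here, so `c_plus` is a hypothetical stand-in filled with arbitrary points. The per-axis normalization, sampling without replacement, and the name `min_max_normalize` are assumptions made for illustration only.

```python
import random


def min_max_normalize(points):
    """Rescale each coordinate axis independently to the interval [0, 1]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return [
        ((x - x_min) / (x_max - x_min), (y - y_min) / (y_max - y_min))
        for x, y in points
    ]


rng = random.Random(1)

# Hypothetical placeholder for the class C+ points; the real data comes from [20].
c_plus = [(rng.uniform(-3.0, 3.0), rng.uniform(-2.0, 2.0)) for _ in range(2000)]

normalized = min_max_normalize(c_plus)

# Assumption: the 1000 self points are drawn without replacement.
self_sample = rng.sample(normalized, 1000)
```

Each sampled point can then be converted to a 16-bit string with the same mapping used for the first data set.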
Since the r-chunk matching rule subsumes the r-contiguous rule, i.e. it
recognizes at least as many elements as the r-contiguous matching rule (see
Section 2.2), we have performed all experiments with the r-chunk matching rule.
Furthermore, as proposed in [3,9], we have randomly determined permutation masks
$\pi \in S_{16}$.
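To make the interplay of r-chunk matching, permutation masks and holes concrete, the following sketch enumerates holes for a toy example with 4-bit strings. It rests on assumptions not stated in the text: the complete set of generable r-chunk detectors is used (a window position and an r-bit pattern form a detector whenever that pattern does not occur in any self string at that position), a mask is written as a list where output position i takes the bit at position `mask[i]`, and all function names are hypothetical.

```python
import random


def apply_mask(s, mask):
    """Reorder the bits of s: output position i takes the bit at position mask[i]."""
    return "".join(s[j] for j in mask)


def self_chunks(self_strings, r):
    """For every window position, collect the length-r substrings seen in self."""
    n = len(self_strings[0])
    return [{s[p:p + r] for s in self_strings} for p in range(n - r + 1)]


def is_detected(s, chunks, r):
    """With all generable detectors, s is matched iff some window is not a self chunk."""
    return any(s[p:p + r] not in chunks[p] for p in range(len(chunks)))


def find_holes(self_strings, r, mask=None):
    """Return all non-self strings that no r-chunk detector can match."""
    n = len(self_strings[0])
    if mask is None:
        mask = list(range(n))  # identity permutation
    chunks = self_chunks([apply_mask(s, mask) for s in self_strings], r)
    self_set = set(self_strings)
    holes = []
    for value in range(2 ** n):
        s = format(value, f"0{n}b")
        if s not in self_set and not is_detected(apply_mask(s, mask), chunks, r):
            holes.append(s)
    return holes


toy_self = ["0101", "1110"]
holes_plain = find_holes(toy_self, r=2)                      # ['0110', '1101']
holes_masked = find_holes(toy_self, r=2, mask=[0, 2, 1, 3])  # ['0100', '1111']

# A randomly determined permutation mask over 16 bit positions (seed is arbitrary).
random_mask = random.Random(0).sample(range(16), 16)
```

In this toy case the mask does not change the number of holes, but it moves them to different strings, which is the kind of effect on generalization regions that the experiments examine on the 16-bit encodings.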
5.2 Experimental Results
Figs. 5, 6, 7 and 8 present the experimental results. The black points represent
the 1000 sampled self elements, the white points are holes, and the grey points
represent areas which are covered by r-chunk detectors. It is not surprising that