anomaly detection problem. The investigations observed that real-valued negative selection produced the poorest classification results when compared to the statistical anomaly detection techniques on a 41-dimensional problem set (see [6] for further details). In this section, we attempt to explain this observation.
6.1 Real-Valued Negative Selection
Real-valued negative selection is an immune-inspired algorithm for anomaly detection. Roughly speaking, immune negative selection is a process which eliminates self-reactive lymphocytes and ensures that only those lymphocytes enter the blood stream which do not recognize self-cells, i.e. cells which belong to the body. As a consequence, lymphocytes which survive the negative selection process are capable of recognizing nearly all foreign substances (viruses, bacteria, etc.) which do not belong to the body. Abstracting this principle and modeling the immune components according to the AIS framework [3], one obtains a technique for anomaly detection:
- Input: S = set of points ∈ [0, 1]^n gathered from the normal behavior of a system.
- Output: D = set of hyperspheres which recognize a proportion c0 of the total space [0, 1]^n, except the normal points.
- Detector generation: while the covered proportion c0 is not reached, generate hyperspheres.
- Classification: if an unseen point lies within a hypersphere, it does not belong to the normal behavior of the system and is classified as an anomaly.

A formal algorithmic description of real-valued negative selection is provided in [6].
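The scheme above can be sketched in a few lines. The sketch below is a minimal illustration, not the algorithm from [6]: it assumes a fixed detector radius and a fixed detector budget in place of the coverage estimate c0, and places candidate detector centers by uniform random sampling over [0, 1]^n.

```python
import random

def generate_detectors(self_points, radius=0.1, n_detectors=50,
                       max_tries=10000, seed=42):
    """Place hypersphere detectors whose spheres cover no normal (self) point."""
    rng = random.Random(seed)
    dim = len(self_points[0])
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        center = tuple(rng.random() for _ in range(dim))
        # Censoring step: reject any candidate whose sphere recognizes a self point.
        if all(sum((c - s) ** 2 for c, s in zip(center, p)) > radius ** 2
               for p in self_points):
            detectors.append(center)
    return detectors

def is_anomaly(point, detectors, radius=0.1):
    """An unseen point inside any detector hypersphere is flagged anomalous."""
    return any(sum((c - x) ** 2 for c, x in zip(d, point)) <= radius ** 2
               for d in detectors)

normal = [(0.5 + 0.02 * i, 0.5) for i in range(5)]   # toy self set in [0, 1]^2
detectors = generate_detectors(normal)
print(is_anomaly(normal[0], detectors))   # False: self points are never covered
```

By construction, no detector sphere contains a training point, so every self point is classified as normal; anomalous points are detected only where the random spheres happen to cover the non-self space.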
6.2 Poor Classification Results
In [6] the real-valued negative selection technique (see section 6.1) was benchmarked by means of ROC analysis on a high-dimensional anomaly detection problem. The authors reported a detection rate of approximately 1 %-2 % and a false alarm rate of 0 % when applying the real-valued negative selection algorithm. The false alarm rate of 0 % can be explained by learning 100 % of the training data and benchmarking with the training and testing data; similar false alarm rate results on other benchmarked data sets are reported in [5,16]. Benchmarking with 100 % of the training and testing data should be avoided, as in general it results in a highly overfitted learning model, and no representative (classification) results on the generalization performance will be obtained.
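A toy illustration (with hypothetical data, not the data set used in [6]) of why evaluating on the training data yields a vacuous 0 % false alarm rate:

```python
# A model that simply memorizes its training data raises no alarm on that
# same data, so "testing" on the training set reports a 0% false alarm rate
# regardless of how poorly the model generalizes.
train = [(0.10, 0.20), (0.30, 0.40), (0.50, 0.60)]
normal = set(train)                      # learn 100% of the training data

def is_alarm(point):
    return point not in normal           # any unseen point triggers an alarm

false_alarms = sum(is_alarm(p) for p in train)
print(false_alarms / len(train))         # 0.0 on the training data
print(is_alarm((0.11, 0.21)))            # True: a normal but unseen point misfires
```

The 0 % figure says nothing about generalization; a held-out normal point is still misclassified as anomalous.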
Moreover, the authors in [6] reported steady space coverage problems; these can also be explained by a lack of precision when estimating the volume integration. Using term (6), which gives the worst-case sample size for given ε, δ, and applying the inequality
ε > 1/(2 √(δ(N + 1)))  ⟺  N + 1 > 1/(4δε²)    (7)
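Assuming term (6) is the Chebyshev-type worst-case bound N ≥ 1/(4ε²δ) (an assumption on our part, consistent with the inequality above), the required sample size grows quadratically as the tolerated estimation error ε shrinks, which illustrates the precision problem in the volume estimation:

```python
import math

# Hypothetical helper, assuming the Chebyshev-type worst-case bound
# N >= 1 / (4 * eps**2 * delta) for Monte Carlo volume estimation.
def worst_case_samples(eps, delta):
    return math.ceil(1.0 / (4.0 * eps ** 2 * delta))

# Halving the tolerated estimation error eps quadruples the sample size.
print(worst_case_samples(0.10, 0.05))   # 500
print(worst_case_samples(0.05, 0.05))   # 2000
print(worst_case_samples(0.01, 0.05))   # 50000
```

At the precision needed to track the covered proportion c0 reliably, the number of required sample points quickly becomes impractical.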