anomaly detection problem. The investigations observed that real-valued negative selection produced the poorest classification results when compared to the statistical anomaly detection techniques on a 41-dimensional problem set (see [6] for further details). In this section, we attempt to explain this observation.
6.1 Real-Valued Negative Selection
Real-valued negative selection is an immune-inspired algorithm for anomaly detection. Roughly speaking, immune negative selection is a process which eliminates self-reactive lymphocytes and ensures that only those lymphocytes enter the blood stream that do not recognize self-cells 6 . As a consequence, lymphocytes which survive the negative selection process are capable of recognizing nearly all foreign substances (viruses, bacteria, etc.) which do not belong to the body. Abstracting this principle and modeling immune components according to the AIS framework [3], one obtains a technique for anomaly detection:
- Input: S = set of points from [0, 1]^n gathered from normal behavior of a system.
- Output: D = set of hyperspheres which recognize a proportion c_0 of the total space [0, 1]^n, except the normal points.
- Detector generation: While the covered proportion c_0 is not reached, generate hyperspheres.
- Classification: If an unseen point lies within a hypersphere, it does not belong to the normal behavior of the system and is classified as an anomaly.
A formal algorithmic description of real-valued negative selection is provided
in [6].
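The four components above can be sketched in a few lines of Python. This is a minimal illustration, not the exact algorithm of [6]: in particular, it fixes the number and the radius of the detectors instead of estimating the covered proportion c_0, and all function names are our own.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two points in [0,1]^n."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def generate_detectors(self_points, n_dim, radius, num_detectors, seed=1):
    """Rejection sampling: keep random centres in [0,1]^n whose
    hypersphere of the given radius covers no normal (self) point."""
    rng = random.Random(seed)
    detectors = []
    while len(detectors) < num_detectors:
        candidate = [rng.random() for _ in range(n_dim)]
        if all(euclidean(candidate, s) > radius for s in self_points):
            detectors.append(candidate)
    return detectors

def is_anomalous(point, detectors, radius):
    """A point lying inside any detector hypersphere is an anomaly."""
    return any(euclidean(point, d) <= radius for d in detectors)

# Toy 2-dimensional self set clustered in one corner of [0,1]^2.
self_set = [[0.1, 0.1], [0.15, 0.2], [0.2, 0.1]]
detectors = generate_detectors(self_set, n_dim=2, radius=0.1, num_detectors=200)

# By construction, self points are never covered by a detector.
print(is_anomalous(self_set[0], detectors, radius=0.1))  # False
```

Note that detector generation accepts a candidate only if its distance to every self point strictly exceeds the radius, so a training (self) point can never fall inside a detector; this is exactly the property exploited in the benchmark discussion below.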
6.2 Poor Classification Results
In [6] the real-valued negative selection technique (see section 6.1) was benchmarked by means of ROC analysis on a high-dimensional anomaly detection problem. The authors reported a detection rate of approximately 1 % to 2 % and a false alarm rate of 0 % when applying the real-valued negative selection algorithm. The false alarm rate of 0 % can be explained by learning 100 % of the training data and benchmarking with the training and testing data; similar false alarm rate results on other benchmarked data sets are reported in [5,16]. Benchmarking with 100 % of the training and testing data should be avoided, as in general it results in a highly overfitted learning model, and no representative (classification) results on the generalization performance will be obtained.
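To avoid this pitfall, the data set should be partitioned into disjoint training and testing subsets before benchmarking. A minimal holdout-split sketch (the helper and its parameters are our own, not taken from [6]):

```python
import random

def split_train_test(items, test_fraction=0.3, seed=0):
    """Shuffle and hold out a disjoint test set, so that reported
    detection and false alarm rates measure generalization rather
    than memorization of the training data."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Split 100 record indices 70/30; detectors are then generated only
# from the training part and evaluated only on the held-out part.
train, test = split_train_test(range(100))
print(len(train), len(test))  # 70 30
```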
Moreover, the authors in [6] reported steady space coverage problems; these can also be explained by a lack of precision when estimating the volume integration. Using term (6), which gives the worst-case sample size for given ε and δ, and applying the inequality

\epsilon > \left( \frac{1}{4\,\delta\,(N+1)} \right)^{1/2} \;\Longleftrightarrow\; N+1 > \frac{1}{4\,\delta\,\epsilon^{2}} \qquad (7)
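The right-hand side of inequality (7), N + 1 > 1/(4δε²), can be evaluated for concrete precision requirements; a small sketch (helper name is our own) using exact rational arithmetic to avoid floating point rounding at the boundary:

```python
import math
from fractions import Fraction

def min_sample_size(eps, delta):
    """Smallest integer N satisfying N + 1 > 1 / (4 * delta * eps^2),
    i.e. the worst-case sample size implied by inequality (7)."""
    bound = 1 / (4 * Fraction(delta) * Fraction(eps) ** 2)
    return math.floor(bound)  # minimal N with N + 1 > bound

# Estimating a covered volume to within eps = 1/100 at delta = 1/20
# already requires tens of thousands of samples:
print(min_sample_size(Fraction(1, 100), Fraction(1, 20)))  # 50000
```

The rapid growth of this bound as ε shrinks illustrates why precise volume estimation, and hence reliable coverage control, becomes expensive.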
6 Cells which belong to the body.