number of false positives allowable. The goal of an AT system is to maximize the overall percentage of correct decisions. The performance of parametric AT systems can be computed using the area under the ROC curve (see [1]). The performance of non-parametric AT systems can be measured using accuracy, but accuracy is not always a good measure, particularly if true negatives abound. We contemplate the following four performance measures. Firstly,
$$\text{E-distance} = 1 - \sqrt{W \cdot (1 - \mathrm{TPF})^2 + (1 - W) \cdot \mathrm{FPF}^2}.$$
E-distance is derived from the weighted Euclidean distance between the perfect classifier (point (0, 1)) and the ROC point of interest, where W is a parameter that ranges between 0 and 1 and establishes the relative importance of false positives and false negatives. Secondly,
$$\text{F-measure} = \frac{(\beta^2 + 1) \cdot \mathrm{TPF} \cdot \mathrm{PPV}}{\beta^2 \cdot \mathrm{PPV} + \mathrm{TPF}}.$$
F-measure is a combination of recall (TPF) and precision (PPV), where β is a parameter ranging from 0 to infinity that weights recall against precision. Thirdly,
$$\text{G-mean} = \sqrt{\mathrm{TPF} \times \mathrm{TNF}},$$
where TNF = 1 − FPF. The geometric mean is high when both TPF and TNF are high and when the difference between both is
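small. Finally,
$$\text{T-area} = \begin{cases} \frac{1}{2} + \sqrt{s \cdot (s - a) \cdot (s - b) \cdot (s - c)} & \text{if } \mathrm{TPF} > \mathrm{FPF},\\ \frac{1}{2} & \text{if } \mathrm{TPF} = \mathrm{FPF},\\ \frac{1}{2} - \sqrt{s \cdot (s - a) \cdot (s - b) \cdot (s - c)} & \text{if } \mathrm{TPF} < \mathrm{FPF}. \end{cases}$$
T-area is the area of the quadrilateral formed by the segments connecting the ROC point of interest and all the singular points of the ROC space except the perfect detection system, i.e. (0, 0), (1, 0), and (1, 1). The triangle with vertices (0, 0), (1, 1), and the ROC point is added to or subtracted from the lower-right half of the ROC space (area 1/2), and its area is obtained using Heron's formula, where $s = \frac{a + b + c}{2}$ is half the perimeter, $a = \sqrt{2}$, $b = \sqrt{\mathrm{TPF}^2 + \mathrm{FPF}^2}$, and $c = \sqrt{(1 - \mathrm{TPF})^2 + (1 - \mathrm{FPF})^2}$.

An advantage of T-area is that it makes parametric and non-parametric systems comparable. For instance, we have used the above measures to rank three AT systems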
a, b, and c, whose ROC points $(\mathrm{FPF}_a = 0.0620, \mathrm{TPF}_a = 0.9712)$, $(\mathrm{FPF}_b = 0.1034, \mathrm{TPF}_b = 0.9811)$, and $(\mathrm{FPF}_c = 0.1298, \mathrm{TPF}_c = 0.9528)$ are depicted in Fig. 1. Clearly, the ROC points of a and b dominate the ROC point of c. However, there is no consensus among the different accuracy measures to signal a or b as a winner (see [1]). All measures do agree in ranking both a and b above c. We advocate the use of T-area given that it has an intuitive explanation (ROC AUC), serves to compare parametric and non-parametric systems, and does not depend on parameters that establish an artificial weight between the errors.
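To make the comparison of a, b, and c concrete, the Python sketch below computes the four measures for those ROC points. It is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, and because precision (PPV) cannot be recovered from an ROC point alone, F-measure is fed a PPV derived from an assumed class prior `p_pos` via a hypothetical helper `ppv_from_prior`. One useful check on the T-area definition: the triangle with vertices (0, 0), (1, 1), and (FPF, TPF) has area |TPF − FPF|/2, so Heron's formula always yields T-area = (1 + TPF − FPF)/2, which is exactly the area under the single-point ROC curve and explains the "ROC AUC" interpretation.

```python
import math


def e_distance(fpf: float, tpf: float, w: float = 0.5) -> float:
    """1 minus the W-weighted Euclidean distance from (FPF, TPF) to the perfect point (0, 1)."""
    return 1.0 - math.sqrt(w * (1.0 - tpf) ** 2 + (1.0 - w) * fpf ** 2)


def f_measure(tpf: float, ppv: float, beta: float = 1.0) -> float:
    """Combines recall (TPF) and precision (PPV); beta weights recall against precision."""
    b2 = beta ** 2
    return (b2 + 1.0) * tpf * ppv / (b2 * ppv + tpf)


def g_mean(fpf: float, tpf: float) -> float:
    """Geometric mean of TPF and TNF, where TNF = 1 - FPF."""
    return math.sqrt(tpf * (1.0 - fpf))


def t_area(fpf: float, tpf: float) -> float:
    """Area bounded by (0,0), (1,0), (1,1) and the ROC point, using Heron's formula."""
    a = math.sqrt(2.0)                                  # side (0,0)-(1,1)
    b = math.sqrt(tpf ** 2 + fpf ** 2)                  # side (0,0)-(FPF,TPF)
    c = math.sqrt((1.0 - tpf) ** 2 + (1.0 - fpf) ** 2)  # side (FPF,TPF)-(1,1)
    s = (a + b + c) / 2.0                               # half the perimeter
    # Clamp at 0 so rounding error near the diagonal cannot produce sqrt(<0).
    triangle = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))
    if tpf > fpf:
        return 0.5 + triangle
    if tpf < fpf:
        return 0.5 - triangle
    return 0.5


def ppv_from_prior(fpf: float, tpf: float, p_pos: float) -> float:
    """Hypothetical helper: precision implied by an ROC point and an assumed class prior."""
    return p_pos * tpf / (p_pos * tpf + (1.0 - p_pos) * fpf)


if __name__ == "__main__":
    systems = {"a": (0.0620, 0.9712), "b": (0.1034, 0.9811), "c": (0.1298, 0.9528)}
    for name, (fpf, tpf) in systems.items():
        ppv = ppv_from_prior(fpf, tpf, p_pos=0.5)  # assumed prior, not given in the text
        print(name,
              round(e_distance(fpf, tpf, w=0.5), 4),
              round(f_measure(tpf, ppv, beta=1.0), 4),
              round(g_mean(fpf, tpf), 4),
              round(t_area(fpf, tpf), 4))
```

The parameterized measures are where the disagreement between a and b shows up: with w = 0.5, E-distance favours a, while with w = 0.99 (false negatives weighted heavily) it favours b. G-mean and T-area, which have no weighting parameter, rank a above b for these points.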
Misdetection costs in IDSes are asymmetric: the cost of notifying an SSO about an alert that corresponds to an innocuous attack is much lower than the cost of failing to warn of the presence of an intruder. The following scenarios allow one to evaluate an AT system in a cost-sensitive way.
1.2 Cost-Based Alert Triage Evaluation
We now consider scenarios where correct decision outcomes have an associated benefit B and incorrect decision outcomes have an associated cost C. $B(A \mid A)$ represents the benefit obtained for correctly classifying an alert of type A, and $C(A \mid B)$ is the cost incurred if an alert of type B is misclassified as an alert of class A. These scenarios are suitable for testing AT systems in simulated environments and for determining their optimal decision threshold. In this case, an AT system outperforms another if it has a lower expected cost (EC).
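The EC formula itself is not given at this point, so the Python sketch below assumes one common two-class formulation, written in the text's $B(\cdot \mid \cdot)$ / $C(\cdot \mid \cdot)$ notation with "+" for alerts that warrant notification and "−" for innocuous ones: EC = P(+)·(1 − TPF)·C(−|+) + P(−)·FPF·C(+|−) − P(+)·TPF·B(+|+) − P(−)·(1 − FPF)·B(−|−). The class prior `p_pos`, the `CostModel` fields, and all cost values are hypothetical. The same selection function can compare different systems or different thresholds of one system, since each candidate is just an ROC point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    # Hypothetical values; C(A|B) is the cost of labelling a class-B alert as A.
    benefit_tp: float = 0.0  # B(+|+): malicious alert correctly notified
    benefit_tn: float = 0.0  # B(-|-): innocuous alert correctly discarded
    cost_fp: float = 1.0     # C(+|-): SSO notified about an innocuous alert
    cost_fn: float = 1.0     # C(-|+): intruder not reported


def expected_cost(fpf: float, tpf: float, p_pos: float, m: CostModel) -> float:
    """Assumed expected net cost per alert (costs minus benefits) at one ROC point.

    p_pos is the assumed prior probability that an alert is malicious.
    """
    p_neg = 1.0 - p_pos
    cost = p_pos * (1.0 - tpf) * m.cost_fn + p_neg * fpf * m.cost_fp
    benefit = p_pos * tpf * m.benefit_tp + p_neg * (1.0 - fpf) * m.benefit_tn
    return cost - benefit


def lowest_cost(points: dict[str, tuple[float, float]], p_pos: float, m: CostModel) -> str:
    """Return the key of the (FPF, TPF) point, system or threshold, with the lowest EC."""
    return min(points, key=lambda k: expected_cost(*points[k], p_pos, m))


if __name__ == "__main__":
    roc = {"a": (0.0620, 0.9712), "b": (0.1034, 0.9811), "c": (0.1298, 0.9528)}
    for fn_cost in (10.0, 50.0):
        model = CostModel(cost_fp=1.0, cost_fn=fn_cost)
        ecs = {k: round(expected_cost(*v, 0.1, model), 4) for k, v in roc.items()}
        print(fn_cost, ecs, "->", lowest_cost(roc, 0.1, model))
```

With the assumed p_pos = 0.1, C(+|−) = 1, and no benefits, C(−|+) = 10 gives a the lowest expected cost, while raising C(−|+) to 50 makes b the cheapest, showing how the choice between a and b depends on how asymmetric the misdetection costs are.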
 