Alert Triage on the ROC - Computer Network Security


number of false positives allowable. The goal of an AT system is to maximize the overall percentage of correct decisions. The performance of parametric AT systems can be computed using the area under the ROC curve (see [1]). Although the performance of non-parametric AT systems can be measured using accuracy, accuracy is not always a good measure, particularly when true negatives abound. We consider the following four performance measures. Firstly,

$\text{E-distance} = 1 - \sqrt{W \cdot (1 - TPF)^2 + (1 - W) \cdot FPF^2}$. E-distance is one minus the weighted Euclidean distance between the perfect classifier (point $(0, 1)$) and the ROC point of interest, where $W$ is a parameter that ranges between 0 and 1 and establishes the relative importance between false positives and false negatives. Secondly,

$\text{F-measure} = \frac{(\beta^2 + 1) \cdot TPF \cdot PPV}{\beta^2 \cdot PPV + TPF}$. F-measure is a combination of recall (TPF) and precision (PPV), where $\beta$ is a parameter ranging from 0 to infinity that weights recall and precision. Thirdly, $\text{G-mean} = \sqrt{TPF \times TNF}$. The geometric mean is high when both TPF and TNF are high and when the difference between both is

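small. Finally,

$$\text{T-area} = \begin{cases} \frac{1}{2} + \sqrt{s \cdot (s - a)(s - b)(s - c)} & \text{if } TPF > FPF \\ \frac{1}{2} & \text{if } TPF = FPF \\ \frac{1}{2} - \sqrt{s \cdot (s - a)(s - b)(s - c)} & \text{if } TPF < FPF \end{cases}$$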

T-area is the area of the quadrilateral formed by the segments connecting the ROC point of interest and all the singular points of the ROC space except the perfect detection system. The triangle area is obtained with Heron's formula, where $s = \frac{a + b + c}{2}$, i.e. half the perimeter, $a = \sqrt{2}$, $b = \sqrt{TPF^2 + FPF^2}$, and $c = \sqrt{(1 - TPF)^2 + (1 - FPF)^2}$.

An advantage of T-area is that it makes parametric and non-parametric systems comparable. For instance, we have used the above measures to rank three AT systems, a, b, and c, whose ROC points ($FPF_a = 0.0620$, $TPF_a = 0.9712$), ($FPF_b = 0.1034$, $TPF_b = 0.9811$), and ($FPF_c = 0.1298$, $TPF_c = 0.9528$) are depicted in Fig. 1. Clearly, the ROC points of

a and b dominate the ROC point of c. However, there is no consensus among the different accuracy measures (see [1]): all measures discern between a and b, but they do not agree on which one to signal as the winner. We advocate the use of T-area given that it has an intuitive explanation (ROC AUC), serves to compare parametric and non-parametric systems, and does not depend on parameters that establish an artificial weight between the errors.
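As a rough illustration, the four measures can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, TNF is taken as 1 - FPF, the ROC coordinates are read as decimals (0.0620, 0.9712, and so on), and the labels a, b, c simply follow the subscripts of the ROC points. F-measure takes PPV as an input because precision cannot be derived from a ROC point without knowing the class proportions.

```python
import math


def e_distance(tpf, fpf, w=0.5):
    """One minus the weighted Euclidean distance to the perfect classifier (0, 1)."""
    return 1.0 - math.sqrt(w * (1.0 - tpf) ** 2 + (1.0 - w) * fpf ** 2)


def f_measure(tpf, ppv, beta=1.0):
    """Combination of recall (TPF) and precision (PPV), weighted by beta."""
    return (beta ** 2 + 1.0) * tpf * ppv / (beta ** 2 * ppv + tpf)


def g_mean(tpf, fpf):
    """Geometric mean of TPF and TNF, assuming TNF = 1 - FPF."""
    return math.sqrt(tpf * (1.0 - fpf))


def t_area(tpf, fpf):
    """1/2 plus or minus the Heron area of the triangle (0,0), (1,1), (FPF, TPF)."""
    a = math.sqrt(2.0)                     # diagonal from (0, 0) to (1, 1)
    b = math.hypot(tpf, fpf)               # from (0, 0) to the ROC point
    c = math.hypot(1.0 - tpf, 1.0 - fpf)   # from (1, 1) to the ROC point
    s = (a + b + c) / 2.0                  # half the perimeter
    # max() guards against tiny negative products caused by rounding
    triangle = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))
    if tpf > fpf:
        return 0.5 + triangle
    if tpf < fpf:
        return 0.5 - triangle
    return 0.5


# ROC points as (FPF, TPF); the decimal reading is an assumption.
ROC_POINTS = {"a": (0.0620, 0.9712), "b": (0.1034, 0.9811), "c": (0.1298, 0.9528)}

for name, (fpf, tpf) in ROC_POINTS.items():
    print(
        name,
        round(e_distance(tpf, fpf, w=0.5), 4),
        round(e_distance(tpf, fpf, w=0.95), 4),
        round(g_mean(tpf, fpf), 4),
        round(t_area(tpf, fpf), 4),
    )
```

Under these assumptions, G-mean and T-area both rank a above b above c, while E-distance prefers a for W = 0.5 and b for W = 0.95, which illustrates the kind of parameter dependence discussed above. Note also that Heron's area for the triangle (0,0), (1,1), (FPF, TPF) equals |TPF - FPF| / 2, so T-area reduces to (1 + TPF - FPF) / 2.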

IDS misdetection costs are asymmetric (i.e., the cost of notifying an SSO when an alert corresponds to an innocuous attack is much lower than the cost of failing to report the presence of an intruder). The following scenarios allow one to evaluate an AT system in a cost-sensitive way.

1.2 Cost-Based Alert Triage Evaluation

We now consider scenarios where correct decision outcomes have an associated benefit and incorrect decision outcomes have an associated cost. The entry for (A | A) represents the benefit obtained for correctly classifying an alert of type A, and the entry for (A | B) is the cost incurred if an alert of type B is misclassified as an alert of class A. These scenarios are valid to test AT systems in simulated environments and to determine their optimal decision threshold. In this case, an AT system outperforms another if it has a lower expected cost (EC).
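The comparison by expected cost can be sketched as follows. The source's own EC expression is not reproduced in this excerpt, so the formula below is an assumption: a common decision-theoretic form for two alert classes in which benefits count as negative costs. The function names, the prior `p_attack`, and all numeric values are hypothetical, and the ROC coordinates are read as decimals.

```python
def expected_cost(tpf, fpf, p_attack, cost_fn, cost_fp,
                  benefit_tp=0.0, benefit_tn=0.0):
    """Expected cost per alert (assumed form; benefits are negative costs)."""
    p_benign = 1.0 - p_attack
    # Alerts that are real attacks: missed ones cost, detected ones pay off.
    attack_term = (1.0 - tpf) * cost_fn - tpf * benefit_tp
    # Innocuous alerts: false alarms cost, correct dismissals pay off.
    benign_term = fpf * cost_fp - (1.0 - fpf) * benefit_tn
    return p_attack * attack_term + p_benign * benign_term


def cheapest(points, **scenario):
    """Return the label of the (FPF, TPF) point with the lowest expected cost."""
    return min(
        points,
        key=lambda name: expected_cost(
            tpf=points[name][1], fpf=points[name][0], **scenario
        ),
    )


# ROC points as (FPF, TPF); labels follow the subscripts used in the text.
ROC_POINTS = {"a": (0.0620, 0.9712), "b": (0.1034, 0.9811), "c": (0.1298, 0.9528)}

# Two hypothetical scenarios that differ only in the cost of a missed intrusion.
print(cheapest(ROC_POINTS, p_attack=0.1, cost_fn=1.0, cost_fp=1.0))
print(cheapest(ROC_POINTS, p_attack=0.1, cost_fn=100.0, cost_fp=1.0))
```

With these hypothetical numbers, raising the cost of a missed intrusion from 1 to 100 moves the preferred operating point from a to b. The same function can be evaluated over the points of a parametric system's ROC curve to pick a decision threshold.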

