Security Oracle Based on Tree Kernel Methods - Trustworthy Eternal Systems via Evolving, Software Data and Knowledge


so we decided to keep both. We called PTK(p) and uPTK(p) the configurations

with higher precision and PTK(r) and uPTK(r) those with higher recall.

After completing the learning phase, the performance of the security oracle
was assessed on the assessment data set.

Table 3. Experimental results for Yapig

Kernel        Optimal cost-factor   Precision   Recall   F-measure
SK                     1                9%       100%       17%
STK                   21               12%       100%       22%
SSTK                   9                9%       100%       17%
SSTK + BOW             7                9%       100%       17%
PTK(p)                 6              100%        27%       42%
PTK(r)                11               21%       100%       34%
uPTK(p)                4              100%        27%       42%
uPTK(r)               11               21%       100%       34%

Table 3 reports the experimental results collected on Yapig for the six kernel
methods, with PTK and uPTK each listed in both their precision-oriented (p) and
recall-oriented (r) configurations. The first column gives the kernel, the
second the optimal cost-factor value, and the third, fourth and fifth columns
give precision, recall and F-measure.

The best results in terms of F-measure (42%) were achieved by two
configurations, PTK(p) and uPTK(p). Both reached a precision of 100% and a
recall of 27%: all 7 test cases classified as attacks are real attacks, but 19
attacks were classified as safe tests. This is a poor result, since the primary
objective of the oracle is not to miss attacks. Any attack misclassification
might have severe consequences in terms of security and should be avoided.
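The figures quoted for PTK(p) and uPTK(p) can be checked from the counts given in the text. The following is a minimal sketch, assuming that F-measure denotes the balanced harmonic mean of precision and recall (this excerpt does not define it, but the assumption reproduces the values in Table 3). The function name `classification_metrics` is hypothetical and introduced only for illustration.

```python
def classification_metrics(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) from raw classification counts.

    Assumption: F-measure is the balanced harmonic mean of precision and recall.
    """
    flagged = true_positives + false_positives   # tests classified as attacks
    actual = true_positives + false_negatives    # tests that really are attacks
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts stated in the text for PTK(p) and uPTK(p):
# 7 tests flagged as attacks, all of them real; 19 attacks classified as safe.
precision, recall, f_measure = classification_metrics(
    true_positives=7, false_positives=0, false_negatives=19
)
print(f"precision={precision:.0%} recall={recall:.0%} F-measure={f_measure:.0%}")
```

Under these counts the assessment set contains 7 + 19 = 26 attacks, which is why a perfectly precise oracle that finds only 7 of them still scores a low recall.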

The other two variants of these same methods, PTK(r) and uPTK(r), registered
the second-best result, with an F-measure of 34%, low precision (21%) but high
recall (100%). Despite the low precision, which reflects the presence of false
positives (100 out of 286 tests), this result is preferable in the domain of
security testing because there are no false negatives, i.e. no attacks have
been misclassified as safe tests.

Among the remaining four kernel methods, the best results were obtained by
STK. All the attacks were classified correctly, giving high recall (100%) but
low precision (12%). The SK, SSTK and SSTK + BOW methods performed slightly
worse, with equal recall (100%) but even lower precision (9%). In fact, these
methods classified all the tests in the assessment set as attacks, generating
260 false alarms that would have required manual inspection by the developers.
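The practical difference between the recall-oriented configurations and the methods that flag every test can be expressed as manual inspection workload. The sketch below is only illustrative: it uses the counts stated in the text (286 tests, 260 false alarms when everything is flagged, 100 false positives for PTK(r) and uPTK(r)) and derives the number of attacks as 286 - 260 = 26. The helper name `inspection_load` is hypothetical.

```python
TOTAL_TESTS = 286                  # size of the assessment set, from the text
FALSE_ALARMS_FLAG_ALL = 260        # false alarms when every test is flagged
ATTACKS = TOTAL_TESTS - FALSE_ALARMS_FLAG_ALL  # derived: 26 real attacks


def inspection_load(true_positives, false_positives):
    """Return (tests a developer must inspect, precision of the flagged set)."""
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return flagged, precision


# Both operating points have 100% recall, so every attack is a true positive.
flag_all = inspection_load(ATTACKS, FALSE_ALARMS_FLAG_ALL)   # SK, SSTK, SSTK + BOW
ptk_recall = inspection_load(ATTACKS, 100)                   # PTK(r), uPTK(r)

saved = flag_all[0] - ptk_recall[0]
print(f"flag-all: inspect {flag_all[0]} tests, precision {flag_all[1]:.0%}")
print(f"PTK(r):   inspect {ptk_recall[0]} tests, precision {ptk_recall[1]:.0%}")
print(f"manual inspections avoided: {saved}")
```

At equal recall, the configuration with fewer false positives is strictly preferable, because it reduces the number of tests developers have to review without missing any attack.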

The high recall on Yapig (100%) can be explained by the fact that, unlike in
the previous experiment, the training set is much more representative of the
attacks. Safe tests are also well represented, since the PHP code tested is
