so we decided to keep both. We called PTK(p) and uPTK(p) the configurations
with higher precision and PTK(r) and uPTK(r) those with higher recall.
After completing the learning phase, the performance of the security oracle was
assessed on the assessment data set.
Table 3. Experimental results for Yapig

Kernel        Optimal Cost-factor   Precision   Recall   F-measure
SK            1                     9%          100%     17%
STK           21                    12%         100%     22%
SSTK          9                     9%          100%     17%
SSTK + BOW    7                     9%          100%     17%
PTK(p)        6                     100%        27%      42%
PTK(r)        11                    21%         100%     34%
uPTK(p)       4                     100%        27%      42%
uPTK(r)       11                    21%         100%     34%

Table 3 reports the experimental results collected on Yapig for the 6 different
kernel methods (PTK and uPTK each appear in two configurations). The first
column reports the kernel, the second column the optimal cost-factor value, and
the third, fourth and fifth columns report precision, recall and F-measure.
The best results in terms of F-measure (42%) were achieved by two
methods, PTK(p) and uPTK(p). Both reported a precision of 100% and a recall
of 27%, meaning that all the test cases classified as attacks (i.e., 7) are
real attacks, while 19 attacks were classified as safe tests. This is a poor
result, since the primary objective of the oracle should be not to miss
attacks. Furthermore, any attack misclassification might have severe
consequences in terms of security and should be avoided.
The other two variants of these same methods, PTK(r) and uPTK(r), registered
the second best result, with an F-measure of 34%, low precision (21%) but high
recall (100%). Despite the low precision, which indicates the presence of
false positives (100 out of 286 tests), this result is preferable in the domain
of security testing, since no false negatives were found, i.e. no attacks were
misclassified as safe tests.
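
The figures for the two PTK configurations can be reproduced from the counts given above. The following is a minimal sketch, assuming that the reported F-measure is the balanced harmonic mean of precision and recall (which agrees with Table 3). The function name is hypothetical, and the total of 26 attacks is not stated explicitly in the text: it is derived here as 7 detected plus 19 missed.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts.

    Assumes the balanced F-measure (harmonic mean of precision and recall).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts taken from the text; the total of 26 attacks is derived (7 + 19).
configs = {
    "PTK(p) / uPTK(p)": dict(tp=7, fp=0, fn=19),    # 19 attacks missed
    "PTK(r) / uPTK(r)": dict(tp=26, fp=100, fn=0),  # 100 false positives
}

results = {name: precision_recall_f(**c) for name, c in configs.items()}
for name, (p, r, f) in results.items():
    print(f"{name}: precision={p:.0%} recall={r:.0%} F-measure={f:.0%}")
# PTK(p) / uPTK(p): precision=100% recall=27% F-measure=42%
# PTK(r) / uPTK(r): precision=21% recall=100% F-measure=34%
```

The (p) configurations score higher on F-measure only because this measure weights precision and recall equally. It does not reflect that, for a security oracle, the 19 missed attacks are more costly than the 100 false alarms.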
Among the remaining four kernel methods, the best results were obtained
by STK. All the attacks were classified correctly, giving high
recall (100%) but low precision (12%). The SK, SSTK and SSTK + BOW methods
performed slightly worse, obtaining equal recall (100%) but even lower precision
(9%). In fact, all the tests in the assessment data set were classified as
attacks, generating 260 false alarms that would have required manual inspection
by the developers.
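
The behaviour of SK, SSTK and SSTK + BOW corresponds to a trivial oracle that flags every test as an attack. As a rough sketch, the constants below use the 286 assessment tests mentioned in the text and assume 26 attacks among them (a derived figure, 7 detected plus 19 missed for PTK(p)); the function name is hypothetical.

```python
TOTAL_TESTS = 286  # size of the assessment data set, as stated in the text
ATTACKS = 26       # assumption: derived from 7 detected + 19 missed attacks


def flag_everything(total, attacks):
    """Metrics of an oracle that classifies every test as an attack."""
    true_positives = attacks           # every real attack is flagged
    false_positives = total - attacks  # every safe test is flagged too
    precision = true_positives / (true_positives + false_positives)
    recall = 1.0                       # no attack can be missed
    f_measure = 2 * precision * recall / (precision + recall)
    return false_positives, precision, recall, f_measure


false_alarms, precision, recall, f_measure = flag_everything(TOTAL_TESTS, ATTACKS)
print(f"false alarms={false_alarms} precision={precision:.0%} "
      f"recall={recall:.0%} F-measure={f_measure:.0%}")
# false alarms=260 precision=9% recall=100% F-measure=17%
```

Under these assumptions the precision of such an oracle equals the fraction of attacks in the data set, which matches the 9% reported for these three kernels.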
The high recall on Yapig (100%) can be explained by the fact that, unlike in
the previous experiment, the training set is much more representative of
the attacks. Safe tests are also well represented, since the PHP code tested is
 