Table 3. Experimental results on the CSIC 2010 dataset

                 Detection rate (%)            False positive rate (%)
Classifier       Full-set   CFS      mRMR      Full-set   CFS      mRMR
C4.5             94.49      94.06    79.80     5.9        6.8      25.7
CART             94.12      93.71    79.85     6.2        6.8      25.3
Random Tree      92.30      92.70    71.36     8.3        7.8      30.6
Random Forest    93.71      93.68    71.70     7.2        7.2      30.5
Average          93.65      93.53    75.67     6.9        7.1      28

Table 4. Experimental results on the ECML/PKDD 2007 dataset

                 Detection rate (%)            False positive rate (%)
Classifier       Full-set   CFS      mRMR      Full-set   CFS      mRMR
C4.5             96.37      86.45    91.62     3.7        17.6     9.9
CART             96.11      86.45    91.54     4.3        17.6     10
Random Tree      96.89      86.39    93.41     2.6        17.7     6.4
Random Forest    98.80      86.39    95.18     1.2        17.7     5.0
Average          97.04      86.42    92.93     2.95       17.6     7.8

Tables 2 and 3 show that the CFS measure performed well on the CSIC 2010 dataset
and gave better results than the mRMR measure. CFS removed more than 63% of the
features as irrelevant or redundant while lowering the average detection rate only
slightly, from 93.65% to 93.53% (0.12 percentage points). The mRMR measure, by
contrast, gave much worse results than the full feature set on this dataset.
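The 0.12 figure can be checked directly against the detection rates in Table 3. The following is a minimal sketch that only re-averages the published numbers; the variable names are illustrative and not taken from the paper.

```python
# Detection rates (%) from Table 3 (CSIC 2010), in classifier order:
# C4.5, CART, Random Tree, Random Forest.
detection_rate = {
    "full": [94.49, 94.12, 92.30, 93.71],
    "cfs": [94.06, 93.71, 92.70, 93.68],
    "mrmr": [79.80, 79.85, 71.36, 71.70],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Average detection rate per feature set.
averages = {name: mean(rates) for name, rates in detection_rate.items()}

# Loss in average detection rate relative to the full feature set,
# in percentage points.
drop_cfs = averages["full"] - averages["cfs"]
drop_mrmr = averages["full"] - averages["mrmr"]

print(f"CFS drop:  {drop_cfs:.2f} percentage points")
print(f"mRMR drop: {drop_mrmr:.2f} percentage points")
```

The CFS drop comes out at roughly 0.12 percentage points, while the mRMR drop on the same dataset is close to 18 percentage points.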
Tables 2 and 4 show that the mRMR measure removed 80% of the features of the
ECML/PKDD 2007 dataset as irrelevant or redundant, while the detection rates were
somewhat lower than those obtained with the full feature set (92.93% versus 97.04%
on average). The CFS measure did not work well in this case: its average detection
rate fell to 86.42%.
Based on these experiments, we conclude that the effectiveness of WAFs can be
improved by choosing an appropriate instance of the GeFS measure for the dataset
at hand.
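As an after-the-fact check, the Average rows of Tables 3 and 4 can be compared to see which measure held up better on each dataset. This sketch is not the selection procedure used in the paper, which chooses the measure from the statistical properties of the data; it also ignores how many features each measure removed. The function and variable names are illustrative.

```python
# Average detection rates (%) taken from the "Average" rows of Tables 3 and 4.
average_detection_rate = {
    "CSIC 2010": {"CFS": 93.53, "mRMR": 75.67},
    "ECML/PKDD 2007": {"CFS": 86.42, "mRMR": 92.93},
}


def best_measure(dataset):
    """Return the measure with the highest average detection rate."""
    scores = average_detection_rate[dataset]
    return max(scores, key=scores.get)


choices = {name: best_measure(name) for name in average_detection_rate}

for name, measure in choices.items():
    print(f"{name}: {measure}")
```

The outcome agrees with the discussion above: CFS on CSIC 2010 and mRMR on ECML/PKDD 2007.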
4 Conclusions
We have proposed using the generic feature selection (GeFS) measure for Web attack
detection. We analyzed the statistical properties of the newly generated CSIC 2010
dataset and of the ECML/PKDD 2007 dataset. Based on this analysis, the CFS measure
was chosen for selecting features from the CSIC 2010 dataset and the mRMR measure
for the ECML/PKDD 2007 dataset. The detection accuracies obtained after feature
selection were then tested with four different classifiers. The experiments show
that, by choosing an appropriate instance of the GeFS measure, we could remove 63%
of the features of the original CSIC 2010 dataset as irrelevant or redundant while
reducing the detection accuracy of WAFs by only 0.12%.
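For readers unfamiliar with CFS, the sketch below evaluates the standard CFS merit of a feature subset, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. It assumes the correlations have already been computed; the function name and the example values are hypothetical and do not come from the paper's datasets.

```python
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a feature subset.

    feature_class_corr:   one correlation with the class per feature (k values).
    feature_feature_corr: one correlation per unordered feature pair,
                          k * (k - 1) / 2 values (empty when k == 1).
    """
    k = len(feature_class_corr)
    if k == 0:
        raise ValueError("the subset must contain at least one feature")
    expected_pairs = k * (k - 1) // 2
    if len(feature_feature_corr) != expected_pairs:
        raise ValueError(f"expected {expected_pairs} pairwise correlations")

    r_cf = sum(feature_class_corr) / k
    r_ff = sum(feature_feature_corr) / expected_pairs if expected_pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical values: two equally relevant features, first uncorrelated
# with each other, then fully redundant.
independent = cfs_merit([0.5, 0.5], [0.0])
redundant = cfs_merit([0.5, 0.5], [1.0])
print(independent > redundant)
```

With equal relevance, the redundant pair scores lower than the independent pair, which is the behavior that lets CFS discard redundant features.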