Application of the Generic Feature Selection Measure in
Detection of Web Attacks
Hai Thanh Nguyen 1 , Carmen Torrano-Gimenez 2 , Gonzalo Alvarez 2 ,
Slobodan Petrović 1 , and Katrin Franke 1
1 Norwegian Information Security Laboratory,
Gjøvik University College, Norway
{hai.nguyen,katrin.franke,slobodan.petrovic}@hig.no
2 Instituto de Física Aplicada,
Consejo Superior de Investigaciones Científicas, Madrid, Spain
{carmen.torrano,gonzalo}@iec.csic.es
Abstract. Feature selection for filtering HTTP-traffic in Web application
firewalls (WAFs) is an important task. We focus on the Generic-Feature-
Selection (GeFS) measure [4], which was successfully tested on low-level
packet filters using the KDD CUP'99 dataset. However, the performance of the
GeFS measure in analyzing high-level HTTP-traffic is still unknown. In this
paper we study the GeFS measure for WAFs. We conduct experiments on the
publicly available ECML/PKDD-2007 dataset. Since this dataset does not target
any real Web application, we additionally generate our new CSIC-2010 dataset.
We analyze the statistical properties of both datasets to provide more insight
into their nature and quality. Subsequently, we determine appropriate
instances of the GeFS measure for feature selection. We use different classifiers
to test the detection accuracies. The experiments show that we can remove 63%
of irrelevant and redundant features from the original dataset while reducing
the detection accuracy of WAFs by only 0.12%.
Keywords: Web attack detection, Web application firewall, intrusion detection
systems, feature selection, machine learning algorithms.
1 Introduction
Web attacks pose many serious threats to the modern Internet. The number of Web
attacks is steadily increasing; consequently, Web application firewalls (WAFs) [8] need
to become more and more effective. One approach to improving the effectiveness
of WAFs is to apply feature selection methods. Reducing the number of relevant
traffic features without a negative effect on detection accuracy greatly
increases the available processing time of WAFs and reduces the required
system resources. As there exist many feature selection algorithms (see, for example,
[2,3]), the question that arises is which ones could be applied in intrusion detection in
general and in Web attack detection in particular. Most of the feature selection
work in intrusion detection practice is still done manually, and the quality of the selected features
depends strongly on expert knowledge. For automatic feature selection, the wrapper
 