Time is another fascinating issue of great importance when considering intrusions
since the chance of detecting an attack increases in relation to its duration. There are
therefore two main strategies:
Drastically reduce the time used to perform a scan.
Spread the packets out over time, that is, reduce the number of packets sent per time unit so that they are likely to slip by unnoticed.
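The second strategy amounts to rate-limiting the probes. A minimal sketch in Python (the function name and rate parameter are illustrative assumptions, not taken from the study):

```python
import time

def paced_probes(targets, probes_per_minute):
    """Yield probe targets spaced evenly in time, keeping the scan
    below a given rate (packets per time unit)."""
    delay = 60.0 / probes_per_minute  # seconds between consecutive probes
    for t in targets:
        yield t            # caller sends one probe for this target here
        time.sleep(delay)  # spread the remaining probes out over time
```

Lowering `probes_per_minute` trades scan duration for a smaller per-time-unit footprint, which is exactly the tension described above.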
In this study, the mutations are applied to data related to network scans. It should be taken into account that some of the possible mutations may be meaningless, such as a sweep of fewer than 5 hosts in the case of a network scan.
Changes can be made to attack packets taking the following issues into account:
Number of scans in the attack (that is, number of addressed port numbers).
Destination port numbers at which scans are aimed.
Time intervals when scans are performed.
Number of packets (density) forming the scans (number of scanned hosts).
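A mutation over these dimensions can be sketched as follows; the field names, value ranges, and the 5-host floor (echoing the meaningless-mutation example above) are illustrative assumptions, not the study's actual mutation operator:

```python
import random

def mutate_scan(scan, min_hosts=5):
    """Randomly perturb one dimension of a scan description:
    target ports, timing interval, or number of scanned hosts."""
    scan = dict(scan)  # do not modify the original attack record
    dimension = random.choice(["ports", "interval", "hosts"])
    if dimension == "ports":
        # redirect the scan at a different set of destination ports
        scan["ports"] = random.sample(range(1, 1024), k=len(scan["ports"]))
    elif dimension == "interval":
        # stretch or compress the time interval between scans
        scan["interval_s"] *= random.uniform(0.5, 2.0)
    else:
        # change the density, but keep the mutation meaningful:
        # never sweep fewer than min_hosts hosts
        scan["hosts"] = max(min_hosts, scan["hosts"] + random.randint(-3, 3))
    return scan
```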
3 Classifiers and Ensembles
As previously explained, one of the most interesting features of IDSs is their capability to automatically determine whether a portion of the traffic circulating on the network is an attack or normal traffic. Machine learning techniques are algorithms designed specifically to make decisions about newly presented data.
Such algorithms usually suffer from common problems: over-fitting to the data used for training (and therefore poor generalization capabilities), getting stuck in local minima of their learning function, or high computational complexity when dealing with complex data. One of the most widespread and useful techniques for avoiding such problems is the ensemble learning scheme [19], [20]. The main idea behind this kind of meta-algorithm is to train several slightly different, simpler classifiers and combine their results in order to improve on the results obtained by a single, usually more complex, classifier [21].
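The combination step can be as simple as majority voting over the base classifiers' decisions. A minimal sketch (the three toy threshold classifiers are invented for illustration):

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine several base classifiers by plain majority voting,
    the simplest way to merge their individual decisions."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# three deliberately weak "classifiers" that disagree near the boundary
clf_a = lambda x: "attack" if x > 3 else "normal"
clf_b = lambda x: "attack" if x > 5 else "normal"
clf_c = lambda x: "attack" if x > 4 else "normal"
```

For an input of 4.5 the individual verdicts are "attack", "normal", "attack", and the ensemble outputs "attack"; a single mis-calibrated classifier is outvoted by the others.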
In the present study, several of these algorithms have been considered, both as base classifiers and for ensemble training, in order to have a significantly wide array of algorithms whose performance can be compared on the mutated data sets. Among the base classifiers are instance-based algorithms such as k-Nearest Neighbours (IBk) [22], decision-tree algorithms such as the Simple Classification and Regression Tree (CART) [23] and REPTree [24], and artificial neural networks such as the Radial Basis Function Network [25].
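As an illustration of one such base classifier, a tiny k-nearest-neighbours predictor over one-dimensional features might look like this (a stand-in sketch, not WEKA's actual IBk implementation; the traffic labels are invented):

```python
def knn_predict(train, x, k=3):
    """Classify x by majority label among its k nearest training
    points; train is a list of (feature, label) pairs."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# toy training set: low feature values are normal, high ones are attacks
traffic = [(1, "normal"), (2, "normal"),
           (9, "attack"), (10, "attack"), (11, "attack")]
```

Instance-based methods like this store the training data and defer all work to prediction time, which is why they pair naturally with ensemble schemes that resample or reweight that data.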
Among the ensemble meta-algorithms that make use of the previously mentioned simple algorithms, the tests performed used basic algorithms such as the MultiClass Classifier [26], used to adapt binary classifiers to multi-class problems, Bagging [27], Adaptive Boosting (AdaBoost) [28], and Random Forest [29], and compared their results with more modern boosting algorithms such as LogitBoost [30] and StackingC [31]. As the results show, ensemble learning adds significant value to the analysis, as almost all variants consistently improve on the results obtained by a single classifier.