Time is another fascinating issue of great importance when considering intrusions
since the chance of detecting an attack increases in relation to its duration. There are
therefore two main strategies:
Drastically reduce the time used to perform a scan.
Spread the packets out over time, that is, reduce the number of packets sent per time unit so that they are likely to slip by unnoticed.
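The second strategy amounts to rate-limiting the probes. A minimal sketch in Python (the function name and rate parameter are illustrative assumptions, not taken from the study):

```python
import time

def paced_probes(targets, probes_per_minute):
    """Yield probe targets spaced evenly in time, keeping the scan
    below a given rate (packets per time unit)."""
    delay = 60.0 / probes_per_minute  # seconds between consecutive probes
    for t in targets:
        yield t            # caller sends one probe for this target here
        time.sleep(delay)  # spread the remaining probes out over time
```

Lowering `probes_per_minute` trades scan duration for a smaller per-time-unit footprint, which is exactly the tension described above.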
In this study, the mutations are applied to data related to network scans. It should be taken into account that some of the possible mutations may be meaningless, such as a sweep of fewer than 5 hosts in the case of a network scan.
Changes can be made to attack packets taking the following issues into account:
Number of scans in the attack (that is, number of addressed port numbers).
Destination port numbers at which scans are aimed.
Time intervals when scans are performed.
Number of packets (density) forming the scans (number of scanned hosts).
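A mutation over these dimensions can be sketched as follows; the field names, value ranges, and the 5-host floor (echoing the meaningless-mutation example above) are illustrative assumptions, not the study's actual mutation operator:

```python
import random

def mutate_scan(scan, min_hosts=5):
    """Randomly perturb one dimension of a scan description:
    target ports, timing interval, or number of scanned hosts."""
    scan = dict(scan)  # do not modify the original attack record
    dimension = random.choice(["ports", "interval", "hosts"])
    if dimension == "ports":
        # redirect the scan at a different set of destination ports
        scan["ports"] = random.sample(range(1, 1024), k=len(scan["ports"]))
    elif dimension == "interval":
        # stretch or compress the time interval between scans
        scan["interval_s"] *= random.uniform(0.5, 2.0)
    else:
        # change the density, but keep the mutation meaningful:
        # never sweep fewer than min_hosts hosts
        scan["hosts"] = max(min_hosts, scan["hosts"] + random.randint(-3, 3))
    return scan
```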
3 Classifiers and Ensembles
As previously explained, one of the most interesting features of IDSs is their capability to automatically determine whether a portion of the traffic circulating on the network is an attack or normal traffic. Machine learning techniques are algorithms designed specifically to make decisions about newly presented data.
Such algorithms usually suffer from common problems: over-fitting to the data used for training (and therefore poor generalization capabilities), getting stuck in local minima of their learning function, or high computational complexity when dealing with complex data. One of the most widespread and useful techniques for avoiding such problems is the ensemble learning scheme [19], [20]. The main idea behind this kind of meta-algorithm is to train several slightly different, simpler classifiers and combine their results in order to improve on the results obtained by a single, usually more complex, classifier [21].
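The combination step can be as simple as majority voting over the base classifiers' decisions. A minimal sketch (the three toy threshold classifiers are invented for illustration):

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine several base classifiers by plain majority voting,
    the simplest way to merge their individual decisions."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# three deliberately weak "classifiers" that disagree near the boundary
clf_a = lambda x: "attack" if x > 3 else "normal"
clf_b = lambda x: "attack" if x > 5 else "normal"
clf_c = lambda x: "attack" if x > 4 else "normal"
```

For an input of 4.5 the individual verdicts are "attack", "normal", "attack", and the ensemble outputs "attack"; a single mis-calibrated classifier is outvoted by the others.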
In the present study, several of these algorithms have been considered, both as base classifiers and for ensemble training, in order to have a significantly wide array of algorithms whose performance can be compared on the mutated data sets. Among the base classifiers are instance-based algorithms such as k-Nearest Neighbours (IBk) [22], decision-tree algorithms such as the Simple Classification and Regression Tree (CART) [23] and REPTree [24], and artificial neural networks such as the Radial Basis Function Network [25].
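As an illustration of one such base classifier, a tiny k-nearest-neighbours predictor over one-dimensional features might look like this (a stand-in sketch, not WEKA's actual IBk implementation; the traffic labels are invented):

```python
def knn_predict(train, x, k=3):
    """Classify x by majority label among its k nearest training
    points; train is a list of (feature, label) pairs."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# toy training set: low feature values are normal, high ones are attacks
traffic = [(1, "normal"), (2, "normal"),
           (9, "attack"), (10, "attack"), (11, "attack")]
```

Instance-based methods like this store the training data and defer all work to prediction time, which is why they pair naturally with ensemble schemes that resample or reweight that data.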
Among the ensemble meta-algorithms that make use of the previously mentioned simple algorithms, the tests performed used basic algorithms such as the MultiClass Classifier [26], used to adapt binary classifiers to multi-class problems, Bagging [27], Adaptive Boosting (AdaBoost) [28], and Random Forest [29], and compared their results with more modern boosting algorithms such as LogitBoost [30] and StackingC [31]. As the results show, ensemble learning adds significant value to the analysis, as almost all variants consistently improve on the results obtained by a single classifier.