with a very low false positive rate. However, intrusions are usually
polymorphic and evolve continuously, so misuse detection fails easily when
facing unknown intrusions. Manually updating the intrusion signatures is
generally infeasible because it is time consuming and laborious. One possible
remedy is to automatically extract intrusive patterns from historical data for
future prediction. Anomaly detection is orthogonal to misuse detection.
It hypothesizes that abnormal behavior is rare and different from normal
behavior. Hence, it builds models for normal behavior and detects anomalies
in observed data by noticing deviations from these models. Clearly, anomaly
detection has the capability of detecting new types of intrusions, and
only requires normal data when building the profiles. However, its major
difficulty lies in discovering boundaries between normal and abnormal
behavior, due to the scarcity of abnormal samples in the training phase.
The two detection methods correspond closely to the two high-level primary
goals of data mining: prediction and description. Therefore, classification
and regression tasks for prediction are suitable for automatically constructing
misuse models, while clustering, association rules, or sequential rules
for description fit the need of establishing a profile for normal behavior.
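The anomaly detection idea described above (build a profile from normal data only, then flag deviations from it) can be illustrated with a minimal sketch. It is not a method from the text: the z-score profile, the function names, the two example features, and the threshold of 3.0 are all assumptions chosen for illustration. The threshold plays the role of the boundary between normal and abnormal behavior that the text identifies as the major difficulty.

```python
import statistics

def fit_normal_profile(normal_rows):
    """Build a profile (per-feature mean and spread) from normal-only data."""
    columns = list(zip(*normal_rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 when a feature never varies, to avoid division by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stdevs

def anomaly_score(row, profile):
    """Largest per-feature deviation from the profile, in standard deviations."""
    means, stdevs = profile
    return max(abs(x - m) / s for x, m, s in zip(row, means, stdevs))

def is_anomalous(row, profile, threshold=3.0):
    """Flag a record whose deviation exceeds the (assumed) threshold."""
    return anomaly_score(row, profile) > threshold

# Hypothetical normal records: (connections per second, mean bytes per packet).
normal_traffic = [(10, 200), (12, 210), (11, 190), (9, 205), (13, 195)]
profile = fit_normal_profile(normal_traffic)
```

With this profile, a record such as (11, 200) sits at the center of the normal data and is not flagged, while (90, 200) deviates strongly in the first feature and is flagged. No abnormal samples are needed for training, but the quality of the result depends entirely on how the threshold is chosen.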
2.3. The Role of Evolutionary Computation in KDD
Evolutionary algorithms actively engage in the three steps of the KDD
process. In this section, we mainly concentrate on feature selection in the
pre-processing step, and prediction and description tasks in data mining. In
the post-processing step, an EA is often used for resolving contradictions
when several patterns disagree with the output for a given data instance.
An example can be found in Ref. 16.
2.3.1. Feature selection
One of the main obstacles to improving the performance of IDSs is the high
dimensionality of data; for example, there are 41 features in the KDD99
data set (Ref. 17). High-dimensional data means a huge search space and hence requires
expensive computation. However, the information in attributes sometimes
overlaps or is redundant. Feature selection by eliminating useless features can
enhance the accuracy of the detection while speeding up the computation,
thus improving the overall performance of an IDS. Feature selection studies
fall into two categories based on whether or not they perform selection