MINING KNOWLEDGE FROM NETWORK INTRUSION DATA USING DATA MINING TECHNIQUES - Knowledge Mining Using Intelligent Agents

recognition and intrusion detection. In the case of anomaly detection,

PCA was used as an outlier detection scheme and was applied to reduce

the dimensionality of the audit data and arrive at a classifier that is a

function of the principal components. The Mahalanobis distance of each

observation from the centre of the data was measured for anomaly

detection. The Mahalanobis distance is computed as the sum of

squares of the standardized principal component scores. In Shyu et al. [29],

the authors evaluated these methods on the KDD Cup 1999 data and

demonstrated that they exhibit a better detection rate than other well-known

outlier-based anomaly detection algorithms such as the local outlier factor

(LOF) approach, the Nearest Neighbour approach and the k-th Nearest

Neighbour approach.
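
The distance computation described above can be sketched as follows. This is a minimal illustration, not the implementation of Shyu et al.: the function name pca_mahalanobis_scores, the use of the correlation matrix of the training data, and the choice to keep every non-degenerate component are all assumptions made for the example.

```python
import numpy as np

def pca_mahalanobis_scores(train, test):
    """Score each test row by the sum of squared standardized PC scores.

    Hypothetical helper: `train` holds normal observations (rows),
    `test` holds the observations to score. Larger scores mean the
    observation lies farther from the centre of the training data.
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)   # guard against constant features

    # Standardize, then take the eigen-decomposition of the correlation matrix.
    z_train = (train - mean) / std
    corr = np.cov(z_train, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)

    # Drop numerically degenerate components (an assumption of this sketch).
    keep = eigvals > 1e-10
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]

    # Principal component scores of the test data.
    z_test = (test - mean) / std
    pc_scores = z_test @ eigvecs

    # Dividing each squared score by its eigenvalue standardizes it;
    # the sum over components is the squared Mahalanobis distance.
    return np.sum(pc_scores ** 2 / eigvals, axis=1)
```

An observation would then be flagged as an outlier when its score exceeds a threshold chosen from the scores of the normal training data, for example a high empirical quantile.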

Markov models: A hidden Markov model is a statistical model in which

the system being modeled is assumed to be a Markov process with unknown

parameters. The challenge is to determine the hidden parameters from the

observable parameters. Unlike a regular Markov model, where the state

transition probabilities are the only parameters and the state of the system

is directly observable, in a hidden Markov model, the only visible elements

are the variables of the system that are influenced by the state of the system,

and the state of the system itself is hidden. A hidden Markov model's state

represents some unobservable condition of the system being modeled. In

each state, there is a certain probability of producing any of the observable

system outputs and a separate probability indicating the likely next states.

By having different output probability distributions in each of the states,

and allowing the system to change states over time, the model is capable

of representing non-stationary sequences. To estimate the parameters of a

hidden Markov model for modeling normal system behaviour, sequences of

normal events collected from normal system operation are used as training

data. An expectation-maximization (EM) algorithm is used to estimate

the parameters. Once a hidden Markov model has been trained, the

probability it assigns to test data can be compared against a threshold

for anomaly detection. In order to use hidden Markov models for anomaly

detection, three key problems need to be addressed. The first problem,

also known as the evaluation problem, is to determine, given a sequence

of observations, the probability that the observed sequence was

generated by the model. The second is the learning problem, which involves

building from the audit data a model, or set of models, that correctly
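
The evaluation problem, and the use of a probability threshold for anomaly detection, can be sketched with the standard forward algorithm. This is an illustrative sketch only: the function names, the per-event normalization of the log-likelihood, and the small two-state model in the usage note are assumptions for the example, and the model parameters are taken as already trained.

```python
import numpy as np

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the scaled forward algorithm.

    obs   : non-empty sequence of observation symbol indices
    start : (n_states,) initial state probabilities
    trans : (n_states, n_states) state transition probabilities
    emit  : (n_states, n_symbols) output probabilities per state
    """
    start, trans, emit = (np.asarray(a, dtype=float) for a in (start, trans, emit))
    alpha = start * emit[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the hidden transitions, then weight by
            # the probability of emitting the observed symbol.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf            # sequence is impossible under the model
        log_lik += np.log(scale)
        alpha = alpha / scale         # rescale to avoid numerical underflow
    return log_lik

def is_anomalous(obs, start, trans, emit, threshold):
    """Flag a sequence whose average per-event log-likelihood is below threshold.

    Normalizing by sequence length is an assumption of this sketch so that
    sequences of different lengths can share one threshold.
    """
    return forward_log_likelihood(obs, start, trans, emit) / len(obs) < threshold
```

With a hypothetical model such as start = [0.6, 0.4], trans = [[0.7, 0.3], [0.4, 0.6]] and emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]], calling is_anomalous([0, 1, 2], start, trans, emit, threshold) compares the sequence against a threshold that would, in practice, be chosen from the likelihoods of held-out normal sequences.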
