recognition and intrusion detection. In the anomaly detection setting,
PCA was used as an outlier detection scheme: it was applied to reduce
the dimensionality of the audit data and arrive at a classifier that is a
function of the principal components. They measured the Mahalanobis
distance of each observation from the centre of the data for anomaly
detection. The Mahalanobis distance is computed as the sum of
squares of the standardized principal component scores. In Shyu et al.,29
the authors evaluated this method on the KDD Cup 1999 data and
demonstrated that it achieves a better detection rate than other well-known
outlier-based anomaly detection algorithms such as the local outlier factor
(LOF) approach, the nearest neighbour approach and the kth nearest
neighbour approach.
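
The distance computation described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the procedure of Shyu et al.: the function name pca_mahalanobis_scores and its arguments are hypothetical, all principal components with non-negligible variance are retained, and the data are assumed to have at least two features.

```python
import numpy as np

def pca_mahalanobis_scores(train, test):
    """Squared Mahalanobis distance of each test row from the training centre,
    computed as the sum of squared standardized principal component scores."""
    mean = train.mean(axis=0)
    cov = np.atleast_2d(np.cov(train - mean, rowvar=False))
    # Eigen-decomposition of the covariance matrix gives the principal axes
    # (eigenvectors) and the variance along each axis (eigenvalues).
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals > 1e-12              # drop directions with ~zero variance
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    scores = (test - mean) @ eigvecs    # principal component scores
    # Dividing each squared score by its variance standardizes it.
    return np.sum(scores ** 2 / eigvals, axis=1)
```

An observation would then be flagged as anomalous when its score exceeds a threshold chosen from the scores of normal training data; how that threshold is chosen is left open here.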
Markov models: A hidden Markov model is a statistical model in which
the system being modeled is assumed to be a Markov process with unknown
parameters. The challenge is to determine the hidden parameters from the
observable parameters. Unlike a regular Markov model, where the state
transition probabilities are the only parameters and the state of the system
is directly observable, in a hidden Markov model, the only visible elements
are the variables of the system that are influenced by the state of the system,
and the state of the system itself is hidden. A hidden Markov model's state
represents some unobservable condition of the system being modeled. In
each state, there is a certain probability of producing any of the observable
system outputs and a separate probability indicating the likely next states.
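
As a rough illustration of this structure, the Python sketch below defines a small hidden Markov model and shows how a sequence's probability under it can be computed with the standard forward recursion. The two states, three output symbols and all probability values are made-up assumptions, and the names start, trans, emit, sample_sequence and log_likelihood are hypothetical.

```python
import numpy as np

# Hypothetical two-state, three-symbol model; every number is illustrative.
start = np.array([0.6, 0.4])            # P(first hidden state)
trans = np.array([[0.9, 0.1],           # P(next state | current state)
                  [0.2, 0.8]])
emit = np.array([[0.7, 0.2, 0.1],       # P(observed symbol | hidden state)
                 [0.1, 0.3, 0.6]])

def sample_sequence(length, rng):
    """Draw hidden states and the observable symbols they emit."""
    states, symbols = [], []
    state = rng.choice(len(start), p=start)
    for _ in range(length):
        states.append(int(state))
        symbols.append(int(rng.choice(emit.shape[1], p=emit[state])))
        state = rng.choice(len(start), p=trans[state])
    return states, symbols

def log_likelihood(symbols):
    """Forward recursion with per-step scaling: log P(symbols | model)."""
    alpha = start * emit[:, symbols[0]]
    log_prob = 0.0
    for symbol in symbols[1:]:
        scale = alpha.sum()
        log_prob += np.log(scale)       # accumulate the log of each scale
        alpha = (alpha / scale) @ trans * emit[:, symbol]
    return log_prob + np.log(alpha.sum())
```

Only the symbols returned by sample_sequence would be visible to an observer; the states list is the hidden part. A sequence whose log_likelihood is unusually low under a model of normal behaviour is a candidate anomaly.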
By having a different output probability distribution in each state,
and allowing the system to change states over time, the model is capable
of representing non-stationary sequences. To estimate the parameters of a
hidden Markov model for modeling normal system behaviour, sequences of
normal events collected from normal system operation are used as training
data. An expectation-maximization (EM) algorithm is used to estimate
the parameters. Once a hidden Markov model has been trained, the
probability it assigns to test data can be compared against a threshold
for anomaly detection. In order to use hidden Markov models for anomaly
detection, three key problems need to be addressed. The first problem,
also known as the evaluation problem, is to determine, given a sequence
of observations, the probability that the observed sequence was
generated by the model. The second is the learning problem, which involves
building from the audit data a model, or set of models, that correctly