MINING KNOWLEDGE FROM NETWORK INTRUSION DATA USING DATA MINING TECHNIQUES - Knowledge Mining Using Intelligent Agents

Databases Reference

In-Depth Information

describes the observed behavior. Given a hidden Markov model and the

associated observations, the third problem, also known as the decoding

problem, involves determining the most likely set of hidden states that

have led to those observations. The major drawback of using hidden

Markov models in anomaly detection technique is that it is computationally

expensive, because it uses parametric estimation techniques based on the

Bayes algorithm for learning the normal profile of the host/network under

consideration.

6.6. Ensemble of Classifier

Ensembles of classifiers are often better than any individual classifier. This

can be attributed to three key factors. 30

Statistical:

Machine learning algorithms attempt to construct a hypo-

thesis that best approximates the unknown function that describes the data,

based on the training examples provided. Insucient training data can

lead an algorithm to generate several hypotheses of equal performance.

By taking all of the candidate hypotheses and combining them to form

an ensemble, their votes are averaged and the risk of selecting incorrect

hypotheses is reduced.

Computational:

Many machine learning approaches are not guaranteed

to find the optimal hypotheses; rather, they perform some kind of local

search which may find local minima (rather than the global minimum,

or the optimal hypothesis). For example, decision trees and ANNs

can often produce sub-optimal solutions. By starting the local search

in different locations, an ensemble can be formed by combining the

resulting hypotheses; therefore, the resulting ensemble can provide a better

approximation of the true underlying function.

Representational:

The statistical and computational factors allow

ensembles to locate better approximations of the true hypothesis that

describes the data; however, this factor allows ensembles to expand the

space of representable functions beyond that achievable by any individual

classifier. It is possible that the true function may not be able to be

represented by an individual machine learning algorithm, but a weighted

sum of the hypotheses within the ensemble may extend the space of

representable hypotheses to allow a more accurate representation.

Search WWH ::

Custom Search

Home