matching degree of training data $d \in D_{train}(k)$ with $r \in R_k$, where $D_{train}(k)$ is the set of suffixes of training data in class $k$. $K = 2$ is used in this paper.
A. Building a Classifier
After creating the probability density function $f_k(x_k)$ of the average matching degree between training data $d \in D_{train}(k)$ and rule $r \in R_k$, the probability that new connection data $d \in D_{test}$ belongs to class $k$ is represented as follows:
$$P_k(d) = \int_0^{m_k(d)} f_k(x_k)\,dx_k \prod_{l \in C,\, l \neq k} \int_{m_l(d)}^{1.0} f_l(x_l)\,dx_l \qquad (7)$$
where $D_{test}$ is the set of suffixes of testing data. The probability that $d \in D_{test}$ belongs to the anomaly class is defined as:
$$P_0(d) = 1 - \sum_{k \in C} P_k(d) \qquad (8)$$
where $C$ is the set of suffixes of classes having training data. In the case of two classes, the probabilities of the first class and the second class can be calculated by the following equations:
$$P_1(d) = \int_{m_2(d)}^{1.0} f_2(x_2)\,dx_2 \int_0^{m_1(d)} f_1(x_1)\,dx_1 \qquad (9)$$
$$P_2(d) = \int_0^{m_2(d)} f_2(x_2)\,dx_2 \int_{m_1(d)}^{1.0} f_1(x_1)\,dx_1 \qquad (10)$$
Then, the probability that new connection data belongs to the anomaly class is calculated by $P_0(d) = 1 - \sum_{k \in C} P_k(d)$. Based on the calculation of these probabilities, $d$ is assigned to the class with the highest probability.
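The decision rule above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian form of each density $f_k$, the class parameters, and the matching degrees are all made-up values chosen only to show how the integrals in Eqs. (7) and (8) combine into a class assignment.

```python
import math

def gaussian_cdf(x, mean, std):
    """CDF of a normal distribution; used to evaluate integrals of an
    (assumed Gaussian) matching-degree density f_k."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def class_probabilities(m, params):
    """m: dict class -> matching degree m_k(d) of a new connection d.
    params: dict class -> (mean, std) of the fitted density f_k.
    Returns P_k(d) per Eq. (7) and the anomaly probability P_0(d) per Eq. (8)."""
    P = {}
    for k in params:
        # integral from 0 to m_k(d) of f_k(x_k) dx_k
        p = gaussian_cdf(m[k], *params[k])
        for l in params:
            if l != k:
                # integral from m_l(d) to 1.0 of f_l(x_l) dx_l
                p *= gaussian_cdf(1.0, *params[l]) - gaussian_cdf(m[l], *params[l])
        P[k] = p
    P[0] = 1.0 - sum(P.values())  # anomaly class, Eq. (8)
    return P

# d matches class-1 rules strongly (0.9) and class-2 rules weakly (0.2)
probs = class_probabilities({1: 0.9, 2: 0.2},
                            {1: (0.8, 0.1), 2: (0.7, 0.1)})
best = max(probs, key=probs.get)  # assign d to the most probable class
```

With these made-up values, $d$'s high match to class 1 and low match to class 2 drive $P_1(d)$ close to 1, so `best` is class 1 rather than the anomaly class 0.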
KNN (K-Nearest Neighbor)
KNN is a nonparametric, lazy learning algorithm. That is a fairly concise statement. When we say a technique is nonparametric, it means that it makes no assumptions about the underlying data distribution. This is quite useful because, in the real world, most practical data do not obey the typical theoretical assumptions (e.g., Gaussian mixtures, linear separability, etc.). Nonparametric algorithms like KNN come to the rescue here.
It is also a lazy algorithm. This means that it does not use the training data points to do any generalization. In other words, there is no explicit training phase, or it is very minimal, which makes training very fast. The lack of generalization means that KNN keeps all the training data; more precisely, all the training data is needed during the testing phase. (Well, this is an exaggeration, but
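The two properties just described (no training phase, all data retained for testing) can be seen in a minimal sketch of the algorithm. The feature vectors and labels here are invented purely for illustration:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (feature_vector, label) pairs; note that the whole
    training set is kept verbatim -- the "lazy" property described above."""
    def dist(a, b):
        # Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbors = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up 2-D connection features labeled normal/anomaly
train = [((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"),
         ((0.9, 0.8), "anomaly"), ((0.8, 0.9), "anomaly")]
pred = knn_predict(train, (0.85, 0.85), k=3)
```

Here the query point sits next to the two anomalous training points, so two of its three nearest neighbors vote "anomaly" and that label wins.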