matching degree of training data $d \in D_{train}(k)$ with $r \in R_k$, where $D_{train}(k)$ is the set of suffixes of training data in class $k$. $K = 2$ is used in this paper.
A. Building a Classifier
After creating the probability density function $f_k(x_k)$ of the average matching degree between training data $d \in D_{train}(k)$ and rule $r \in R_k$, the probability that new connection data $d \in D_{test}$ belongs to class $k$ is represented as follows:
$$P_k(d) = \int_0^{m_k(d)} f_k(x_k)\,dx_k \prod_{l \in C,\, l \neq k} \int_{m_l(d)}^{1.0} f_l(x_l)\,dx_l \qquad (7)$$
where $D_{test}$ is the set of suffixes of testing data. The probability that $d \in D_{test}$ belongs to the anomaly class is defined as:
$$P_0(d) = 1 - \sum_{k \in C} P_k(d) \qquad (8)$$
where $C$ is the set of suffixes of classes having training data. In the case of two classes, the probabilities of the first class and the second class can be calculated by the following equations:
$$P_1(d) = \int_{m_2(d)}^{1.0} f_2(x_2)\,dx_2 \int_0^{m_1(d)} f_1(x_1)\,dx_1 \qquad (9)$$
$$P_2(d) = \int_0^{m_2(d)} f_2(x_2)\,dx_2 \int_{m_1(d)}^{1.0} f_1(x_1)\,dx_1 \qquad (10)$$
Then, the probability that new connection data belongs to the anomaly class is calculated by $P_0(d) = 1 - \sum_{k \in C} P_k(d)$. Based on the calculation of these probabilities, $d$ is assigned to the class with the highest probability.
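The decision rule above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian form of each density $f_k$, the class parameters, and the matching degrees are all made-up values chosen only to show how the integrals in Eqs. (7) and (8) combine into a class assignment.

```python
import math

def gaussian_cdf(x, mean, std):
    """CDF of a normal distribution; used to evaluate integrals of an
    (assumed Gaussian) matching-degree density f_k."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def class_probabilities(m, params):
    """m: dict class -> matching degree m_k(d) of a new connection d.
    params: dict class -> (mean, std) of the fitted density f_k.
    Returns P_k(d) per Eq. (7) and the anomaly probability P_0(d) per Eq. (8)."""
    P = {}
    for k in params:
        # integral from 0 to m_k(d) of f_k(x_k) dx_k
        p = gaussian_cdf(m[k], *params[k])
        for l in params:
            if l != k:
                # integral from m_l(d) to 1.0 of f_l(x_l) dx_l
                p *= gaussian_cdf(1.0, *params[l]) - gaussian_cdf(m[l], *params[l])
        P[k] = p
    P[0] = 1.0 - sum(P.values())  # anomaly class, Eq. (8)
    return P

# d matches class-1 rules strongly (0.9) and class-2 rules weakly (0.2)
probs = class_probabilities({1: 0.9, 2: 0.2},
                            {1: (0.8, 0.1), 2: (0.7, 0.1)})
best = max(probs, key=probs.get)  # assign d to the most probable class
```

With these made-up values, $d$'s high match to class 1 and low match to class 2 drive $P_1(d)$ close to 1, so `best` is class 1 rather than the anomaly class 0.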
KNN (K-Nearest Neighbor)
KNN is a nonparametric, lazy learning algorithm. That is a fairly concise statement. When we say a technique is nonparametric, it means that it makes no assumptions about the underlying data distribution. This is quite useful because, in the real world, most practical data do not obey the typical theoretical assumptions (e.g., Gaussian mixtures, linear separability, etc.). Nonparametric algorithms like KNN come to the rescue here.
It is also a lazy algorithm. This means that it does not use the training data points to do any generalization. In other words, there is no explicit training phase, or it is very minimal, which makes training very fast. The lack of generalization means that KNN keeps all the training data; more precisely, all the training data is needed during the testing phase. (Well, this is an exaggeration, but
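The two properties just described (no training phase, all data retained for testing) can be seen in a minimal sketch of the algorithm. The feature vectors and labels here are invented purely for illustration:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (feature_vector, label) pairs; note that the whole
    training set is kept verbatim -- the "lazy" property described above."""
    def dist(a, b):
        # Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbors = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up 2-D connection features labeled normal/anomaly
train = [((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"),
         ((0.9, 0.8), "anomaly"), ((0.8, 0.9), "anomaly")]
pred = knn_predict(train, (0.85, 0.85), k=3)
```

Here the query point sits next to the two anomalous training points, so two of its three nearest neighbors vote "anomaly" and that label wins.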