$a_i$ has value $v_{ij}$. The larger this value, the better the clustering
performs in terms of classification.
This category utility formula applies only to categorical attributes (for a
numeric attribute, the set of values $v_{i1}, v_{i2}, \ldots$ would be infinite,
and the summation could not be evaluated in the conventional way). However,
it is easily extended to numeric attributes by assuming that their distribution
is normal, with an observed mean $\mu$ and standard deviation $\sigma$. Using the
probability density function $f$ yields the following equivalence:
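$$\sum_i \sum_j \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right) \;\Leftrightarrow\; \sum_i \left( \int f(a_i \mid C_l)^2 \, da_i - \int f(a_i)^2 \, da_i \right) \tag{6.12}$$

where,

$$\sum_i \left( \int f(a_i \mid C_l)^2 \, da_i - \int f(a_i)^2 \, da_i \right) = \frac{1}{2\sqrt{\pi}} \sum_i \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right) \tag{6.13}$$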
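The step from the integrals to the standard deviations can be checked directly. The short derivation below is a sketch that assumes the usual normal density with mean $\mu$ and standard deviation $\sigma$; it shows why each squared-density integral reduces to a multiple of $1/\sigma$. Reading $\sigma_{il}$ as the standard deviation of attribute $a_i$ within cluster $C_l$, and $\sigma_i$ as its standard deviation over all instances, is an assumption taken from the standard presentation of category utility.

$$
\begin{aligned}
f(a) &= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(a-\mu)^2}{2\sigma^2} \right), \\
f(a)^2 &= \frac{1}{2\pi\sigma^2} \exp\!\left( -\frac{(a-\mu)^2}{\sigma^2} \right), \\
\int_{-\infty}^{\infty} \exp\!\left( -\frac{(a-\mu)^2}{\sigma^2} \right) da &= \sigma\sqrt{\pi}, \\
\int_{-\infty}^{\infty} f(a)^2 \, da &= \frac{\sigma\sqrt{\pi}}{2\pi\sigma^2} = \frac{1}{2\sqrt{\pi}\,\sigma}.
\end{aligned}
$$

Applying this result once with $\sigma_{il}$ and once with $\sigma_i$, and subtracting, gives the factor $\frac{1}{2\sqrt{\pi}}\left(\frac{1}{\sigma_{il}} - \frac{1}{\sigma_i}\right)$ for each attribute.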
Note that if the standard deviation estimate is ever 0, the real-valued
category utility function produces an infinite value. To overcome this
potential problem, COBWEB allows one to set the acuity, a parameter that
imposes a minimum on the standard deviations. Table 6.4 shows a comparison
of clustering algorithms for intrusion detection [42].
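The effect of the acuity floor can be illustrated with a minimal Python sketch of the numeric category utility. This is not COBWEB's actual implementation: the function name `numeric_category_utility`, the default acuity of 1.0, the use of the population standard deviation, and the outer weighting by $\Pr[C_l]$ and $1/k$ (taken from the standard form of category utility, which is not shown in this section) are all assumptions made for illustration.

```python
import math


def numeric_category_utility(clusters, acuity=1.0):
    """Sketch of category utility for numeric attributes.

    clusters: list of clusters, each a list of equal-length tuples of floats.
    acuity:   assumed lower bound applied to every standard deviation, so
              that 1/sigma never becomes infinite.
    """
    all_rows = [row for cluster in clusters for row in cluster]
    n = len(all_rows)
    k = len(clusters)
    n_attrs = len(all_rows[0])

    def floored_std(values):
        # Population standard deviation, clamped from below by the acuity.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        return max(math.sqrt(var), acuity)

    # sigma_i: standard deviation of attribute i over all instances.
    overall = [floored_std([r[i] for r in all_rows]) for i in range(n_attrs)]

    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        prob = len(cluster) / n  # estimate of Pr[C_l]
        # Sum over attributes of (1/sigma_il - 1/sigma_i).
        gain = sum(
            1.0 / floored_std([r[i] for r in cluster]) - 1.0 / overall[i]
            for i in range(n_attrs)
        )
        total += prob * gain

    # Constant 1/(2*sqrt(pi)) from the squared normal density, and 1/k.
    return total / (2.0 * math.sqrt(math.pi) * k)


# Two well-separated clusters on a single attribute.
separated = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
cu_separated = numeric_category_utility(separated, acuity=0.1)

# A cluster of identical values has standard deviation 0; the floor keeps
# the result finite.
degenerate = [[(5.0,), (5.0,)], [(9.0,), (11.0,)]]
cu_degenerate = numeric_category_utility(degenerate, acuity=1.0)
```

Without the `max(..., acuity)` clamp, the first cluster in `degenerate` would cause a division by zero, which corresponds to the infinite value described above. A larger acuity also caps how much any single tightly packed cluster can contribute to the score.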
Table 6.4. Comparison of clustering algorithms for intrusion detection {NC: Nearest
Cluster, FFT10: FFT for 10 clusters, FFT50: FFT with 50 clusters, SVM100: SVM
using K-means with 100 clusters}.

Attack/Method     Probe            DoS              U2R              R2L
                  R       R        R       R        R       R        R       R
KDD Cup Winner    0.833   0.006    0.971   0.003    0.123   3E-5     0.084   5E-5
SOM Map           0.643   ***      0.951   ***      0.229   ***      0.113   ***
Linear GP         0.857   ***      0.967   ***      0.013   ***      0.093   ***
K-Means           0.876   0.026    0.973   0.004    0.298   0.004    0.064   0.001
NC                0.888   0.005    0.971   0.003    0.022   6E-6     0.034   1E-4
COBWEB            0.364   0.059    0.812   0.248    0.0     0.026    0.611   0.03
FFT10             0.28    0.066    1.0     0.0      0.17    0.021    0.611   0.034
FFT50             0.37    0.06     0.812   0.25     0.34    0.017    0.56    0.036
SVM100            0.67    0.05     0.99    0.05     0.0     0.05     0.29    0.05