MINING KNOWLEDGE FROM NETWORK INTRUSION DATA USING DATA MINING TECHNIQUES - Knowledge Mining Using Intelligent Agents


with centroids 10 and 65. These might be candidates for “Old” and
“Young”. But if 60 and 70 are selected as the initial cluster centroids,
vector 10 will be grouped together with 60, and we end up with two clusters
with centroids 35 and 70, which might be a less suitable definition. The
main advantage of the k-means algorithm is its simplicity and speed, a
good feature if an IDS is to use clustering techniques in real time.
Also, its complexity increases linearly with the number of features used.
Other algorithms exist, and these too could be candidates for automatic
clustering, such as: (i) the fuzzy c-means algorithm, (ii) hierarchical
clustering, and (iii) mixture of Gaussians.
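The sensitivity to the initial centroids described above can be sketched with a single assignment-and-update pass of one-dimensional k-means. The data values (10, 60, 70), the first pair of starting centroids (10, 60), and the function name `kmeans_one_pass` are assumptions made for illustration; the excerpt does not state the full data set.

```python
def kmeans_one_pass(values, centroids):
    """Run one assignment-and-update pass of 1-D k-means."""
    clusters = [[] for _ in centroids]
    for v in values:
        # Assign each value to its nearest centroid (ties go to the lower index).
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    # New centroid = mean of the cluster; an empty cluster keeps its old centroid.
    return [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]


ages = [10, 60, 70]  # assumed data, not given in this excerpt

print(kmeans_one_pass(ages, [10, 60]))  # [10.0, 65.0]
print(kmeans_one_pass(ages, [60, 70]))  # [35.0, 70.0]
```

Only one pass is shown here; a full k-means run repeats the pass until the centroids stop changing.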

Fuzzy c-Means (FCM) Clustering: The fuzzy c-means (FCM) algorithm,
also known as fuzzy ISODATA, was introduced by Bezdek [39] as an extension
to Dunn's algorithm to generate fuzzy sets for every observed feature.
The fuzzy c-means clustering algorithm is based on the minimization of
an objective function called the c-means functional. The fuzzy c-means
algorithm is one of the well-known relational clustering algorithms. It
partitions the sample data for each explanatory (input) variable into a
number of clusters. These clusters have “fuzzy” boundaries, in the sense
that each data value belongs to each cluster to some degree or other;
membership is not certain, or “crisp”. Having decided upon the number of
such clusters to be used, some procedure is then needed to locate their
centers (or, more generally, mid-points) and to determine the associated
membership functions and the degree of membership for the data points.
Fuzzy clustering methods allow for uncertainty in the cluster assignments.
FCM is an iterative algorithm that finds cluster centers (centroids) that
minimize a dissimilarity function. Rather than partitioning the data into
a collection of distinct sets, fuzzy partitioning assigns each data point
a degree of membership in every cluster. The membership matrix (U) is
randomly initialized according to Equation (6.4).

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, 2, 3, \ldots, n. \tag{6.4}$$
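A minimal sketch of one way to initialize the membership matrix randomly so that the constraint in Equation (6.4) holds, that is, so that the memberships of every data point sum to 1 across the c clusters. The function name `init_membership`, its parameters, and the normalization approach are illustrative assumptions, not the authors' procedure.

```python
import random


def init_membership(c, n, seed=None):
    """Return a c-by-n matrix U of random memberships whose columns each sum to 1."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n):
        # Draw c strictly positive random weights for one data point.
        raw = [1.0 - rng.random() for _ in range(c)]
        total = sum(raw)
        # Normalize so the memberships of this data point sum to 1.
        columns.append([r / total for r in raw])
    # Transpose so that U[i][j] is the membership of point j in cluster i.
    return [[columns[j][i] for j in range(n)] for i in range(c)]


U = init_membership(c=2, n=5, seed=0)
for j in range(5):
    assert abs(sum(U[i][j] for i in range(2)) - 1.0) < 1e-9
```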

The dissimilarity function (or, more generally, the objective function)
used in FCM is given in Equation (6.5).

$$J(U, c_1, c_2, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}, \tag{6.5}$$
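The objective in Equation (6.5) can be evaluated directly once a membership matrix and a set of centroids are available. The sketch below assumes one-dimensional data, takes d_ij to be the absolute difference between centroid i and point j, and follows the standard FCM formulation with a fuzziness exponent m and squared distances. The function name `fcm_objective`, the default m = 2, and the sample values are assumptions for illustration only.

```python
def fcm_objective(U, centers, points, m=2.0):
    """Sum u_ij**m * d_ij**2 over all clusters i and data points j."""
    total = 0.0
    for i, center in enumerate(centers):
        for j, x in enumerate(points):
            d_ij = abs(x - center)  # distance for 1-D data (assumed)
            total += (U[i][j] ** m) * d_ij ** 2
    return total


points = [10, 60, 70]   # assumed sample data
centers = [10, 65]      # assumed centroids

# Crisp memberships: point 0 in cluster 0, points 1 and 2 in cluster 1.
U_crisp = [[1, 0, 0],
           [0, 1, 1]]
print(fcm_objective(U_crisp, centers, points))  # 50.0

# Maximally fuzzy memberships: every point belongs equally to both clusters.
U_fuzzy = [[0.5, 0.5, 0.5],
           [0.5, 0.5, 0.5]]
print(fcm_objective(U_fuzzy, centers, points))  # 2293.75
```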
