Databases Reference
In-Depth Information
Equation (6.8) states that each prototype is turned towards the sample
vector x i according to the neighborhood function h cj ( t ). Most commonly
used neighborhood function is the Gaussian neighborhood.
exp
n j 2
2 σ 2 ( t )
n c
h cj ( t )= α ( t )
·
(6.10)
is the
topological distance between prototypes c and j in the SOM grid. The
factors α ( t )and σ ( t ) are the learning rate factor and the neighborhood
width factor, respectively. Both of these factors decrease monotonically as
training proceeds.
The neighborhood is centered on the BMU. Norm
n c
n j
COBWEB Algorithm for
Whereas
iterative distance-based clustering, such as K-Means, iterate over the whole
dataset until convergence in the clusters is reached, COBWEB works
incrementally, updating the clustering instance by instance. The clustering
COBWEB creates is expressed in the form of a tree, with leaves representing
each instance in the tree, the root node representing the entire dataset,
and branches representing all the clusters and sub clusters within the tree.
In fact, that there is no limit to the total number of sub clusters except
the limit imposed by the size of the dataset. COBWEB starts with a tree
consisting of just the root node. From there, instances are added one by
one, with the tree updated appropriately at each stage. When an instance is
added, one of the four possible actions is taken: The option with the greatest
category utility is chosen. Category utility is defined by the function:
CU ( C 1 ,C 2 ,...,C k )= l Pr [ C l ] i j ( Pr [ a i = v ij |
Incremental Clustering:
C l ] 2
Pr [ a i = v ij ] 2 )
k
(6.11)
where C 1 ,C 2 ,...,C k are the k clusters; the outer summation is over each
of the clusters, which is later divided by k to provide a “per cluster” figure;
the next inner summation sums over the attributes, and the inner-most
summations sums over the possible values; a i is the i th attribute, and it
takes on values which are dealt with by the sum over j . Pr [ A ] refers to
the probability of event A occurring and Pr [ A
B ] refers to the probability
of event A occurring conditional on event B. Thus the difference ( Pr [ a i =
v ij |
|
Pr [ a i = v ij ] 2 ) refers to the difference between the probability
that a i has value v ij for an instance in cluster C l and the probability that
C l ] 2
Search WWH ::




Custom Search