input data. The remaining objects are re-connected and labeled. A granule is determined by the edges between the objects in the structure, and all components of the same granule (group) receive equal labels.
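The labeling step can be illustrated with a short sketch. It assumes one plausible reading of the description above: a granule is a connected component of the graph formed by the remaining edges, so every object that can be reached from another through edges receives the same label. The function name `label_granules` and the edge-list input are hypothetical and are not taken from the published SOSIG algorithm.

```python
from collections import defaultdict


def label_granules(n_objects, edges):
    """Hypothetical helper: give every object a granule label.

    Assumption: a granule is a connected component of the graph whose
    vertices are objects 0..n_objects-1 and whose edges are `edges`.
    """
    # Build an undirected adjacency list from the edge pairs.
    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    labels = [-1] * n_objects  # -1 marks an object that is not labeled yet
    current = 0
    for start in range(n_objects):
        if labels[start] != -1:
            continue
        # Iterative depth-first traversal labels one whole component.
        stack = [start]
        labels[start] = current
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if labels[neighbour] == -1:
                    labels[neighbour] = current
                    stack.append(neighbour)
        current += 1
    return labels


print(label_granules(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 1, 1, 2]
```

Objects 0, 1 and 2 are linked by edges and share label 0, objects 3 and 4 form the second granule, and the isolated object 5 becomes a granule of its own.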
The last step applies a procedure that adjusts all network objects: the attribute values of some objects are slightly modified, depending on the similarity level of the objects and the type of attributes. This procedure adapts the network objects in the attained solution to the examined problem.
It must be emphasized that the SOSIG algorithm does not require the number of clusters to be given. In contrast to partitioning and hierarchical methods, groups are identified automatically, which eliminates the inconvenient step of assessing and selecting the best result from a set of potential clusterings.
4 Clustering Validation
Together with the specification of elementary granules, it is necessary to define measures of granule quality [12]. The aim of clustering techniques is to detect granules that are as compact and as well separated as possible. To evaluate the compactness and separability of discovered clusters, statistics called internal validity indices have been proposed.
Validity indices are designed to estimate the quality of the obtained partitioning. Finding the optimal result requires calculating validity indices for different values of the algorithm's parameter, which is usually the number of clusters. The most commonly used indices are the Dunn and Dunn-like statistics and the Davies-Bouldin (DB) index [3]. Their advantage is that they show no trend with respect to the number of clusters; therefore, the minimum (DB) or maximum (Dunn) value indicates the optimal partition. The Dunn index for a specified number of granules $nc$ is defined by Equation 6. Let $U$ be a set of objects and let $C_i$ be a cluster, where $i = 1, \dots, nc$.
$$D_{nc} = \min_{i=1,\dots,nc} \left\{ \min_{j=i+1,\dots,nc} \left( \frac{d(C_i, C_j)}{\max_{k=1,\dots,nc} \operatorname{diam}(C_k)} \right) \right\} \tag{6}$$
where $d(C_i, C_j)$ is the dissimilarity function between two clusters $C_i$ and $C_j$, defined as

$$d(C_i, C_j) = \min_{x \in C_i,\; y \in C_j} d(x, y) \tag{7}$$
and $\operatorname{diam}(C)$ is the diameter of a cluster $C$, defined as follows:

$$\operatorname{diam}(C) = \max_{x, y \in C} d(x, y) \tag{8}$$
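Equations 6 to 8 translate directly into code. The sketch below assumes that objects are numeric tuples and uses the Euclidean distance (`math.dist`, Python 3.8+) as the dissimilarity $d(x, y)$; the text does not fix a particular metric, and the helper names are hypothetical. It needs at least two clusters and at least one cluster with a non-zero diameter, otherwise Equation 6 is undefined.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Assumed object dissimilarity d(x, y): Euclidean distance."""
    return math.dist(x, y)


def cluster_distance(ci, cj, d=euclidean):
    """Eq. (7): smallest distance between a point of ci and a point of cj."""
    return min(d(x, y) for x in ci for y in cj)


def diameter(c, d=euclidean):
    """Eq. (8): largest distance between two points of the same cluster."""
    return max(d(x, y) for x in c for y in c)


def dunn_index(clusters, d=euclidean):
    """Eq. (6): minimum inter-cluster distance divided by the maximum diameter.

    combinations() yields each pair (C_i, C_j) with i < j exactly once,
    matching the double minimum over i and j = i+1..nc.
    """
    max_diam = max(diameter(c, d) for c in clusters)
    min_sep = min(cluster_distance(ci, cj, d)
                  for ci, cj in combinations(clusters, 2))
    return min_sep / max_diam


clusters = [[(0, 0), (0, 1)], [(3, 0), (3, 1)]]
print(dunn_index(clusters))  # 3.0
```

Because the denominator in Equation 6 does not depend on $i$ or $j$, the sketch computes it once and divides the smallest separation by it, which gives the same value as minimizing the ratio pair by pair.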
According to the above definition, the index value is large for compact clusters that are situated far from one another. The DB index, defined for $nc$ clusters, is expressed by Equation 9.
$$DB_{nc} = \frac{1}{nc} \sum_{i=1}^{nc} \max_{j=1,\dots,nc,\; j \neq i} R_{ij} \tag{9}$$
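Equation 9 depends on $R_{ij}$, which is not defined in this passage. The sketch below assumes a common instantiation of the Davies-Bouldin similarity, $R_{ij} = (s_i + s_j) / \lVert z_i - z_j \rVert$, where $z_i$ is the centroid of $C_i$ and $s_i$ is the average Euclidean distance of its objects to $z_i$. The metric and all function names are assumptions. The example also applies the selection rule stated earlier: among candidate partitions, the one with the minimum DB value is preferred.

```python
import math


def centroid(c):
    """Component-wise mean of the points in cluster c."""
    return tuple(sum(coord) / len(c) for coord in zip(*c))


def scatter(c):
    """Assumed s_i: average Euclidean distance of the points of c to its centroid."""
    z = centroid(c)
    return sum(math.dist(x, z) for x in c) / len(c)


def db_index(clusters):
    """Eq. (9) with the assumed R_ij = (s_i + s_j) / dist(z_i, z_j).

    Requires at least two clusters with pairwise distinct centroids.
    """
    nc = len(clusters)
    s = [scatter(c) for c in clusters]
    z = [centroid(c) for c in clusters]
    total = 0.0
    for i in range(nc):
        # Worst-case (largest) similarity of cluster i to any other cluster.
        total += max((s[i] + s[j]) / math.dist(z[i], z[j])
                     for j in range(nc) if j != i)
    return total / nc


# Two hypothetical partitions of the same four objects.
candidates = {
    "vertical split": [[(0, 0), (0, 2)], [(4, 0), (4, 2)]],
    "horizontal split": [[(0, 0), (4, 0)], [(0, 2), (4, 2)]],
}
best = min(candidates, key=lambda name: db_index(candidates[name]))
print(db_index(candidates["vertical split"]))  # 0.5
print(best)  # vertical split
```

The vertical split groups objects that lie close together and yields a DB value of 0.5, while the horizontal split yields 2.0, so the minimum correctly points to the more compact and better separated partition. If two centroids coincide, the denominator becomes zero, so a practical version would need to guard against that case.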
 