Relating Subjective and Objective Pharmacovigilance Association Measures - Clustering Challenges in Biological Network


clustering and the results obtained for the original dataset are compared, for each value of $k$, with the quality measures computed from the $m$ randomizations. A good clustering is one whose quality measure lies significantly above the range of the quality measures computed from the randomized results, for the same value of $k$.
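The comparison described above can be sketched in a few lines. This is only an illustration: the function name `compare_to_randomizations`, the dictionary layout, and all numbers are hypothetical, and the text does not say how "significantly above" is quantified, so the sketch simply reports the gap between the original value and the largest randomized value for each $k$.

```python
def compare_to_randomizations(q_original, q_randomized):
    """For each k, compare the original quality measure with the m randomized ones.

    q_original:   {k: quality measure for the original dataset}
    q_randomized: {k: list of m quality measures from the randomizations}
    """
    report = {}
    for k, q0 in q_original.items():
        top = max(q_randomized[k])  # upper end of the randomized range
        report[k] = {
            "above_range": q0 > top,  # does the original exceed every randomization?
            "gap": q0 - top,          # how far above (or below) the range it lies
        }
    return report


# Made-up values for illustration only (m = 3 randomizations per k).
q_original = {2: 0.71, 3: 0.35}
q_randomized = {2: [0.30, 0.34, 0.28], 3: [0.31, 0.36, 0.33]}
report = compare_to_randomizations(q_original, q_randomized)
for k in sorted(report):
    print(k, report[k])
```

In this made-up example only $k = 2$ would count as a candidate for a good clustering; how large the gap must be to count as significant is left to the analyst.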

The silhouette coefficient used here as a measure of cluster quality is based on the idea that a good clustering should consist of cohesive, well-separated clusters. Given a partitioning of $N$ objects into $k$ clusters, consider any fixed object $i$ and let $C_i$ denote the set of indices for all objects clustered together with object $i$. A useful measure of cohesion for this cluster is:

$$a(i) = \frac{1}{n_i} \sum_{j \in C_i} d_{ij}, \tag{15.6}$$

where $n_i$ is the number of objects in cluster $C_i$ and $d_{ij}$ is the dissimilarity between objects $i$ and $j$. Note that for a good clustering, $a(i)$ should be small for all $i$. To characterize the separation between clusters, let $K_\ell$ denote the $\ell$th neighboring cluster, distinct from $C_i$, for $\ell = 1, 2, \ldots, k-1$. Define $b(i)$ as the average dissimilarity between object $i$ in cluster $C_i$ and the objects in the closest neighboring cluster, given by:

$$b(i) = \min_{\ell} \frac{1}{|K_\ell|} \sum_{j \in K_\ell} d_{ij}, \tag{15.7}$$

where $|K_\ell|$ is the number of objects in $K_\ell$.
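As a minimal sketch of the cohesion and separation measures, the following computes $a(i)$ and $b(i)$ from a dissimilarity matrix. The function names `cohesion` and `separation`, the list-of-lists matrix `d`, and the four example points are assumptions made for illustration. The sum for $a(i)$ runs over every object in the cluster of $i$, which includes $i$ itself with $d_{ii} = 0$, and is divided by the cluster size.

```python
def cohesion(i, labels, d):
    """a(i): average dissimilarity from object i to the objects in its own cluster."""
    members = [j for j, lab in enumerate(labels) if lab == labels[i]]
    # d[i][i] is 0, so including i itself does not change the sum.
    return sum(d[i][j] for j in members) / len(members)


def separation(i, labels, d):
    """b(i): smallest average dissimilarity from object i to any other cluster."""
    averages = []
    for lab in set(labels) - {labels[i]}:
        members = [j for j, other in enumerate(labels) if other == lab]
        averages.append(sum(d[i][j] for j in members) / len(members))
    return min(averages)


# Four points on a line forming two obvious clusters (illustrative data).
points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
d = [[abs(p - q) for q in points] for p in points]

print(cohesion(0, labels, d), separation(0, labels, d))  # 0.5 10.5
```

For object 0 the cohesion is small and the separation is large, which is the pattern expected of a good clustering.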

Here, for a good clustering, $b(i)$ should be large for all $i$. The silhouette coefficient $s(i)$ for object $i$ is then defined as the following normalized difference between these two quantities:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \tag{15.8}$$
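The normalization can be checked numerically with a small helper. The name `silhouette` and the choice to raise an error when both quantities are zero are assumptions of this sketch, not part of the text.

```python
def silhouette(a, b):
    """s(i) = (b - a) / max(a, b) for one object's cohesion a and separation b."""
    denom = max(a, b)
    if denom == 0:
        # Both averages are zero: the ratio is undefined (sketch's own choice to raise).
        raise ValueError("silhouette is undefined when a(i) = b(i) = 0")
    return (b - a) / denom


# When a < b the value equals 1 - a/b; when a > b it equals b/a - 1.
well_placed = silhouette(0.5, 10.5)   # cohesive and well separated
borderline = silhouette(2.0, 2.0)     # equally close to both clusters
misplaced = silhouette(3.0, 1.0)      # closer to the neighboring cluster
print(well_placed, borderline, misplaced)
```

Dividing by the larger of the two quantities is what keeps the result between $-1$ and $1$ regardless of the scale of the dissimilarities.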

For $s(i)$ to be well-defined, the partitioning must contain at least two clusters, and every cluster must contain at least two objects. Under these conditions, it is easily shown that $-1 \le s(i) \le 1$ for all $i$. From the previous observations, a good clustering should have $a(i) \ll b(i)$ for all $i$, implying $s(i) \approx 1$ for all $i$. A useful measure of the overall quality of the partitioning is therefore the average silhouette coefficient over all objects:

$$S = \frac{1}{N} \sum_{i=1}^{N} s(i). \tag{15.9}$$
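Putting the pieces together, the average silhouette coefficient for a whole partitioning might be computed as follows. This is a sketch under stated assumptions: the function name `average_silhouette`, the precomputed dissimilarity matrix `d`, and the example points are all hypothetical, and the cohesion is averaged over the full cluster including object $i$ itself.

```python
def average_silhouette(labels, d):
    """Average silhouette coefficient S over all N objects of a partitioning."""
    clusters = {}
    for j, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(j)
    # s(i) requires at least two clusters, each with at least two objects.
    if len(clusters) < 2 or min(len(m) for m in clusters.values()) < 2:
        raise ValueError("need at least two clusters with at least two objects each")

    def avg(i, members):
        return sum(d[i][j] for j in members) / len(members)

    total = 0.0
    for i, lab in enumerate(labels):
        a = avg(i, clusters[lab])                                # cohesion a(i)
        b = min(avg(i, m) for other, m in clusters.items() if other != lab)  # separation b(i)
        total += (b - a) / max(a, b)                             # silhouette s(i)
    return total / len(labels)


# Illustrative data: the same four points under a natural and an unnatural labeling.
points = [0.0, 1.0, 10.0, 11.0]
d = [[abs(p - q) for q in points] for p in points]
good = average_silhouette([0, 0, 1, 1], d)
bad = average_silhouette([0, 1, 0, 1], d)
print(good, bad)
```

The natural labeling yields a value close to 1, while the labeling that mixes the two groups yields a value close to 0.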

Given this cluster quality measure, let $D_0$ denote the original dataset and let $S_0(k)$ denote the value of $S$ computed for the $k$-cluster partitioning of $D_0$ obtained by a
