Biology Reference
In-Depth Information
2. Considering merging every possible pair of cluster c g and c h ,( g
= h ) and note
the change to Q S , which is given by:
Q S g + h = Q S g + h
Q S g
Q S h
(11.10)
3. Choose the merge producing the largest ∆ Q S g + h , then update and record Q S .
4. Update all affected values of ∆ Q S g + i
and ∆ Q S h + j , ( i
= g ; j
= h ) using
formula (11.10).
5. Repeat step 3 and 4 until all vertices are grouped into one cluster.
6. Retrieve the partitioning with the highest Q S value as the best result.
In step 4, only clusters with edges connecting them to the merged pair need to
be updated. The number of clusters to be updated that are connected to cluster g is
|
updates,
each update taking O(log k ) < O(log n ) time to access the data structure, where
n is the number of vertices in the Graph. Thus each iteration takes O((
i
|
and the number connected to cluster h is
|
j
|
. There are at most
|
i
|
+
|
j
|
|
i
|
|
j
|
+
)log
n ) time [1].
The overall running time is the sum of all joining steps, which, as Newman
indicated is at most O( md log n ) [13], where m is number of edges and d is the
depth of the dendrogram. More details of this algorithm may be found in Newman
et al. [1, 13].
11.6. Evaluation Results
In this section, we present experimental result from analyzing synthetic graphs,
detecting community structure in social networks, and clustering entities from a
customer database. The size of the graphs varies from tiny to very large. Each
example has a known structure that we are seeking to reproduce. We introduce a
measure of “error” to gauge the performance of different clustering algorithms: if
a node is classified into a cluster other than the pre-defined cluster, we count it as
an error.
11.6.1. Tests on Synthetic Graphs
Synthetic graphs are generated based on predefined cluster structure and proper-
ties, so as to facilitate systematic study of the performance of our similarity-based
modularity definition and graph partitioning algorithms. Here we use the same
construction as used in several papers [6, 7, 13]. The synthetic graph consists of
128 vertices that are divided into 4 equal-size clusters. Each vertex connects to
vertices within its cluster with a probability P i , and connects to vertices outside
its cluster with a probability P o <P i . On average, each vertex is connected to
Search WWH ::




Custom Search