Biology Reference
In-Depth Information
in a cluster to itself. Past approached do not deal with the “hub” and “outlier”
problems very well. That is, they cannot split two clusters very well that are
densely connected by one or more “hub” vertices, and they often fail to detect and
isolate outliers.
In this paper, we propose a similarity-based graph partitioning definition that
globally measures whether one partitioning is better than another. Through the-
oretical analysis and experimental results, we demonstrate that the proposed def-
inition outperforms existing approaches on complicated graphs and handles with
agility hubs and outliers. Further, we show that this novel definition can be used
with the fast agglomerative algorithm [1, 13] to find communities in very large
networks.
The rest of this paper is organized as follows: in section 2 we brieflyreview
related works; in section 3 we propose our novel similarity-based modularity mea-
surement; in section 4 and 5 we discuss two techniques for finding optimal graph
partitions by maximizing the modularity; in section 6 we apply them to network
with known structures and present experimental results; and finally in section 7
we summarize our conclusions.
11.2. Related Work
The most traditional definition of graph partitioning is probably the min-max
cut [3, 8, 16], which seeks to partition a graph G =
into two sub graphs
A and B . The principle of min-max clustering is minimizing the number of edges
between clusters A and B and maximizing the number of edges within each. In-
stead of number of edges, sum of weights is used for a weighted graph. Thus the
connection between A and B the cut:
cut ( A,B )= W ( A,B )=
u
{
V,E
}
w ( u,v ) .
(11.1)
A,v
B
W ( A,A ).
A bi-partition of the graph is defined as minimizing the following objective
function:
It is obvious that W ( A )
M cut = cut ( A,B )
W ( A )
+ cut ( A,B )
W ( B )
.
(11.2)
The above function is called the min-max cut function. It minimizes the cut
between two clusters while maximizing the connections within a cluster. How-
ever, a pitfall of this definition is that if we only cut out one node from a graph,
M cut will probably get the minimum value. So, in practice, M cut must be accom-
panied with some constraints, such as A and B should of equal or similar size, or
Search WWH ::




Custom Search