Biology Reference
In-Depth Information
|
. These constraints are not always applicable for all applications, e.g.,
in clustering problems where some communities are much larger than the others.
To amend the above issue, a
normalized cut
(
N
cut
) was proposed [16]:
A
|≈|
B
|
cut
(
A,B
)
assoc
(
A,V
)
+
cut
(
A,B
)
assoc
(
B,V
)
,
N
cut
=
(11.3)
where
assoc
(
A,V
)=
u∈A,v∈V
w
(
u,v
) is the total connection from vertices in
A
to all vertices in the graph. Using the
N
cut
, cutting out one vertex or some small
part of the graph will no longer always yield a small
N
cut
value.
Using the
M
cut
or
N
cut
one can partition a graph into two sub-graphs. As the
natural consequence, to divide a graph into
k
sub-graphs, one has to adopt a top-
down approach, i.e., splitting the graph into two sub-graphs, then further splitting
these sub-graphs into next level sub-graphs; repeat this process until
k
sub-graphs
have been detected. The disadvantage [5] of this method is that, like all other
bisection algorithms, it only partitions a graph into two sub-graphs. Though one
can repeat this procedure on the sub-graphs, there is no guarantee of the optimality
of partitioning, and one has no clue on when to stop the repeated the bisection
procedure or how many sub-graphs should be produced in a graph.
To answer the question “what is the best graph partition”, other researchers
proposed global measurements, such as M. J. Newman's modularity [1, 7, 13]:
l
s
L
−
2
,
d
s
L
NC
Q
N
=
(11.4)
i
=1
where
NC
is the number of clusters,
L
is the number of edges in the network,
l
s
is the number of edges between vertices within cluster
s
,and
d
s
is the sum of the
degrees of vertices in cluster
s
. The optimal clustering is achieved by maximizing
the modularity
Q
N
. Modularity is defined such that
Q
N
is 0 at the extremes of all
vertices are clustered into a single cluster or of vertices are randomly clustered. As
this modularity definition requires no constraints, it is better than the min-max cut
definition. Modularity is a quality measure of graph partitioning, while normal-
ized cut is not. Note that with this definition both top-down (divide and conquer)
and bottom-up (agglomerative) algorithms can be used for graph partitioning.
Finding the best
Q
N
is NP-complete. Instead of performing an exhaustive
search, Newman used a bottom-up approach [1, 13] which begins by merging two
vertices into clusters that increase
Q
N
, then likewise merging pairs of clusters,
until all have merged to form a single cluster. At each stage the value of
Q
N
is
recorded. The partition with the highest
Q
N
is the solution of this algorithm. This
is a typical hill-climbing greedy search algorithm, which generally has high speed
but easily falls into a local optimum. Guimera and Amaral [7] use a simulated
Search WWH ::
Custom Search