Biology Reference
In-Depth Information
|
. These constraints are not always applicable for all applications, e.g.,
in clustering problems where some communities are much larger than the others.
To amend the above issue, a normalized cut ( N cut ) was proposed [16]:
A
|≈|
B
|
cut ( A,B )
assoc ( A,V ) +
cut ( A,B )
assoc ( B,V ) ,
N cut =
(11.3)
where assoc ( A,V )=
u∈A,v∈V
w ( u,v ) is the total connection from vertices in A
to all vertices in the graph. Using the N cut , cutting out one vertex or some small
part of the graph will no longer always yield a small N cut value.
Using the M cut or N cut one can partition a graph into two sub-graphs. As the
natural consequence, to divide a graph into k sub-graphs, one has to adopt a top-
down approach, i.e., splitting the graph into two sub-graphs, then further splitting
these sub-graphs into next level sub-graphs; repeat this process until k sub-graphs
have been detected. The disadvantage [5] of this method is that, like all other
bisection algorithms, it only partitions a graph into two sub-graphs. Though one
can repeat this procedure on the sub-graphs, there is no guarantee of the optimality
of partitioning, and one has no clue on when to stop the repeated the bisection
procedure or how many sub-graphs should be produced in a graph.
To answer the question “what is the best graph partition”, other researchers
proposed global measurements, such as M. J. Newman's modularity [1, 7, 13]:
l s
L
2 ,
d s
L
NC
Q N =
(11.4)
i =1
where NC is the number of clusters, L is the number of edges in the network, l s
is the number of edges between vertices within cluster s ,and d s is the sum of the
degrees of vertices in cluster s . The optimal clustering is achieved by maximizing
the modularity Q N . Modularity is defined such that Q N is 0 at the extremes of all
vertices are clustered into a single cluster or of vertices are randomly clustered. As
this modularity definition requires no constraints, it is better than the min-max cut
definition. Modularity is a quality measure of graph partitioning, while normal-
ized cut is not. Note that with this definition both top-down (divide and conquer)
and bottom-up (agglomerative) algorithms can be used for graph partitioning.
Finding the best Q N is NP-complete. Instead of performing an exhaustive
search, Newman used a bottom-up approach [1, 13] which begins by merging two
vertices into clusters that increase Q N , then likewise merging pairs of clusters,
until all have merged to form a single cluster. At each stage the value of Q N is
recorded. The partition with the highest Q N is the solution of this algorithm. This
is a typical hill-climbing greedy search algorithm, which generally has high speed
but easily falls into a local optimum. Guimera and Amaral [7] use a simulated
Search WWH ::




Custom Search