A Novel Similarity-based Modularity Function for Graph Partitioning - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

. These constraints are not always applicable for all applications, e.g.,

in clustering problems where some communities are much larger than the others.

To amend the above issue, a normalized cut ( N cut ) was proposed [16]:

|≈|

cut ( A,B )

assoc ( A,V ) +

cut ( A,B )

assoc ( B,V ) ,

N cut =

(11.3)

where assoc ( A,V )=

u∈A,v∈V

w ( u,v ) is the total connection from vertices in A

to all vertices in the graph. Using the N cut , cutting out one vertex or some small

part of the graph will no longer always yield a small N cut value.

Using the M cut or N cut one can partition a graph into two sub-graphs. As the

natural consequence, to divide a graph into k sub-graphs, one has to adopt a top-

down approach, i.e., splitting the graph into two sub-graphs, then further splitting

these sub-graphs into next level sub-graphs; repeat this process until k sub-graphs

have been detected. The disadvantage [5] of this method is that, like all other

bisection algorithms, it only partitions a graph into two sub-graphs. Though one

can repeat this procedure on the sub-graphs, there is no guarantee of the optimality

of partitioning, and one has no clue on when to stop the repeated the bisection

procedure or how many sub-graphs should be produced in a graph.

To answer the question “what is the best graph partition”, other researchers

proposed global measurements, such as M. J. Newman's modularity [1, 7, 13]:

l s

L −

2 ,

d s

Q N =

(11.4)

i =1

where NC is the number of clusters, L is the number of edges in the network, l s

is the number of edges between vertices within cluster s ,and d s is the sum of the

degrees of vertices in cluster s . The optimal clustering is achieved by maximizing

the modularity Q N . Modularity is defined such that Q N is 0 at the extremes of all

vertices are clustered into a single cluster or of vertices are randomly clustered. As

this modularity definition requires no constraints, it is better than the min-max cut

definition. Modularity is a quality measure of graph partitioning, while normal-

ized cut is not. Note that with this definition both top-down (divide and conquer)

and bottom-up (agglomerative) algorithms can be used for graph partitioning.

Finding the best Q N is NP-complete. Instead of performing an exhaustive

search, Newman used a bottom-up approach [1, 13] which begins by merging two

vertices into clusters that increase Q N , then likewise merging pairs of clusters,

until all have merged to form a single cluster. At each stage the value of Q N is

recorded. The partition with the highest Q N is the solution of this algorithm. This

is a typical hill-climbing greedy search algorithm, which generally has high speed

but easily falls into a local optimum. Guimera and Amaral [7] use a simulated

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home