A Novel Similarity-based Modularity Function for Graph Partitioning - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

annealing algorithm to find the optimal partition with the highest Q N .Thisisa

random search based algorithm, which usually can achieve higher quality than the

greedy search (like Guimera and Amaral claimed [7]) but has much lower speed.

According to our study [4], while Newman's modularity works well for type

I graphs (Fig. 11.1 (a)); it fails to deal well with type II graphs due to the “hub”

and “outliner” vertices. To reiterate, a “hub” vertex is a vertex that densely con-

nects two or more groups in a graph. For example, in Fig. 11.1 (b), the central

vertex 'a' connects two sub-groups. Note also that this same vertex has the largest

degree. Using Newman's modularity definition, this graph in Fig. 11.1 (b) will be

clustered into two groups,

{

a,b,c,d

}

and

{

e,f,g,h

}

. However, the more reason-

able clustering schedule could be

,where'h' is in a

cluster by itself. Newman's modularity definition will not identify outliers; rather

it will assign them membership to some larger cluster.

Generally, when there are clear sub-graph structures existed in a graph, i.e.,

the connections between sub-graphs are sparse while the connections within a

sub-graph are much dense, both min-max cut or its variations and Newman's

modularity works very well. But when the sub-graph structure becomes more

confused, i.e., the between connections become denser, the performance of these

methods deteriorates rapidly.

{

b,c,d

}

,

{

a,e,f,g

}

,and

{

h

}

11.3. A Novel Similarity-based Modularity

Based on our observation, it is not very accurate to use connections within clusters

and between clusters as the criteria of partitioning for all types of graphs, espe-

cially for graphs with complicated cluster structures. Instead, we propose a more

general concept, similarity, to measure the graph partitioning quality. A good par-

tition is one for which the similarity of vertices within a cluster is higher than the

similarity of vertices between clusters.

If two connected vertices also share a lot of neighbors, we say they are similar.

However, merely counting the number of shared neighbors doesn'ttelluswhat

proportion of the vertices' neighbors is shared, and therefore is insufficient for

distinguishing a hub from a normal vertex. To handle this problem, we adopt the

normalized similarity [11]. Let Γ i be the neighborhood of vertex i in a network,

the cosine normalization similarity is defined as:

S ( u,v )= |

Γ u ∩

Γ v |

|

.

(11.5)

Γ u |·|

Γ v |

Then we define the Similarity-based Modularity ( Q S ) function as the follow-

ing:

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home