A Novel Similarity-based Modularity Function for Graph Partitioning - Clustering Challenges in Biological Network - page 233

Biology Reference

In-Depth Information

Table 11.5.

Performance comparison on Acxiom dataset using Q N and Q S respectively

Alg.

Expected clusters

Expected Q N

Best Q N

clusters

errors

GA+ Q N

4

0.378893

0.399623

3

3

FHAC+ Q N

4

0.378893

0.37669

3

3

Alg.

Expected clusters

Expected Q S

Best Q S

clusters

errors

GA+ Q S

4

0.474738

0.474738

4

0

FHAC+ Q S

4

0.474738

0.474738

4

0

Table 11.6.

FHAC running time for very large network

Vertices

Edges

FHAC+ Q N

FHAC+ Q S

52909

245300

591 Sec.

658 Sec.

The final experiment was on a very large dataset - the DBLP authors' col-

laboration networks of 2005 [10], which consist of 52,909 vertices (authors) and

245,300 edges (co-author relationship). The purpose of this experiment is testing

the speed of FHAC on a very large network. The running time on an Intel PC

with P3.2G CPU and 2GB memory are reported in Table 11.6. One can see that

optimizing Q S it runs marginally slower than optimizing Q N , which means Q S

cooperates with the FHAC algorithm very well.

11.7. Conclusion

In this paper, we propose a novel similarity-based modularity ( Q S ) to measure the

quality of a graph partitioning. Through theoretical analysis and extensive exper-

iments, we can conclude that the propose measure is significantly more accurate

and robust than Newman's connection-based modularity with respect to the result

of clustering. Furthermore, it has a better ability to deal with hubs and outlin-

ers. The proposed similarity-based modularity in combination with the Genetic

clustering algorithm (GA) and the greedy search algorithm FHAC yields an im-

proved accuracy for even dense, confused graphs. The FHAC often converges

to the global optimal for real applications using the proposed modularity. How-

ever, in some very tough cases, such as in very confused synthetic graphs, FHAC

significantly lags the global optimal obtained by GA. This suggests us to further

study more powerful fast clustering algorithm in the future to exert the potential

of the proposed modularity definition.

References

[1]

A. Clauset, M. Newman, and C. Moore. Finding community structure in very large

networks , Physical Review E , 70:066111, 2004.

[2]

C. R. Dias and L. S. Ochi. Efficient evolutionary algorithms for the clustering prob-

lem in directed graphs. The Congress on Evolutionary Computation, CEC '03 , 2003.

Next Page

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home