A Novel Similarity-based Modularity Function for Graph Partitioning - Clustering Challenges in Biological Network - page 232


question is how to recover the conferences from a network that represents the schedule of games played by all teams. We presume that, because teams in the same conference are more likely to play each other, the conference system can be mapped as a structure despite the significant amount of inter-conference play.

We analyze the graph by using both GA and FHAC. The results are reported in Table 11.4. FHAC with Q_N partitions the graph into 6 conferences with 45 misclassifications [13], while FHAC with Q_S partitions it into 12 conferences with only 14 errors. Again, Q_S significantly outperforms Q_N when combined with FHAC.

Table 11.4. Detecting conferences in college football teams by using Q_N and Q_S

Alg.   Best Q_N   # of clusters   Errors   Best Q_S   # of clusters   Errors
GA     0.601009   12              14       0.820668   12              14
FHAC   0.577284   6               45       0.820668   12              14
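
To make the "Best Q_N" column concrete, the following minimal sketch scores a given partition, assuming Q_N denotes the standard Newman modularity of an undirected, unweighted graph: the sum over clusters of l_c/m - (d_c/2m)^2, where m is the number of edges, l_c the number of edges inside cluster c, and d_c the total degree of its vertices. The function name `newman_modularity` and the toy graph are illustrative assumptions, not taken from the chapter, and the sketch does not cover Q_S, GA, or FHAC.

```python
from collections import defaultdict


def newman_modularity(edges, membership):
    """Return Q_N = sum_c [ l_c / m - (d_c / (2m))^2 ].

    edges      -- list of (u, v) pairs, undirected, no duplicates
    membership -- dict mapping each vertex to its cluster label
    Assumes the standard Newman definition (an assumption about Q_N).
    """
    m = len(edges)
    internal = defaultdict(int)  # l_c: edges with both endpoints in cluster c
    degree = defaultdict(int)    # d_c: sum of vertex degrees in cluster c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {v: "all" for v in range(6)}

q_two = newman_modularity(edges, two_clusters)
q_one = newman_modularity(edges, one_cluster)
print(q_two, q_one)
```

On this toy graph the two-triangle split scores higher than the single-cluster partition, which is the kind of comparison a search procedure would use to pick the "best" partition.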

The second application is detecting individuals from customer records provided by Acxiom Corporation, where data errors blur the boundaries between individuals with similar information (see Fig. 11.5). This example represents a Type II graph, with hubs and outliers. The dataset contains 3 groups of customers, a number of hubs (vertices 7, 10, 11, and 19), and a single outlier (vertex 21).

We test both GA and FHAC by using Q_N and Q_S respectively. The results are summarized in Table 11.5. With Q_N, both algorithms make 3 errors: they assign hub vertices 7 and 10 to the wrong cluster and fail to detect the outlier (vertex 21). With the proposed Q_S, however, both algorithms classify this graph perfectly, which indicates that Q_S is better able to deal with hub and outlier vertices.
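
The text does not state how errors are counted. One common convention, assumed here purely for illustration, is to map each detected cluster to its majority ground-truth label and count every vertex that disagrees with that label. The function name `count_errors`, the label names, and the toy data below are hypothetical and are not the Acxiom dataset.

```python
from collections import Counter, defaultdict


def count_errors(true_labels, predicted):
    """Count vertices whose true label differs from the majority true
    label of the cluster they were assigned to.

    true_labels -- dict vertex -> ground-truth label (groups, "outlier", ...)
    predicted   -- dict vertex -> detected cluster id
    This majority-vote convention is an assumption, not the chapter's rule.
    """
    groups = defaultdict(list)
    for vertex, cluster in predicted.items():
        groups[cluster].append(true_labels[vertex])
    errors = 0
    for labels in groups.values():
        majority_size = Counter(labels).most_common(1)[0][1]
        errors += len(labels) - majority_size
    return errors


# Hypothetical toy data: two groups plus one outlier (vertex 7).
truth = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "outlier"}

# Vertex 6 lands in the wrong cluster and the outlier is absorbed.
flawed = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 0, 7: 1}
# Every group is recovered and the outlier is isolated.
perfect = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 2}

flawed_errors = count_errors(truth, flawed)
perfect_errors = count_errors(truth, perfect)
print(flawed_errors, perfect_errors)
```

Note that this convention penalizes merged clusters and absorbed outliers but not a true group split across several clusters, so it is only one of several reasonable ways to score a partition.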

Fig. 11.5. Customer record networks
