The k-Dense Method to Extract Communities from Complex Networks - Mining Complex Data

Information Technology Reference

In-Depth Information

( X ( k )) for each order k for

the two methods respectively. Again we can observe that for k

Figure 14.7(d) shows the normalized entropy

E

6the k -dense

method extracts a set of communities with almost equal size, while the sizes of

communities extracted by the k -core method are unbalanced.

≥

14.4

Discussion and Related Work

The CFinder program provided by Palla et al. [10] as a part of their k -clique

method, first enumerates maximal cliques from a given network and then con-

structs the clique-clique overlap matrix by counting the number of common

nodes between each pair of cliques, and then it extracts k -clique communities

for a given k . Here we can alternatively use Carraghan and Pardalos' algorithm

[1] or Applegate and Johnson's dfmax program [4] for calculating the maximum

clique, and several methods including an algorithm proposed by Tsukiyama et

al. [16] for enumerating maximal cliques.

However, this type of problems is known to be

-hard [6]. Moreover, the

number of maximal cliques becomes extremely large for certain types of large-

scale networks. Actually in our own experiments using the Wikipedia reference

network described above, the CFinder program could not proceed its calculation

at the step of the clique-clique overlap matrix calculation. In contrast, the k -

dense method proposed in this paper can produce all of the communities of all

possible k 's in less than one minute. Here all of our experiments were done by

using a Dell PC with an Intel 3.4GHz Xeon processor with 2GB of memory.

The computational complexity of the k -dense method is closely related to that

of the clustering coe cients calculation, which is widely used to characterize

complex networks [12]. Because the k -dense condition requires that each link

must be shared by at least ( k

NP

2) different node-link triangles that are formed

by the link and its common adjacent node, the k -dense calculation as well as

the calculation of the clustering coecients, involves in repeatedly counting the

number of those triangles in a network. Although the number of triangles for

each link changes and needs to be recalculated as nodes and links that do not

satisfy the k -dense condition are eliminated, the recalculation can be done quite

eciently because only the links in the neighborhoods of the eliminated nodes

and links are affected by the elimination.

In this paper, for simplicity we focused on only undirected networks without

self-connections, i.e., we treated each link as a set of nodes. However, it is possible

to extend our framework to coping with directed networks. As one such approach,

we can follow the work by Batagelj and Zaversnik [2], in which they extend the

notion of a k -core to directed networks. As another direction, we can consider

a problem to extract bipartite cores or cliques. One pioneering work in this

direction includes Web Trawling [9].

−

14.5

Conclusion

In this paper, we presented a new concept of a subnetwork called k -dense, and

we then derived an ecient algorithm for extracting k -dense communities. In our

Mining Complex Data

Search WWH ::

Custom Search

Home