Information Technology Reference
In-Depth Information
networks, analysis of food chains) to gene interaction networks. Newman and
Girvan (2004) provide an overview of existing applications of this theory, while
Newman (2004) presents a formal analysis of the algorithm class used.
5.5.1
Using Community Detection Algorithms to Partition
Tag Graphs
In network analysis theory, a community is defined as a subset of nodes that are
connected more strongly to each other than to the rest of the network. In this
interpretation, a community is related to clusters in the network. If the network
analyzed is a social network (i.e. vertexes represent people), then 'community' has
an intuitive interpretation. For example, in a social network where people who know
each other are connected by edges, a group of friends are likely to be identified as a
community, or people attending the same school may form a community. We should
stress, however, that the network-theoretic notion of community is much broader,
and is not exclusively applied to people. Some examples (Jin et al. 2007) include
networks of items on Ebay, physics publications on arXiv, or even food webs in
biology. We will use a community detection algorithm to identify 'vocabularies'
within a folksonomy graph, identifying 'communities' as 'vocabularies.'
5.5.1.1
Community Detection: A Formal Discussion
Let the network considered be represented by a graph G
=(
V
,
E
)
,when
|
V
| =
n
and
m . The community detection problem can be formalized as a partitioning
problem, subject to a constraint. The partitioning algorithm will result in a finite
number of explicit partitions, based on clusters in the network, that will be
considered “communities.” Each v
|
E
| =
V must be assigned to exactly one cluster
C 1 ,
j .
Generally speaking, determining the optimal partition with respect to a given
metric is intractable, as the number of possible ways to partition a graph G is very
large. Newman (2004) shows there are more than 2 n 1 ways to form a partition, thus
the problem is at least exponential in n . Furthermore, in many real life applications
(including tagging), the optimal number of disjoint clusters n C is generally not
known in advance.
In order to compare which partition is optimal, the global metric used is
modularity , henceforth denoted by Q . Intuitively, any edge that in a given partition
has both ends in the same cluster contributes to increasing modularity, while any
edge that “cuts across” clusters has a negative effect on modularity. Formally, let
e ij ,
C 2 , ...
C n C , where all clusters are disjoint, i.e.
v
V
,
v
C i ,
v
C j
i
=
i
,
j
=
1
..
n C be the fraction of all edges in the graph that connect clusters i and j
1
2
and let a i =
j e ij be the fraction of the ends of edges in the graph that fall within
cluster i (thus, we have
i a i = i , j e ij =
m ).
Search WWH ::




Custom Search