Information Technology Reference
In-Depth Information
The modularity Q of a graph
|
G
|
with respect to a partition C is defined as:
)= i ( e i , i a i )
Q
(
G
,
C
(5.10)
Informally, so Q is defined as the fraction of edges in the network that fall within a
partition, minus the expected value of the fraction of edges that would fall within the
same partition if all edges would be assigned using a uniform, random distribution.
These partitions are identified as communities by Newman and Girvan (2004). In
tagging, all of these partitions are identified as a vocabulary.
As shown in Newman (2004), if Q
0, then the chosen partition c shows the
same modularity as a random division. 8 A value of Q closer to 1 is an indicator
of stronger community structure - in real networks, however, the highest reported
value is Q
=
75. In practice, Newman (2004) found (based on a wide range of
empirical studies) that values of Q above around 0.3 indicate a strong community
structure for the given network. We will return shortly to define the algorithm by
which this optimal partition can actually be computed, but first some additional
steps are needed to link this formal definition to our tagging domain.
=
0
.
5.5.2
Edge Filtering Step
As shown in the tag graph construction step above, for our data set the initial inter-
tag graph contains 5 2 =
1225 pairwise similarities (edges), one for each potential
tag pair. So we make the choice to filter and use in further analysis only the top
m
n edges, corresponding to the strongest pairwise similarities. Here, k d is
a parameter that controls the density of the given graph (i.e. how many edges are
there to be considered vs. the number of vertexes in the graph). In practice, we take
values of k d =
=
k d
10, which for the tag graph we consider means a number of edges
from 500 down to 50.
1
...
5.5.3
Normalized vs. Non-normalized Edge Weights
The graph community identification literature generally considers graphs consisting
of discrete edges (for example, in a social network graph, people either know or
do not know each other, edges do not usually encode a “degree” of friendship).
In our graph, however, edges represent similarities between pairs of tags (c.f.
( 5.9 )) (Newman and Girvan 2004). There are two ways to specify edge weights.
The non-normalized case assigns each edge that is retained in the graph, after
8 Note that Q can also take values smaller than 0, which would indicate that the chosen partition is
worse than expected at random.
Search WWH ::




Custom Search