since it allows overlaps between communities and it leaves several network nodes
unassigned to communities. An additional, and perhaps even more important,
reason for not using modularity is its known limitation in correctly capturing
small-scale communities [32].
Instead of modularity, we employ another popular graph-based quality measure,
the graph conductance. Conductance is defined in relation to a subgraph S (i.e.,
a community), which implies a cut $(S, \bar{S})$ between the subgraph and the rest of the
graph. The measure is computed by means of the following equation:
$$
\phi(S) = \frac{\sum_{i \in S,\, j \in \bar{S}} A_{ij}}{\min\left(A(S),\, A(\bar{S})\right)}, \tag{5.7}
$$
where A(S) is the total number of edges that are incident with S:
$$
A(S) = \sum_{i \in S} \sum_{j \in V} A_{ij} \tag{5.8}
$$
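As an illustration of Eqs. (5.7) and (5.8), the following is a minimal sketch in plain Python, assuming the graph is given as a symmetric adjacency matrix stored as a list of lists. The function names `volume` and `conductance` and the toy graph (two triangles joined by a single edge) are hypothetical and chosen only for this example.

```python
def volume(adj, nodes):
    """A(S) from Eq. (5.8): sum of A_ij over i in S and all j in V."""
    n = len(adj)
    return sum(adj[i][j] for i in nodes for j in range(n))


def conductance(adj, community):
    """Conductance of the node set `community`, following Eq. (5.7)."""
    inside = set(community)
    outside = set(range(len(adj))) - inside  # complement of S in V
    # Numerator: total weight of edges crossing the cut (S, complement of S).
    cut = sum(adj[i][j] for i in inside for j in outside)
    denom = min(volume(adj, inside), volume(adj, outside))
    if denom == 0:
        raise ValueError("conductance is undefined when A(S) or A(complement) is zero")
    return cut / denom


# Toy graph (assumption for illustration): triangles {0, 1, 2} and {3, 4, 5}
# connected by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

value = conductance(adj, [0, 1, 2])  # one cut edge over min(7, 7), i.e. 1/7
```

Lower values indicate a community that is better separated from the rest of the graph. Because the function takes an arbitrary node set, it can be applied to each detected community in turn, including overlapping ones.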
An advantage of conductance is that it is defined per community, enabling the
derivation of an empirical distribution of conductance values for a given community
structure. In that way, it is possible to quantify the performance of the
community detection method over the whole set of discovered communities and
thus assess the robustness of the method under test. Furthermore, conductance is
considered to capture the “gestalt” notion of communities and has been extensively
used for evaluating community quality in a wide range of online networks [8].
Last but not least, it is possible to evaluate a community detection method by
incorporating its results in some Information Retrieval (IR) task and measuring the
IR performance for that task in terms of measures such as precision, recall, and
F-measure. In the case of tag communities, such a task is tag recommendation,
i.e., given some input tag(s) the system produces a set of tag suggestions to the user.
The advantage of employing such an evaluation method is that it is possible to use
the tagging history of real users as ground truth, which enables large-scale evaluation
studies at low cost.
In practice, one divides all available tag assignments of a Collaborative
Tagging System into two sets, one used for training and the other used for
testing. Based on the training set, one builds the corresponding tag graph and
detects the communities in it. Then, by using the tag assignments of the test set,
the evaluation aims to quantify the extent to which the community structure
found by use of the training set can help predict the tagging activities of users
on the test set. For each test resource that is tagged with L tags, K < L tags are
used as input to the tag recommendation algorithm and the remaining L − K are
predicted. In that way, both the number of correctly predicted tags and the number
of missed tags are known, which enables quantification of the IR performance
in terms of precision and recall.
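The per-resource scoring step described above can be sketched as follows. This is a minimal illustration in plain Python, not the evaluation code used in the study: the helper names `split_tags` and `score_predictions`, the choice of taking the first K tags as input, and the sample tags and predictions are all assumptions made for the example. The recommender itself is left out, so `predicted` stands in for whatever a community-based method would return.

```python
def split_tags(tags, k):
    """Use the first k tags as input and hold out the remaining L - k."""
    if not 0 < k < len(tags):
        raise ValueError("k must satisfy 0 < k < L")
    return tags[:k], tags[k:]


def score_predictions(predicted, held_out):
    """Return (precision, recall, F-measure) of predicted vs. held-out tags."""
    predicted, held_out = set(predicted), set(held_out)
    hits = len(predicted & held_out)  # correctly predicted tags
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(held_out) if held_out else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical test resource with L = 5 tags, of which K = 2 are used as input.
tags = ["sunset", "beach", "sea", "sky", "holiday"]
input_tags, held_out = split_tags(tags, 2)

# Hypothetical recommender output for `input_tags`.
predicted = ["sea", "sand", "sky", "waves"]

# Two of four predictions are correct, and two of three held-out tags are found.
precision, recall, f_measure = score_predictions(predicted, held_out)
```

Averaging these per-resource values over all test resources gives one summary figure per measure. How the K input tags are selected (first K, random, or otherwise) is a design choice that the text leaves open.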