Community Detection in Collaborative Tagging Systems - Community-Built Databases: Research and Development

since it allows overlaps between communities and it leaves several network nodes

unassigned to communities. An additional, and perhaps even more important,

reason for not using modularity is its known limitation in correctly capturing

small-scale communities [32].

Instead of modularity, we employ another popular graph-based quality measure,

the graph conductance. Conductance is defined in relation to a subgraph S (i.e.,

a community), which implies a cut $(S, \bar{S})$ between the subgraph and the rest of the

graph. The measure is computed by means of the following equation:

\[
\phi(S) = \frac{\sum_{i \in S,\, j \notin S} A_{ij}}{\min\bigl(A(S),\, A(\bar{S})\bigr)}, \tag{5.7}
\]

where $A(S)$ is the total number of edges that are incident with S:

\[
A(S) = \sum_{i \in S} \sum_{j \in V} A_{ij} \tag{5.8}
\]
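As a minimal sketch of Eqs. (5.7) and (5.8), the following Python snippet computes the conductance of a node set on a small toy graph. The adjacency-dictionary representation, the function names `incident_weight` and `conductance`, and the toy graph itself are assumptions made for illustration; they do not come from the chapter.

```python
def incident_weight(adj, nodes):
    """A(S) from Eq. (5.8): sum of A_ij over i in S and every j in V."""
    return sum(weight for i in nodes for weight in adj[i].values())


def conductance(adj, community):
    """phi(S) from Eq. (5.7): cut weight over min(A(S), A(complement of S))."""
    s = set(community)
    rest = set(adj) - s
    # Weight of the edges that cross the cut between S and the rest of the graph.
    cut = sum(w for i in s for j, w in adj[i].items() if j not in s)
    denom = min(incident_weight(adj, s), incident_weight(adj, rest))
    if denom == 0:
        raise ValueError("conductance is undefined when A(S) or A(V - S) is zero")
    return cut / denom


# Hypothetical toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
# adj[i][j] holds A_ij; the matrix is symmetric because the graph is undirected.
adj = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1, "e": 1, "f": 1},
    "e": {"d": 1, "f": 1},
    "f": {"d": 1, "e": 1},
}

phi_triangle = conductance(adj, {"a", "b", "c"})  # one cut edge, A(S) = 7
phi_single = conductance(adj, {"a"})              # both edges of "a" are cut
print(phi_triangle, phi_single)
```

Lower values indicate a better-separated community: the triangle has a single edge leaving it, whereas the single-node set has all of its edges crossing the cut.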

An advantage of conductance is that it is defined per community, enabling the

derivation of an empirical distribution of conductance values for a given

community structure. In that way, it is possible to quantify the performance of the

community detection method over the whole set of discovered communities and

thus assess the robustness of the method under test. Furthermore, conductance is

considered to capture the “gestalt” notion of communities and has been extensively

used for evaluating community quality in a wide range of online networks [8].
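Because conductance is computed per community, a whole community structure can be summarized by the distribution of its values. The sketch below is one possible way to do this, assuming an unweighted, undirected edge list; the edge list, the community sets (one pair overlaps and node "g" is left unassigned), and the choice of summary statistics are illustrative assumptions, not the chapter's experimental setup.

```python
import statistics


def conductance(edges, community):
    """Conductance of a node set on an unweighted, undirected edge list."""
    s = set(community)
    cut = vol_s = vol_rest = 0
    for u, v in edges:
        # Each edge adds one unit of degree to each of its two endpoints.
        for node in (u, v):
            if node in s:
                vol_s += 1
            else:
                vol_rest += 1
        # An edge is cut when exactly one endpoint lies inside the community.
        if (u in s) != (v in s):
            cut += 1
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined for an empty side of the cut")
    return cut / denom


# Hypothetical graph: two triangles joined by c-d, plus a pendant node g.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("f", "g")]

# Hypothetical detection result; {"c", "d"} overlaps both other communities.
communities = [{"a", "b", "c"}, {"c", "d"}, {"d", "e", "f"}]

values = sorted(conductance(edges, c) for c in communities)
summary = {
    "min": values[0],
    "median": statistics.median(values),
    "max": values[-1],
}
print(values, summary)
```

Comparing such summaries (or the full sorted list of values) across methods gives a view of how consistently each method finds well-separated communities, rather than a single aggregate score.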

Last but not least, it is possible to evaluate a community detection method by

incorporating its results in some Information Retrieval (IR) task and measuring the

IR performance for that task in terms of measures such as precision, recall, and

F-measure. In the case of tag communities, such a task is tag recommendation,

i.e., given some input tag(s), the system produces a set of tag suggestions for the user.

The advantage of employing such an evaluation method is that it is possible to use

the tagging history of real users as ground truth, which enables large-scale

evaluation studies at low cost.

In practice, one divides all available tag assignments of a Collaborative

Tagging System into two sets, one used for training and the other used for

testing. Based on the training set, one builds the corresponding tag graph and

detects the communities in it. Then, by using the tag assignments of the test set,

the evaluation aims to quantify the extent to which the community structure

found by use of the training set can help predict the tagging activities of users

on the test set. For each test resource that is tagged with L tags, K < L tags are

used as input to the tag recommendation algorithm and the remaining L − K are

predicted. In that way, both the number of correctly predicted tags and the number

of missed tags are known, which enables quantification of the IR performance

in terms of precision and recall.
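The per-resource evaluation step can be sketched as follows. The recommender shown here (suggest tags that share a community with an input tag) is a hypothetical stand-in, since this passage does not specify the recommendation algorithm; taking the first K tags as input, the cutoff `n`, the function names, and the example tags are likewise assumptions made for illustration.

```python
def split_tags(tags, k):
    """Split a resource's L tags into K input tags and L - K held-out tags.

    Assumption: the first K tags in the given order serve as input.
    """
    tags = list(tags)
    if not 0 < k < len(tags):
        raise ValueError("K must satisfy 0 < K < L")
    return tags[:k], set(tags[k:])


def recommend(input_tags, communities, n):
    """Hypothetical recommender: rank tags co-occurring in a community with the input."""
    given = set(input_tags)
    scores = {}
    for community in communities:
        overlap = len(community & given)
        if overlap:
            for tag in community - given:
                scores[tag] = scores.get(tag, 0) + overlap
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n]


def precision_recall_f1(predicted, held_out):
    """Compare predicted tags against the held-out tags of a test resource."""
    hits = len(set(predicted) & held_out)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(held_out) if held_out else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical communities detected on the training tag graph.
communities = [{"python", "numpy", "pandas", "scipy"},
               {"travel", "paris", "france"}]

# Hypothetical test resource with L = 4 tags; K = 2 of them are used as input.
input_tags, held_out = split_tags(["python", "numpy", "pandas", "paris"], k=2)
predicted = recommend(input_tags, communities, n=2)
precision, recall, f1 = precision_recall_f1(predicted, held_out)
print(predicted, precision, recall, f1)
```

Averaging these per-resource scores over the whole test set yields the overall precision, recall, and F-measure for a given community structure.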
