Community Detection in Collaborative Tagging Systems - Community-Built Databases: Research and Development


the case of tags referring to images, such a tag could refer to the picture quality of the image, for example, “bright,” “black and white,” etc.). Outlier tags may correspond to personal tags (i.e., tags that have a specific meaning only for the person using them), infrequent tags, or spam/erroneous tags.

5.3.3 Evaluation of Tag Communities

Evaluating the results of community detection, which is a kind of clustering process, is a challenging task. Due to the size and uncontrolled nature of folksonomy data, it is impractical to have all the produced communities evaluated by human subjects. Furthermore, since Collaborative Tagging Systems are complex systems characterized by evolving and emergent semantics, there is no standard ground truth, and it is hard to rely on external sources of knowledge, such as Wikipedia⁷, to establish semantic relationships among the tags of a folksonomy. Therefore, there is no commonly agreed evaluation protocol for assessing the quality of the produced communities.

The most direct means of evaluating the quality of a set of communities is to subject them to human judgment, i.e., to ask people to assess how related the members of the same community are to each other. By asking multiple users to evaluate the same community, it is also possible to derive confidence scores based on inter-annotator agreement, thus reducing the subjective element of the evaluation. However, such user evaluation studies are costly and can only be applied to limited samples of the community structures under test.
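The text does not specify how agreement is turned into a confidence score. As a hedged illustration only, the Python sketch below uses Fleiss' kappa, one standard chance-corrected agreement measure for a fixed number of raters per item. The choice of kappa, the function name `fleiss_agreement`, and the two-category "unrelated"/"related" rating scheme are our assumptions, not the protocol used in this chapter. The per-item value (the share of rater pairs that agree on a community) could serve as a per-community confidence score, while kappa summarizes agreement over the whole judged sample.

```python
from typing import List, Tuple


def fleiss_agreement(counts: List[List[int]]) -> Tuple[List[float], float]:
    """Per-item agreement and Fleiss' kappa (illustrative, hypothetical helper).

    counts[i][j] = number of raters who put item i (e.g. one tag community)
    into category j (assumed scheme: 0 = 'unrelated', 1 = 'related').
    Every item must be rated by the same number n >= 2 of raters.
    """
    if not counts:
        raise ValueError("no items to evaluate")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Per-item agreement P_i: fraction of rater pairs that agree on item i.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement P_e from the overall proportion of each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa undefined: all ratings fall in one category")

    kappa = (p_bar - p_e) / (1.0 - p_e)
    return per_item, kappa
```

For example, if three communities are each judged by four raters with counts `[[0, 4], [1, 3], [2, 2]]`, the per-community agreement scores are 1.0, 0.5 and about 0.33, while kappa is slightly negative: most ratings are "related", so the observed agreement is no better than chance.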

An implicit means of evaluating the quality of communities on a graph is a graph-based measure of community structure quality. One such measure, commonly used in the community detection literature, is the modularity of the community structure. Modularity quantifies the extent to which a division of a network into communities yields more edges between nodes of the same community than would be expected for the same division if the edges were distributed at random among the nodes (keeping each node's degree fixed). Modularity is computed by means of Eq. (5.6):

$$Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j) \qquad (5.6)$$

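where $A$ denotes the adjacency matrix of the graph, $k_i$ is the degree of node $i$, $m$ is the total number of edges, $c_i$ is the community to which node $i$ belongs, and $\delta(c_i, c_j)$ is the Kronecker delta symbol, which equals 1 if $c_i = c_j$ and 0 otherwise.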
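A minimal Python sketch of Eq. (5.6), assuming an undirected graph given as a symmetric adjacency matrix (a list of lists, weights allowed, no self-loops) and a hard partition given as exactly one community label per node. The function name `modularity` and this input format are illustrative choices, not the authors' implementation. The one-label-per-node input is precisely the partition assumption that the Kronecker delta in Eq. (5.6) relies on.

```python
from collections.abc import Hashable, Sequence


def modularity(adj: Sequence[Sequence[float]],
               community: Sequence[Hashable]) -> float:
    """Modularity Q of a hard partition, following Eq. (5.6).

    adj[i][j]    -- entry A_ij of a symmetric adjacency matrix
    community[i] -- label c_i of the single community node i belongs to
    """
    n = len(adj)
    if len(community) != n:
        raise ValueError("need exactly one community label per node")

    degrees = [sum(row) for row in adj]  # k_i
    two_m = sum(degrees)                 # sum of all degrees = 2m
    if two_m == 0:
        raise ValueError("graph has no edges")

    q = 0.0
    for i in range(n):
        for j in range(n):
            # Kronecker delta: only same-community pairs contribute.
            if community[i] == community[j]:
                # Observed edge weight minus the expected weight k_i k_j / 2m.
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
adj = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i][j] = adj[j][i] = 1
print(modularity(adj, ["A", "A", "A", "B", "B", "B"]))  # ≈ 0.357 (= 5/14)
```

Placing every node in a single community gives Q = 0, since the observed and expected edge weights then cancel exactly. The double loop is O(n²) and is meant only to mirror the formula term by term.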

There are two reasons why we do not use modularity in our evaluation study. First, modularity can be computed only for partition-like community structures. However, the community structure produced by our method is not partitional,


7 Wikipedia, http://wikipedia.org/
