the case of tags referring to images, such a tag could refer to the picture quality of
the image, for example, “bright,” “black and white,” etc.). Outlier tags may correspond
to personal tags (i.e., tags that have some specific meaning only for the person
using them), infrequent tags, or spam/erroneous tags.
5.3.3 Evaluation of Tag Communities
Evaluating the results of community detection, which is a kind of clustering
process, constitutes a challenging task. Due to the size and uncontrolled nature of
folksonomy data, it is simply impractical to have all of the produced communities
evaluated by human subjects. Furthermore, since Collaborative Tagging Systems
are complex systems characterized by evolving and emerging semantics, there is
no standard ground truth, and it is hard to rely on external sources of
knowledge, such as Wikipedia (footnote 7), to establish semantic relationships among the tags
of a folksonomy. Therefore, there is no commonly agreed evaluation protocol for
assessing the quality of the produced communities.
Obviously, the most direct means of evaluating the quality of a set of communities
is to subject them to human judgment, i.e., to ask people to assess how closely
the members of the same community are related to each other. By asking multiple
users to evaluate the same community, it is also possible to derive confidence
scores based on the inter-annotator agreement, thus reducing the subjective
element of the evaluation. However, such user evaluation studies are costly and
can only be applied to limited samples of the community structures under test.
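The text does not say how the agreement-based confidence scores are computed. As a minimal sketch, assuming each annotator casts a binary vote per community (1 = "the members are related", 0 = "they are not"), the hypothetical function `community_confidence` below reports the mean vote together with simple pairwise percent agreement. Chance-corrected measures such as Cohen's or Fleiss' kappa are common alternatives to percent agreement.

```python
from itertools import combinations
from statistics import mean


def community_confidence(ratings):
    """Aggregate several annotators' judgments of ONE tag community.

    Assumption (not from the source): ratings is a list of 0/1 votes,
    one per annotator, where 1 means "the members are related".
    Returns (mean_score, pairwise_agreement), where pairwise_agreement is
    the fraction of annotator pairs that cast the same vote.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two annotators to measure agreement")
    pairs = list(combinations(ratings, 2))
    agreeing = sum(a == b for a, b in pairs)
    return mean(ratings), agreeing / len(pairs)
```

For example, `community_confidence([1, 1, 1, 0])` gives a mean score of 0.75 but a pairwise agreement of only 0.5 (3 of the 6 annotator pairs agree), so a community rated this way would receive a lower confidence than one judged unanimously.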
An implicit means of evaluating the quality of communities on a graph is to use
a graph-based quality measure for community structures. One such measure, which
is commonly used in the community detection literature, is the modularity of the
community structure. Modularity quantifies the extent to which the division of a
network into communities results in more edges between nodes of the same
community than would be expected for the same division if the edges were
distributed at random between the nodes (with the node degrees kept fixed).
Modularity is computed by means of Eq. (5.6):
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \qquad (5.6)$$
where A denotes the adjacency matrix of the graph, k_i is the degree of node i,
m is the total number of edges, c_i is the community to which node i belongs,
and δ(c_i, c_j) is the Kronecker delta symbol (1 if c_i = c_j, 0 otherwise).
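To make the formula concrete, the following Python sketch evaluates the double sum of Eq. (5.6) directly for an undirected, unweighted graph. The function name `modularity` and its input format (an edge list plus a dictionary that maps each node to a single community label) are illustrative assumptions; tag co-occurrence graphs are often weighted, in which case the edge counts would have to be replaced by edge weights. The loop is O(n²) and is written for readability, not efficiency.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition, evaluated term by term as in Eq. (5.6).

    edges: iterable of (u, v) pairs of an undirected, unweighted graph
           without self-loops or duplicate edges.
    community: dict mapping every node to exactly ONE community label,
               i.e. the community structure must be a partition.
    """
    edges = list(edges)
    m = len(edges)  # total number of edges
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    degree = defaultdict(int)  # k_i
    adj = defaultdict(int)     # A_ij, stored symmetrically
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[(u, v)] += 1
        adj[(v, u)] += 1

    nodes = list(degree)
    q = 0.0
    for i in nodes:
        for j in nodes:
            # Kronecker delta: only pairs in the same community contribute
            if community[i] == community[j]:
                q += adj.get((i, j), 0) - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)
```

For two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3), splitting the graph into the two triangles gives Q = 5/14 ≈ 0.357, while placing all nodes in one community gives Q = 0. Because `community` assigns one label per node, the sketch also shows why the measure presupposes a partition: δ(c_i, c_j) has no direct meaning when a node belongs to several communities. NetworkX ships a maintained implementation as `networkx.community.modularity`.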
There are two reasons why we do not use modularity in our evaluation study.
First, modularity can be computed only for partition-like community structures.
In contrast, the community structure produced by our method is not partitional,
7 Wikipedia, http://wikipedia.org/