grows, the shape of the distribution remains the same and is thus stable. Researchers
have observed, some casually and some more rigorously, that the distribution of tags
applied to particular resources in tagging systems follows a power-law distribution,
in which a relatively small number of tags are used with great frequency
and a great number of tags are used infrequently (Mathes 2004). If this is
the case, tag distributions may provide the stability necessary to draw out useful
information structures.
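As a rough illustration of what such a distribution looks like, the following sketch ranks the tags of a single resource by frequency and fits a straight line to log-frequency against log-rank. The tag data and the function names `rank_frequency` and `loglog_slope` are hypothetical, and a straight line on a log-log plot is only an informal indication of a power law, not the empirical method referred to later in the chapter.

```python
import math
from collections import Counter

def rank_frequency(tags):
    """Return tag counts sorted from most to least frequent."""
    return sorted(Counter(tags).values(), reverse=True)

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical tag assignments for one resource (illustrative data only).
tags = (["web"] * 60 + ["design"] * 30 + ["css"] * 20
        + ["html"] * 15 + ["blog"] * 12)

freqs = rank_frequency(tags)
slope = loglog_slope(freqs)
print(freqs, round(slope, 3))
```

In this toy data the frequency of the tag at rank r is exactly 60/r, so the fitted slope is -1; real tag data would scatter around the fitted line.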
This chapter is organized as follows. In the first part, we examine how to detect
the emergence of stable 'consensus' distributions of tags assigned to individual
resources. In Sect. 5.2 we demonstrate a method for empirically examining whether
tagging distributions follow a power-law distribution. In Sect. 5.2.4 we show how
this convergence to a power-law distribution can be detected over time by using
the Kullback-Leibler divergence. We further empirically analyze the trajectory of
tagging distributions before they have stabilized, as well as the dynamics of the
long tail of tag distributions. In the second part, we examine the applications of
these stable power-law distributions. In Sect. 5.3, we examine whether this power law is
the result of tag suggestions. In Sect. 5.4 we demonstrate how the most frequent tags
in a distribution can be used in inter-tag correlation graphs (or folksonomy graphs)
to chart their relation to one another. Section 5.5 shows how these folksonomy
graphs can be (automatically) partitioned, using community-based methods, in
order to extract shared tag vocabularies. Finally, Sect. 5.6 provides an independent
benchmark to compare our empirical results from collaborative tagging, by solving
the same problems using a completely different data set: search engine query logs.
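The idea of detecting convergence with the Kullback-Leibler divergence can be sketched as follows, under stated assumptions: the tagging history of one resource is cut into cumulative snapshots, and the divergence D(P || Q) = sum over tags of p(x) log(p(x) / q(x)) is computed between each snapshot P and the next snapshot Q. The tagging history, the snapshot size `step`, and the choice to compare consecutive snapshots are illustrative assumptions, not necessarily the procedure of Sect. 5.2.4.

```python
import math
from collections import Counter

def tag_distribution(tags, vocabulary):
    """Relative frequency of each vocabulary tag within `tags`."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[tag] / total for tag in vocabulary]

def kl_divergence(p, q):
    """D_KL(P || Q) in nats; terms with p(x) = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical tagging history of one resource, oldest tag first.
history = ["css", "css", "web", "design"] + ["web", "web", "design", "css"] * 5
step = 4  # assumed snapshot size, in tag assignments

divergences = []
for end in range(step, len(history), step):
    earlier = history[:end]
    later = history[:end + step]
    # Snapshots are cumulative, so every tag in `earlier` also occurs in
    # `later`; q(x) is therefore never zero where p(x) is positive.
    vocabulary = sorted(set(later))
    p = tag_distribution(earlier, vocabulary)
    q = tag_distribution(later, vocabulary)
    divergences.append(kl_divergence(p, q))

print([round(d, 4) for d in divergences])
```

A divergence that shrinks toward zero as more tags arrive suggests that the shape of the distribution has stopped changing, which is the sense of stability used in this chapter.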
5.1.1 Related Work
Existing research on tagging has explored a wide variety of problems, ranging
from fundamental to more practical concerns. Much of this research is not
relevant to our task at hand, for example work on discovering the best interfaces
for presenting tags to users (Halvey and Keane 2007) or on using tags to extract data
such as event and place locations from tagged photos (Rattenbury et al. 2007). In a direction of
work that bears directly on the larger question of the semantics of collective tagging
systems, Mika (2005) addresses the problem of extracting taxonomic information
from tagging systems in the form of Semantic Web ontologies, but fails to address
the stability of collective tagging. Of more interest is the study of Shen and Wu
on the structure of a tagging network for del.icio.us data, which examines network
characteristics of the tagging system such as the degree distribution (the distribution
of the number of other nodes each node is connected to) and the clustering
coefficient (based on a ratio of the total number of edges in a subgraph to the number
of all possible edges) (Shen and Wu 2005). Shen and Wu find that
a snapshot of an entire tagging network is indeed scale-free and has the features
Watts and Strogatz (1998) found to be characteristic of small-world networks: a small
average path length and a relatively high clustering coefficient.
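The two network measures mentioned above can be made concrete with a minimal sketch on a toy undirected graph. The edge list and node names are hypothetical and are not taken from Shen and Wu's data; the clustering coefficient shown is the common local variant, which takes the neighbours of a node as the subgraph and divides the number of edges among them by the number of possible edges.

```python
from collections import Counter
from itertools import combinations

def build_adjacency(edges):
    """Map each node to the set of nodes it shares an edge with."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def degree_distribution(adjacency):
    """Count how many nodes have each degree."""
    return Counter(len(neighbours) for neighbours in adjacency.values())

def local_clustering(adjacency, node):
    """Edges among a node's neighbours divided by all possible such edges."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair can be linked
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)

# Hypothetical toy graph: a triangle a-b-c with a pendant node d on c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adjacency = build_adjacency(edges)

degrees = degree_distribution(adjacency)
average = sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)
print(dict(degrees), round(average, 4))
```

Here nodes a and b have clustering coefficient 1, node c has 1/3, and node d has 0, giving an average of 7/12.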