The Semantics of Tagging - Social Semantics: The Search for Meaning on the Web

grows, the shape of the distribution remains the same and is thus stable. Researchers

have observed, some casually and some more rigorously, that the distribution of tags

applied to particular resources in tagging systems follows a power-law distribution

where there are a relatively small number of tags that are used with great frequency

and a great number of tags that are used infrequently (Mathes 2004). If this is

the case, tag distributions may provide the stability necessary to draw out useful

information structures.
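
As a rough illustration of the power-law claim above, the following Python sketch ranks the tags of a single resource by frequency and fits a straight line to the log-log rank-frequency curve. The tag data and the function names `rank_frequency` and `loglog_slope` are hypothetical, and a least-squares fit on log-log axes is only a crude diagnostic assumed here for brevity; it is not necessarily the method used later in the chapter.

```python
import math
from collections import Counter

def rank_frequency(tags):
    """Return tag counts sorted from most to least frequent."""
    return [count for _, count in Counter(tags).most_common()]

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical tag assignments for one resource (illustrative data only):
# a few tags are very frequent, the rest form a tail.
tags = ["web"] * 60 + ["design"] * 30 + ["css"] * 20 + ["html"] * 15 + ["blog"] * 12

freqs = rank_frequency(tags)
slope = loglog_slope(freqs)
print(freqs, round(slope, 3))
```

A roughly linear log-log curve with a negative slope is consistent with a power law, although it does not establish one; the example data were constructed so that frequency is proportional to 1/rank, which gives a slope close to -1.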

This chapter is organized as follows. In the first part, we examine how to detect

the emergence of stable 'consensus' distributions of tags assigned to individual

resources. In Sect. 5.2 we demonstrate a method for empirically examining whether

tagging distributions follow a power-law distribution. In Sect. 5.2.4 we show how

this convergence to a power-law distribution can be detected over time by using

the Kullback-Leibler divergence. We further empirically analyze the trajectory of

tagging distributions before they have stabilized, as well as the dynamics of the

long tail of tag distributions. In the second part, we examine the applications of

these stable power-law distributions. In Sect. 5.3, we examine whether this power law is

the result of tag suggestions. In Sect. 5.4 we demonstrate how the most frequent tags

in a distribution can be used in inter-tag correlation graphs (or folksonomy graphs)

to chart their relation to one another. Section 5.5 shows how these folksonomy

graphs can be (automatically) partitioned, using community-based methods, in

order to extract shared tag vocabularies. Finally, Sect. 5.6 provides an independent

benchmark against which to compare our empirical results from collaborative tagging, by solving

the same problems using a completely different data set: search engine query logs.
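
The convergence check mentioned for Sect. 5.2.4 can be sketched as follows: compute the tag distribution of a resource at successive checkpoints and measure the Kullback-Leibler divergence between consecutive distributions. This is a minimal sketch under stated assumptions. The function names, the cumulative checkpoints, the direction of the divergence (current against previous), and the small additive smoothing used to avoid division by zero are all choices made here for illustration, and the tag stream is invented.

```python
import math
from collections import Counter

def distribution(counts, vocabulary, smoothing=1e-6):
    """Smoothed relative frequencies over a fixed vocabulary (smoothing is an assumption)."""
    total = sum(counts.get(t, 0) + smoothing for t in vocabulary)
    return {t: (counts.get(t, 0) + smoothing) / total for t in vocabulary}

def kl_divergence(p, q):
    """D_KL(P || Q) = sum over tags of p(t) * log(p(t) / q(t))."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

def divergence_over_time(tag_stream, step):
    """KL divergence between cumulative tag distributions at consecutive checkpoints."""
    vocabulary = sorted(set(tag_stream))
    divergences = []
    previous = None
    for end in range(step, len(tag_stream) + 1, step):
        current = distribution(Counter(tag_stream[:end]), vocabulary)
        if previous is not None:
            divergences.append(kl_divergence(current, previous))
        previous = current
    return divergences

# Hypothetical chronological tag stream: an unsettled start, then a steady pattern.
tag_stream = ["css", "css", "design", "html"] + ["web", "web", "design", "css"] * 5

divergences = divergence_over_time(tag_stream, step=4)
print([round(d, 4) for d in divergences])
```

If the distribution is stabilizing, the divergence between consecutive checkpoints should shrink towards zero as more tags arrive, as it does for the invented stream above.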

5.1.1 Related Work

Existing research on tagging has explored a wide variety of problems, ranging

from fundamental to more practical concerns. Much of this research is not

relevant to our task at hand, such as work on discovering the best interfaces for presenting

tags to users (Halvey and Keane 2007) or using tags to extract data such as event

and place locations from tagged photos (Rattenbury et al. 2007). In a direction of

work that bears directly on the larger question of the semantics of collective tagging

systems, Mika (2005) addresses the problem of extracting taxonomic information

from tagging systems in the form of Semantic Web ontologies, but fails to address

the stability of collective tagging. Of more interest is the study by Shen and Wu

on the structure of a tagging network for del.icio.us data, which examines network

characteristics of the tagging system such as the degree distribution (the distribution

of the number of other nodes each node is connected to) and the clustering

coefficient (based on a ratio of the total number of edges in a subgraph to the number

of all possible edges) (Shen and Wu 2005). Shen and Wu find that

a snapshot of an entire tagging network is indeed scale-free and has the features

Watts and Strogatz (1998) found to be characteristic of small world networks: small

average path length and relatively high clustering coefficient.
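
The two network measures described above, the degree distribution and the clustering coefficient, can be illustrated on a toy graph. The sketch below is an assumption-laden example rather than Shen and Wu's procedure: the tag graph is invented, the function names are hypothetical, and the clustering coefficient is taken in its common local form (edges among a node's neighbours divided by the number of possible edges among them), averaged over all nodes.

```python
from collections import Counter
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def degree_distribution(adjacency):
    """Map each degree to the number of nodes that have it."""
    return Counter(len(neighbours) for neighbours in adjacency.values())

def local_clustering(adjacency, node):
    """Edges among the node's neighbours divided by all possible such edges."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention assumed here for nodes with fewer than two neighbours
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return links / (k * (k - 1) / 2)

def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)

# Hypothetical tag graph: an edge links two tags that were used together.
edges = [("web", "design"), ("web", "css"), ("design", "css"), ("css", "html")]

adjacency = build_adjacency(edges)
degrees = degree_distribution(adjacency)
clustering = average_clustering(adjacency)
print(dict(degrees), round(clustering, 3))
```

On a real tagging network the degree distribution would be examined for a heavy tail (the scale-free property), and the average clustering coefficient would be compared with that of a random graph of the same size.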
