The Semantics of Tagging - Social Semantics: The Search for Meaning on the Web


usage, then the result is not very surprising. However, tags are often domain-specific terms and thus may not reflect general English usage. It would therefore be useful to see whether any latent structure can be extracted from the stabilized tag distributions, and whether that structure reflects the domain-specific organization of information. We look at one of the simplest latent structures that can be derived from collaborative tagging: inter-tag correlation graphs (or, perhaps more simply, 'folksonomy graphs'). We discuss the methodology used for obtaining such graphs and then illustrate our approach through an example domain study.

5.4.1 Methodology

The act of tagging resources by different users induces, at the tag level, a simple distance measure between any pair of tags. This measure captures the degree of co-occurrence, which we interpret as a similarity metric between the content represented by the two tags. The collaborative filtering (Sarwar et al. 2001; Robu and Poutre 2006) and natural language processing (Manning and Schutze 2002) literatures propose several distance or similarity measures that can be employed for such problems. The metric we found most useful for this problem is cosine distance. Note that this should not be interpreted as a conclusion on our part that cosine distance is always an optimal choice for this problem; the issue probably requires further research on larger data sets.

Formally, let $T_i$, $T_j$ represent two random tags. We denote by $N(T_i)$ and $N(T_j)$, respectively, the number of times each of the tags was used individually to tag all resources, and by $N(T_i, T_j)$ the number of times the two tags are used to tag the same resource. Then the similarity between any pair of tags i and j is defined as:

\[
\mathrm{similarity}(T_i, T_j) = \frac{N(T_i, T_j)}{\sqrt{N(T_i) \cdot N(T_j)}} \tag{5.9}
\]
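As an illustration of the co-occurrence similarity defined above, the following is a minimal Python sketch, not the implementation used in the study. It assumes the cosine form with a square-root normalization and counts each tag at most once per resource; the function name `tag_similarities` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tag_similarities(resources):
    """Return {(tag_a, tag_b): similarity} for every co-occurring tag pair.

    `resources` maps a resource id to the tags applied to it. Assumption of
    this sketch: a tag counts at most once per resource.
    """
    tag_count = Counter()   # N(T_i): resources carrying tag T_i
    pair_count = Counter()  # N(T_i, T_j): resources carrying both tags
    for tags in resources.values():
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): n_ab / sqrt(tag_count[a] * tag_count[b])
        for (a, b), n_ab in pair_count.items()
    }


# Hypothetical toy data, not taken from the study.
resources = {
    "r1": ["python", "programming"],
    "r2": ["python", "programming", "tutorial"],
    "r3": ["python", "snake"],
    "r4": ["programming"],
}
sims = tag_similarities(resources)
```

Pairs are keyed in alphabetical order, so each unordered pair appears once; tags that never share a resource simply have no entry, which corresponds to a similarity of zero.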

We use the shorthand $sim_{ij}$ to denote $\mathrm{similarity}(T_i, T_j)$. From these similarities we can construct a tag-tag correlation graph or network, where the nodes represent the tags themselves, weighted by their absolute frequencies, while the edges are weighted with the cosine distance measure. We build a visualization of this weighted tag-tag correlation graph by using a spring-embedder or spring-relaxation type of algorithm. An analysis of the structural properties of such tag graphs may provide important insights into both how people tag and how structure emerges in collaborative tagging.
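The graph construction and layout step can be sketched roughly as follows. This is a deliberately simplified force-directed layout in plain Python, offered only as one possible instance of a spring-type algorithm and not as the one used for the figures in the study. The names `build_tag_graph` and `spring_layout`, the `min_sim` threshold, the force constants, and the sample values are all assumptions of the sketch.

```python
import math
import random


def build_tag_graph(tag_freq, sims, min_sim=0.0):
    """Nodes keep their absolute tag frequency; edges keep pair similarity."""
    nodes = dict(tag_freq)
    edges = {pair: s for pair, s in sims.items() if s > min_sim}
    return nodes, edges


def spring_layout(nodes, edges, iterations=200, step=0.05, max_move=0.1, seed=0):
    """Tiny force-directed layout: all node pairs repel, edges attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    names = list(nodes)
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-6)
                weight = edges.get((a, b), edges.get((b, a), 0.0))
                # Positive magnitude pushes the pair apart (repulsion);
                # the edge term pulls similar tags together.
                magnitude = 0.01 / dist ** 2 - weight * dist
                fx, fy = magnitude * dx / dist, magnitude * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for n in names:
            for k in (0, 1):
                # Clamp each move so a near-collision cannot fling a node away.
                move = step * force[n][k]
                pos[n][k] += max(-max_move, min(max_move, move))
    return pos


# Hypothetical frequencies and similarities, for illustration only.
tag_freq = {"python": 3, "programming": 3, "tutorial": 1, "snake": 1}
pair_sims = {
    ("programming", "python"): 0.667,
    ("programming", "tutorial"): 0.577,
    ("python", "tutorial"): 0.577,
    ("python", "snake"): 0.577,
}
nodes, edges = build_tag_graph(tag_freq, pair_sims, min_sim=0.6)
positions = spring_layout(nodes, edges)
```

In this sketch the node frequencies do not influence the layout; they are carried along so that a drawing routine can scale node size by them, while the similarity threshold controls how dense the drawn graph is.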
