usage, then the result is not very surprising. However, tags are often domain-specific
terms, and thus may not actually reflect English language use. Therefore, it would
be useful to see whether any latent sense can be extracted from the stabilized tag
distributions, and whether those latent structures reflect the domain-specific organization
of information. We look at one of the simplest latent structures that can be
derived through collaborative tagging: inter-tag correlation graphs (or, perhaps more
simply, 'folksonomy graphs'). We discuss the methodology used for obtaining such
graphs and then illustrate our approach through an example domain study.
5.4.1 Methodology
The act of tagging resources by different users induces, at the tag level, a simple
distance measure between any pair of tags. This distance measure captures a degree
of co-occurrence, which we interpret as a similarity metric between the content
represented by the two tags. The collaborative filtering (Sarwar et al. 2001; Robu
and Poutre 2006) and natural language processing (Manning and Schutze 2002)
literatures propose several distance or similarity measures that can be employed
for such problems. The metric we found most useful for this problem is cosine
distance. Note that this should not be interpreted as a conclusion on our part that
cosine distance is always an optimal choice for this problem; this issue probably
requires further research on larger data sets.
Formally, let $T_i$ and $T_j$ represent two random tags. We denote by $N(T_i)$ and
$N(T_j)$, respectively, the number of times each of the tags was used individually to tag all
resources, and by $N(T_i, T_j)$ the number of times the two tags are used to tag the same
resource. Then the similarity between any pair of tags $i$ and $j$ is defined as:

\[
\mathrm{similarity}(T_i, T_j) = \frac{N(T_i, T_j)}{\sqrt{N(T_i)\, N(T_j)}} \tag{5.9}
\]
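The similarity defined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `tag_similarities`, the input layout (a dictionary from resource id to tags), and the toy data are assumptions made here for illustration. The sketch also assumes that each resource contributes at most once to the count of a given tag, and it uses the cosine form of the measure, with the square root of the product of the individual counts in the denominator.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tag_similarities(resource_tags):
    """Return {(tag_a, tag_b): similarity} for every co-occurring tag pair.

    resource_tags: dict mapping a resource id to the tags applied to it
    (hypothetical input layout, assumed for this sketch).
    """
    tag_count = Counter()   # N(T_i): resources carrying tag T_i
    pair_count = Counter()  # N(T_i, T_j): resources carrying both tags
    for tags in resource_tags.values():
        unique = sorted(set(tags))  # count each tag once per resource
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): n_ab / sqrt(tag_count[a] * tag_count[b])
        for (a, b), n_ab in pair_count.items()
    }


# Toy data, invented for illustration only.
resources = {
    "r1": ["python", "code"],
    "r2": ["python", "code", "web"],
    "r3": ["web", "design"],
}
sims = tag_similarities(resources)
```

Pairs are keyed in alphabetical order, so the value for "code" and "python" is stored under `("code", "python")`. Tags that never share a resource are simply absent from the result, which corresponds to a missing edge in the graph.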
We use the shorthand $sim_{ij}$ to denote $\mathrm{similarity}(T_i, T_j)$. From these similarities we
can construct a tag-tag correlation graph or network, where the nodes represent the
tags themselves, weighted by their absolute frequencies, while the edges are weighted
by the cosine distance measure. We build a visualization of this weighted tag-tag
correlation graph by using a spring-embedder or spring relaxation type of algorithm.
An analysis of the structural properties of such tag graphs may provide important
insights into both how people tag and how structure emerges in collaborative
tagging.
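To give a rough idea of what a spring relaxation does with such a graph, here is a naive sketch in plain Python. The text does not say which spring-embedder was used, so this is a generic force-directed scheme and not the authors' algorithm: the function name `spring_layout`, the repulsion constant `0.1`, the step size, and the iteration count are all arbitrary values assumed for illustration. Node frequencies are not used for positioning here; in a drawing they would typically set the node size.

```python
import math


def spring_layout(nodes, weighted_edges, iterations=200, step=0.05):
    """Naive spring relaxation (illustrative sketch, assumed parameters).

    nodes: list of tags; weighted_edges: {(tag_a, tag_b): similarity}.
    Returns {tag: [x, y]}.
    """
    n = len(nodes)
    # Deterministic start: place the nodes evenly on the unit circle.
    pos = {
        tag: [math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)]
        for k, tag in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {tag: [0.0, 0.0] for tag in nodes}
        # Every pair of nodes repels, more strongly at short distances.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                push = 0.1 / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Each edge pulls its endpoints together in proportion to its weight.
        for (a, b), w in weighted_edges.items():
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= w * dx
            force[a][1] -= w * dy
            force[b][0] += w * dx
            force[b][1] += w * dy
        # Move every node a small step along its net force.
        for tag in nodes:
            pos[tag][0] += step * force[tag][0]
            pos[tag][1] += step * force[tag][1]
    return pos


# Two tags joined by one edge of weight 1.0 (toy input, for illustration).
layout = spring_layout(["a", "b"], {("a", "b"): 1.0})
```

With these constants, strongly similar tags settle close together while unrelated tags drift apart, which is the property that makes clusters visible in the resulting picture. A production visualization would more likely rely on an established graph-drawing library than on a hand-written loop like this one.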
 