The dynamics of the above distributions show a clear pattern. [6] At the
beginning of the process, when the distributions contain only a few tags, there is
a high degree of randomness, as the early data points indicate. However, in most
cases the divergence converges relatively quickly to a very small value, and then, in the final
ten steps, to a Kullback-Leibler distance which is graphically indistinguishable
from zero (with only a few outliers). If the Kullback-Leibler divergence between
two consecutive time points (Fig. 5.4a) or between each step and the final
one (Fig. 5.4b) becomes zero or close to zero, it indicates that the shape of the
distribution has stopped changing. The results here suggest that the power law
may form relatively early on in the process for most sites and persist throughout.
Even if the number of tags added by the users increases many-fold, the new
tags reinforce the already-formed power law. Interestingly, there is a substantial
amount of variation in the initial values of the Kullback-Leibler distance prior to the
convergence. Future work might explore the factors underlying this variation and
whether it is a function of the content of the sites or of the mechanism behind the
tagging of the site. Additionally, convergence to zero occurs within approximately the
same time period (often within a few months) for these sites.
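
The two comparisons described above (each step against the previous one, and each step against the final one) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce Fig. 5.4. It assumes that distributions are compared by rank position over the top k tags, that ranks not yet occupied are padded with zeros, and that a small constant `eps` is added to avoid taking the logarithm of zero. The function names, the smoothing constant and the direction of the divergence (later step relative to earlier step) are all choices made here for illustration.

```python
import math
from collections import Counter


def rank_distribution(tags, k=25, eps=1e-9):
    """Relative frequencies of the k most frequent tags, ordered by rank."""
    counts = sorted(Counter(tags).values(), reverse=True)[:k]
    counts += [0] * (k - len(counts))       # pad ranks that are not yet occupied
    smoothed = [c + eps for c in counts]    # assumption: additive smoothing avoids log(0)
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), measured in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_series(tag_stream, step_ends, k=25):
    """Return KL values between consecutive steps and between each step and the last.

    tag_stream is a chronologically ordered list of tag assignments for one
    resource; step_ends lists how many assignments had been made at each step.
    """
    dists = [rank_distribution(tag_stream[:end], k) for end in step_ends]
    consecutive = [kl_divergence(dists[i], dists[i - 1])
                   for i in range(1, len(dists))]
    to_final = [kl_divergence(d, dists[-1]) for d in dists]
    return consecutive, to_final


# Toy stream: the second half repeats the first, so the shape stops changing.
stream = ["web", "design", "web", "css", "web", "design"] * 2
consecutive, to_final = kl_series(stream, step_ends=[3, 6, 12], k=3)
```

In this toy stream the divergence between the last two steps is close to zero because the relative frequencies no longer change, even though the number of assignments has doubled. Values computed for the earliest steps depend strongly on the smoothing constant, which is one reason to treat sparse early time points with caution.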
The Kullback-Leibler analysis provides a powerful tool for analyzing
the dynamics of tagging distributions. The observed stability may well be a result of the scale-free
property of tagging networks, so that once tagging by users has reached a
certain threshold, regardless of how many tags are added, the distribution remains
stable (Shen and Wu 2005). This method can be immensely useful in analyzing real-world
tagging systems where the stability of the categorization scheme produced by
the tagging needs to be confirmed.
5.2.4.4 Examining the Dynamics of the Entire Tag Distribution
In the previous sections, we focused on the distributions of the tags in the top 25
positions. However, heavily tagged or popular resources, such as those considered
in our analysis, can be tagged several tens of thousands of times each, producing
hundreds or even thousands of distinct tags. It is true that many of these distinct
tags are simply personal bookmarks which have no meaning for the other users in
the system. However, it is still crucial to understand their dynamics and the role
they play in tagging, especially with respect to the top of the tag distribution. Some
sources (e.g. Anderson 2006) have argued that the dynamics of long tails are a
fundamental feature of Internet-scale systems. Here we were particularly interested
in two questions. First, how does the number of times a site is tagged (including the
long tail) evolve in time? Second, how does the relative importance of the head (top
25 tags) to the long tail change as tags are added to a resource?
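
Both questions can be made concrete with a short Python sketch. It is an illustration only, under the assumption that the tagging history of a resource is available as a chronologically ordered list of tag assignments. The function name `head_tail_share`, the `step_ends` parameter and the choice to measure the importance of the head as its share of all assignments are hypothetical and introduced here for the example.

```python
from collections import Counter


def head_tail_share(tag_stream, step_ends, head_size=25):
    """For each step, return (total assignments, distinct tags, head share).

    The head share is the fraction of all tag assignments that fall on the
    head_size most frequent tags; the remainder belongs to the long tail.
    """
    rows = []
    for end in step_ends:
        counts = Counter(tag_stream[:end])
        total = sum(counts.values())
        head = sum(c for _, c in counts.most_common(head_size))
        share = head / total if total else 0.0
        rows.append((total, len(counts), share))
    return rows


# Toy example with a head of two tags and a one-tag tail.
stream = ["a"] * 6 + ["b"] * 3 + ["c"]
rows = head_tail_share(stream, step_ends=[6, 10], head_size=2)
```

The first element of each row tracks how the number of assignments grows over time, and the third shows whether the head gains or loses weight relative to the tail as tags are added.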
[6] Note that in Fig. 5.4, the first two time points were omitted because their distributions involved few
tags and were thus highly random.