with a probability $p(x) = a(t)/(x + \tau)$, where $a(t)$ is a normalization factor and $\tau$
“is a characteristic time scale over which recently added words have comparable
probabilities” (Cattuto 2006). While the parameter $p$ controls the probability of
reinforcing an existing tag, this second parameter, $\tau$, controls how fast the memory
kernel decays and so over what time-scale a tag may likely count as 'new' and so be
more likely to be reinforced. As Cattuto notes, “the average user is exposed to a few
roughly equivalent top-ranked tags and this is translated mathematically into a low-
rank cutoff of the power-law” (Cattuto 2006). This model produces an “excellent
agreement” with the results of tag-correlation graphs (Cattuto 2006). It should be
clear that the original Yule-Simon model simply parametrizes the probability of
imitating existing tags. The modified Yule-Simon model with a power-law
memory kernel also depends on the imitation of existing tags, but the probability
of imitating a previously used tag decays with its age according to a power-law function.
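The process described above can be sketched as a short simulation. This is only a minimal illustration, not Cattuto's implementation: the function name and the parameters `p_new`, `tau` and `seed` are assumptions introduced here. It assumes the usual Yule-Simon convention that a new tag is invented with probability `p_new` and an existing tag is imitated otherwise, and that the tag used x steps back is weighted by 1/(x + tau). The normalization factor is handled implicitly, since `random.choices` normalizes the weights.

```python
import random


def simulate_memory_yule_simon(steps, p_new, tau, seed=0):
    """Sketch of a Yule-Simon process with a power-law memory kernel.

    steps  -- total number of tag assignments to generate
    p_new  -- assumed probability of inventing a new tag at each step
    tau    -- time scale of the kernel; larger values flatten the decay
    Returns the stream of tag ids in the order they were assigned.
    """
    rng = random.Random(seed)
    stream = [0]      # start with one tag so there is something to imitate
    next_tag = 1
    for _ in range(steps - 1):
        if rng.random() < p_new:
            # Invent a tag that has never been used before.
            stream.append(next_tag)
            next_tag += 1
        else:
            # Imitate the tag used x steps ago; x = 1 is the most recent one.
            t = len(stream)
            lags = range(1, t + 1)
            weights = [1.0 / (x + tau) for x in lags]
            x = rng.choices(lags, weights=weights, k=1)[0]
            stream.append(stream[t - x])
    return stream


if __name__ == "__main__":
    tags = simulate_memory_yule_simon(steps=2000, p_new=0.05, tau=10.0)
    print("distinct tags:", len(set(tags)))
```

The weights are recomputed at every step, so the sketch runs in quadratic time and is only suitable for short streams. Because a new tag is invented at a constant rate, the number of distinct tags in this sketch grows roughly linearly with the number of steps.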
5.3.1.3 Adding Parameters and Background Knowledge
Although Cattuto's model is without a doubt an elegant minimal model that captures
tag-correlation distributions well, it was not tested against tag-resource distributions
(Cattuto 2006). Furthermore, as Dellschaft and Staab (2008) note, Cattuto's model
does not explain the sub-linear growth of the tag vocabulary in a tagging system.
Dellschaft and Staab (2008) propose an alternative model, which adds a number
of new parameters that fit the data produced by tag-growth distributions and tag-
resource distributions better than Cattuto's model. The main point of interest
in their model is that instead of a new tag being chosen uniformly, the new tag is
chosen from a power-law distribution that is meant to approximate “background
knowledge.” So besides the probability of drawing on “background knowledge” ( p ), their model also features
its complement, i.e. the “probability that a user imitates
a previous tag assignment” ( p ) (Dellschaft and Staab 2008). In essence, Dellschaft
and Staab have added (at least) two new parameters to a Yule-Simon process, and
these additional parameters allow the reinforcement of existing tags to be more
finely tuned. Instead of a single power-law memory kernel with a single parameter $\tau$,
these additional parameters allow the modeling of “an effect that is comparable to
the fat-tailed access of the Yule-Simon model with memory” while keeping tag-
growth sub-linear (Dellschaft and Staab 2008). The model proposed by Cattuto
keeps the tag-growth parameter equal to 1 and so makes tag growth linear in p
(Cattuto 2006). Yet for us, the most important advantage of Dellschaft and Staab
over Cattuto's model is that their added parameters let their model match the
previously unmatched observation by Halpin et al. of the frequency rank distribution
made in Chap. 4. The match is not as close as the match with vocabulary growth and
tag correlations, because resource-tag frequency distributions vary highly per resource;
the one consistent feature is the drop in slope around ranks 7-10.
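The central idea, that a non-imitated tag is drawn from a power-law distribution standing in for background knowledge rather than always being new, can be illustrated with a rough sketch. This is not Dellschaft and Staab's actual model: the function names, the parameters `p_imitate`, `vocab_size` and `exponent`, the fixed finite vocabulary, and the uniform choice among earlier assignments are all simplifying assumptions made here for illustration.

```python
import random


def simulate_background_knowledge(steps, p_imitate, vocab_size=1000,
                                  exponent=1.0, seed=0):
    """Sketch: imitate an earlier assignment or draw from background knowledge.

    p_imitate  -- assumed probability of copying a previous tag assignment
    vocab_size -- size of a hypothetical background vocabulary
    exponent   -- exponent of the Zipf-like background distribution
    """
    rng = random.Random(seed)
    words = range(vocab_size)
    # Zipf-like weights: the word of rank r gets weight 1 / r**exponent.
    weights = [1.0 / (r ** exponent) for r in range(1, vocab_size + 1)]
    stream = []
    for _ in range(steps):
        if stream and rng.random() < p_imitate:
            # Simplification: copy a uniformly chosen earlier assignment.
            stream.append(rng.choice(stream))
        else:
            # Draw from background knowledge; frequent words tend to repeat.
            stream.append(rng.choices(words, weights=weights, k=1)[0])
    return stream


def vocabulary_growth(stream):
    """Return the number of distinct tags seen after each assignment."""
    seen, growth = set(), []
    for tag in stream:
        seen.add(tag)
        growth.append(len(seen))
    return growth


if __name__ == "__main__":
    stream = simulate_background_knowledge(steps=5000, p_imitate=0.6)
    growth = vocabulary_growth(stream)
    print("distinct tags after 500 and 5000 steps:", growth[499], growth[-1])
```

Because background draws often return words that are already in use, each such draw adds a new tag less and less often as the stream grows, which is one simple way for vocabulary growth to fall below linear. In this sketch the vocabulary is additionally capped at `vocab_size`.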