The Semantics of Tagging - Social Semantics: The Search for Meaning on the Web


with a probability $p(x) = a(t) / (x + \tau)$, where $a(t)$ is a normalization factor and τ

“is a characteristic time scale over which recently added words have comparable

probabilities” (Cattuto 2006). While the parameter p controls the probability of

reinforcing an existing tag, this second parameter, τ, controls how fast the memory

kernel decays and so over what time-scale a tag may likely count as 'new' and so be

more likely to be reinforced. As Cattuto notes, “the average user is exposed to a few

roughly equivalent top-ranked tags and this is translated mathematically into a low-

rank cutoff of the power-law” (Cattuto 2006). This model produces an “excellent

agreement” with the results of tag-correlation graphs (Cattuto 2006). It should be

clear that the original Yule-Simon model simply parametrizes the probability of

the imitation of existing tags. The modified Yule-Simon model with a power-law

memory kernel also depends on the imitation of existing tags, but the probability

of reusing a previously used tag decays with its age according to a power-law function.
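
The memory-kernel process described above can be illustrated with a minimal Python sketch. It is not Cattuto's implementation: the function names, the seed, and the parameter values are hypothetical. It also assumes the convention that `p_new` is the probability of adding a new tag, so that an existing tag is reinforced with probability `1 - p_new`. The normalization factor a(t) is not computed explicitly, because `random.choices` normalizes the weights 1 / (x + τ) itself.

```python
import random
from collections import Counter


def memory_kernel_weights(t, tau):
    """Unnormalized kernel weights 1 / (x + tau) for going back x = 1..t steps."""
    return [1.0 / (x + tau) for x in range(1, t + 1)]


def simulate_yule_simon_memory(steps, p_new, tau, seed=0):
    """Generate a tag stream from a Yule-Simon process with a power-law memory kernel.

    p_new: probability of adding a brand-new tag at each step (assumed convention).
    tau:   time scale of the kernel; a larger tau flattens the preference
           for the most recently used tags.
    """
    rng = random.Random(seed)
    stream = [0]      # start with a single tag, labelled 0
    next_tag = 1
    for _ in range(steps - 1):
        if rng.random() < p_new:
            # Invent a new tag.
            stream.append(next_tag)
            next_tag += 1
        else:
            # Reinforce an existing tag: go back x steps with probability
            # proportional to 1 / (x + tau); random.choices normalizes the
            # weights, which plays the role of a(t).
            t = len(stream)
            weights = memory_kernel_weights(t, tau)
            x = rng.choices(range(1, t + 1), weights=weights, k=1)[0]
            stream.append(stream[t - x])
    return stream


stream = simulate_yule_simon_memory(steps=2000, p_new=0.05, tau=40.0)
counts = Counter(stream)
ranked = sorted(counts.values(), reverse=True)   # frequency-rank distribution
```

Plotting `ranked` on log-log axes for different values of `tau` is one way to inspect how the kernel's time scale changes the head of the frequency-rank curve.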


5.3.1.3 Adding Parameters and Background Knowledge

Although Cattuto's model is without a doubt an elegant minimal model that captures

tag-correlation distributions well, it was not tested against tag-resource distributions

(Cattuto 2006). Furthermore, as noticed by Dellschaft and Staab, Cattuto's model

also does not explain the sub-linear tag vocabulary growth of a tagging system

(2008). Dellschaft and Staab propose an alternative model, which adds a number

of new parameters that fit the data produced by tag-growth distributions and tag-

resource distributions better than Cattuto's model (2008). The main point of interest

in their model is that instead of a new tag being chosen uniformly, the new tag is

chosen from a power-law distribution that is meant to approximate “background

knowledge.” So besides the probability of drawing on “background knowledge” ( p_BK ), their model also features

its complement, i.e. the “probability that a user imitates

a previous tag assignment” ( p_I ) (Dellschaft and Staab 2008). In essence, Dellschaft

and Staab have added (at least) two new parameters to a Yule-Simon process, and

these additional parameters allow the reinforcement of existing tags to be more

finely tuned. Instead of a single power-law memory kernel with a single parameter τ,

these additional parameters allow the modeling of “an effect that is comparable to

the fat-tailed access of the Yule-Simon model with memory” while keeping tag-

growth sub-linear (Dellschaft and Staab 2008). The model proposed by Cattuto

keeps the tag-growth parameter equal to 1 and so makes tag growth linear in p

(Cattuto 2006). Yet for us, the most important advantage of Dellschaft and Staab's

model over Cattuto's is that its added parameters let it match the

previously unmatched observation by Halpin et al. of the frequency rank distribution

made in Chap. 4. This match is not as close as the matches with vocabulary growth and

tag correlations, because resource-tag frequency distributions vary highly per resource,

with the exception of the drop in slope around rank 7-10.
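
The interplay between imitation and background knowledge can be sketched in Python as follows. This is a deliberately simplified illustration, not Dellschaft and Staab's actual model. It makes several assumptions: imitation picks uniformly among the last `window` tag assignments, background knowledge is a Zipf-like power law over a fixed vocabulary of `vocab_size` tags, and all names and default values are hypothetical.

```python
import random
from itertools import accumulate


def simulate_with_background(steps, p_imitate, vocab_size=5000,
                             window=50, exponent=1.0, seed=0):
    """Tag stream mixing imitation with power-law 'background knowledge'.

    p_imitate: probability of copying one of the last `window` assignments;
               otherwise a tag is drawn from the background distribution.
    Returns the stream and the vocabulary size after each step.
    """
    rng = random.Random(seed)
    vocab = list(range(vocab_size))
    # Cumulative Zipf-like weights: the tag of rank r has weight 1 / r**exponent.
    cum_weights = list(accumulate(1.0 / (rank ** exponent)
                                  for rank in range(1, vocab_size + 1)))
    stream, growth, seen = [], [], set()
    for _ in range(steps):
        if stream and rng.random() < p_imitate:
            # Imitate a recent tag assignment (simplified: uniform choice).
            tag = rng.choice(stream[-window:])
        else:
            # Draw a tag from background knowledge instead of uniformly.
            tag = rng.choices(vocab, cum_weights=cum_weights, k=1)[0]
        stream.append(tag)
        seen.add(tag)
        growth.append(len(seen))
    return stream, growth


stream, growth = simulate_with_background(steps=4000, p_imitate=0.7)
half = len(growth) // 2
first_half_new = growth[half - 1]                  # distinct tags after the first half
second_half_new = growth[-1] - growth[half - 1]    # distinct tags added in the second half
```

Comparing `first_half_new` with `second_half_new` gives a rough indication of whether the vocabulary grows more slowly over time. Under these assumptions frequent background tags are drawn repeatedly, so new tags become rarer as the stream lengthens.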
