The Semantics of Tagging - Social Semantics: The Search for Meaning on the Web


type of explanation is easily amenable to preferential attachment models, also known as 'rich get richer' explanations, which are well known to produce power-law distributions. Intuitively, the earliest studies of tagging observed that users imitate pre-existing tags (Golder and Huberman 2006). Golder and Huberman proposed that the simplest model that results in a “power-law” would be the classical Polya urn model (2006). Imagine an urn containing balls, each being one of some finite number of colors. At every time-step, a ball is chosen at random. Once a ball is chosen, it is put back in the urn along with another ball of the same color, which formalizes the feedback given by tag suggestions. As put by Golder and Huberman, “replacement of a ball with another ball of the same color can be seen as a kind of imitation”, where each ball color corresponds to a natural language tag, since “the interface through which users add bookmarks shows users the tags most commonly used by others who bookmarked that URL already; users can easily select those tags for use in their own bookmarks, thus imitating the choices of previous users” (2006). Yet this model is too limited to describe tagging, as it features only the reinforcement of existing tags, not the addition of new tags.
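The urn process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from Golder and Huberman: the function name `polya_urn`, the example tag names, and the starting counts are all assumptions made for the example. Each key stands for a ball color (a tag), and each step draws a ball in proportion to the current counts and returns it with one extra ball of the same color.

```python
import random
from collections import Counter


def polya_urn(initial_counts, steps, seed=None):
    """Simulate a Polya urn (hypothetical helper, for illustration only).

    initial_counts maps each color (tag) to its starting number of balls.
    At every step one ball is drawn at random, then returned to the urn
    together with one additional ball of the same color.
    """
    rng = random.Random(seed)
    counts = Counter(initial_counts)
    for _ in range(steps):
        colors = list(counts)
        weights = [counts[color] for color in colors]
        # Drawing a ball uniformly at random is the same as drawing a
        # color with probability proportional to its current count.
        drawn = rng.choices(colors, weights=weights, k=1)[0]
        counts[drawn] += 1  # the extra ball of the same color
    return counts


if __name__ == "__main__":
    urn = polya_urn({"web": 1, "semantic": 1, "tagging": 1}, steps=1000, seed=42)
    print(urn.most_common())
```

Whatever the random draws, the set of colors at the end is exactly the set the urn started with, which is the limitation noted above: existing tags are reinforced, but no new tag can ever appear.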

5.3.1.2 Imitation and the Yule-Simon Model

The first model that formalized the addition of new tags was proposed by Cattuto (2006). In order for new tags to be added, a single parameter p must be added to the model, which represents the probability of a new tag being added, with the complementary probability (1 − p) that an already-existing tag is reinforced by a uniform random choice over all previous tag applications, so that a tag is picked in proportion to how often it has already been used. This results in a Yule-Simon model, a model first employed by Yule (1925) to model biological genera and later by Simon to model the construction of a text as a stream of words (Simon 1955). This model has been shown to be equivalent to the famous Barabási and Albert algorithm for growing networks (Bornholdt and Ebel 2001). Yet the standard Yule-Simon process does not model vocabulary growth in tagging systems very well, as noticed by Cattuto, since it produces exponents “lower than the exponents we observe in actual data” (Cattuto 2006).
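A minimal sketch of the standard Yule-Simon process may make the role of p concrete. It is an illustration under stated assumptions rather than Cattuto's own code: the function name `yule_simon_stream` and the use of integer tag identifiers are hypothetical, and reinforcement is implemented as copying a uniformly chosen earlier position in the stream, which selects each tag in proportion to how often it has been used so far.

```python
import random
from collections import Counter


def yule_simon_stream(steps, p, seed=None):
    """Generate a tag stream with a standard Yule-Simon process.

    Hypothetical helper for illustration. Tags are integers 0, 1, 2, ...
    With probability p a brand-new tag is appended; with probability
    1 - p a uniformly chosen earlier entry of the stream is copied.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    rng = random.Random(seed)
    stream = [0]  # the stream starts with a single tag
    next_tag = 1
    for _ in range(steps - 1):
        if rng.random() < p:
            stream.append(next_tag)  # invent a new tag
            next_tag += 1
        else:
            stream.append(rng.choice(stream))  # imitate an earlier use
    return stream


if __name__ == "__main__":
    stream = yule_simon_stream(steps=10_000, p=0.05, seed=1)
    frequencies = Counter(stream)
    print("distinct tags:", len(frequencies))
    print("five most used:", frequencies.most_common(5))
```

Unlike the urn, the vocabulary here grows over time, at an average rate of p new tags per step. No memory kernel is included, so every past tag application is equally likely to be copied, however old it is.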

Cattuto hypothesizes that this is because the Yule-Simon model assumes users reinforce tags (with probability 1 − p) by choosing uniformly from all tags that have been used previously, and concludes that “it seems more realistic to assume that users tend to apply recently added tags more frequently than old ones” (Cattuto 2006). This behavior could be caused by the exposure of a user to a feedback mechanism, such as the del.icio.us tag suggestion system. Such suggestions expose the user only to a subset of previously existing tags, such as those most recently added. Since the tag suggestion mechanism encourages more recently added tags to be reinforced with a higher probability, Cattuto added a memory kernel with a power-law exponent to the standard Yule-Simon model. This means that the probability of a previously existing tag being reinforced is itself weighted according to a power law, so that a tag that has been applied x steps in the past is chosen

