The Semantics of Tagging - Social Semantics: The Search for Meaning on the Web - page 140

Information Technology Reference

In-Depth Information

Algorithm 2 GreedyQ Elimination : Given a partition C 1 , ...

C n C

of graph G

=(

V

,

E

)

removes all vertexes v i ∈

V that increase Q

1. repeat

2. v i = argmax v i [ Q ( .., C k \{ v i },.. ) − Q ( .., C k ,.. )]

3. Δ Q = max v i [ Q ( .., C k \{ v i },.. ) − Q ( .., C k ,.. )]

where v i ∈ C k //C k is the partition of vertex i

4. until Δ Q ≤ 0

Modularity of the optimal partition, as general

tags are removed

Number of subsets in the optimal partition, with

general tags removed

0.55

24

22

0.5

20

0.45

18

16

0.4

14

0.35

12

100 edges

150 edges

200 edges

300 edges

400 edges

10

100 edges

150 edges

200 edges

300 edges

400 edges

0.3

8

0.25

6

0.2

4

0

1

2

3

4

5

6

7

8

9

10

0123456789 0

Number of tags removed from the graph, in

decreasing order of generality.

Number of tags removed from the graph, in

decreasing order of generality.

Fig. 5.15 Modularity (Q-factors) and number of partitions obtained after gradually eliminating

tags from the data set, such as to increase the modularity. At each step, the tag that produced the

highest increase in modularity between the initial and resulting partition was selected. In these

results, all edge weights are normalized

Regarding scalability, it is relatively straightforward to show that both

Algorithms 1 and 2 have a running time linear to the number of vertexes n ,i.e.

in this case, number of tags considered in the initial set. In the case of Algorithm 1 ,

exactly two clusters of tags are merged at each step, so one cluster increases in size

by a minimum of one, until the algorithm terminates. In case of Algorithm 2 , one

tag is eliminated per step, until termination. In practice, this scalability property

means they are easily applicable to analyze much larger folksonomy systems.

We leave some aspects open to further work. For instance, in the current

approach, similarity distances between pairs of tags are computed using all the

tagging instances in the data set. In some applications, it might be useful to first

partition the set of users that do the tagging, and then consider only the tags assigned

by a certain class of users. For example, for tags related to a given scientific field,

expert taggers may come up with a different vocabulary partition than novice users.

This may require a twofold application of this algorithm: first to partition and select

the set of users, and then the set of tags based on the most promising category

of users.

Next Page

Social Semantics: The Search for Meaning on the Web

Search WWH ::

Custom Search

Home