Information Technology Reference
In-Depth Information
Algorithm 2 GreedyQ Elimination : Given a partition C 1 , ...
C n C
of graph G
=(
V
,
E
)
removes all vertexes v i
V that increase Q
1. repeat
2. v i = argmax v i [ Q ( .., C k \{ v i },.. ) Q ( .., C k ,.. )]
3. Δ Q = max v i [ Q ( .., C k \{ v i },.. ) Q ( .., C k ,.. )]
where v i C k //C k is the partition of vertex i
4. until Δ Q 0
Modularity of the optimal partition, as general
tags are removed
Number of subsets in the optimal partition, with
general tags removed
0.55
24
22
0.5
20
0.45
18
16
0.4
14
0.35
12
100 edges
150 edges
200 edges
300 edges
400 edges
10
100 edges
150 edges
200 edges
300 edges
400 edges
0.3
8
0.25
6
0.2
4
0
1
2
3
4
5
6
7
8
9
10
0123456789 0
Number of tags removed from the graph, in
decreasing order of generality.
Number of tags removed from the graph, in
decreasing order of generality.
Fig. 5.15 Modularity (Q-factors) and number of partitions obtained after gradually eliminating
tags from the data set, such as to increase the modularity. At each step, the tag that produced the
highest increase in modularity between the initial and resulting partition was selected. In these
results, all edge weights are normalized
Regarding scalability, it is relatively straightforward to show that both
Algorithms 1 and 2 have a running time linear to the number of vertexes n ,i.e.
in this case, number of tags considered in the initial set. In the case of Algorithm 1 ,
exactly two clusters of tags are merged at each step, so one cluster increases in size
by a minimum of one, until the algorithm terminates. In case of Algorithm 2 , one
tag is eliminated per step, until termination. In practice, this scalability property
means they are easily applicable to analyze much larger folksonomy systems.
We leave some aspects open to further work. For instance, in the current
approach, similarity distances between pairs of tags are computed using all the
tagging instances in the data set. In some applications, it might be useful to first
partition the set of users that do the tagging, and then consider only the tags assigned
by a certain class of users. For example, for tags related to a given scientific field,
expert taggers may come up with a different vocabulary partition than novice users.
This may require a twofold application of this algorithm: first to partition and select
the set of users, and then the set of tags based on the most promising category
of users.
 
Search WWH ::




Custom Search