Information Technology Reference
In-Depth Information
Algorithm 1
GretedyQ Determination
:Givenagraph
G
=(
V
,
E
)
,|
V
|
=
n
,|
E
|
=
m
returns partition
<
C
1
, ...
C
n
C
>
C
i
=
{
v
i
}
∀
i
=
,
n
1.
,
1
2.
n
C
=
n
3.
∀
i
,
j
,
e
ij
initialized as in (
5.11
)
4.
repeat
5.
<
C
i
,
C
j
>
=
argmax
c
i
,
c
j
(
e
ij
+
e
ji
−
2
a
i
a
j
)
6.
Δ
Q
=
max
c
i
,
c
j
(
e
ij
+
e
ji
−
2
a
i
a
j
)
C
i
C
j
,
C
j
=
7.
C
i
=
0
//merge C
i
and C
j
8.
n
C
=
n
C
−
1
9.
until
Δ
Q
≤
0
10.
maxQ
=
Q
(
C
1
,..
C
n
C
)
filtering, a weight of 1. Edges filtered out are implicitly assigned a weight of zero.
The normalized case assigns each edge a weight proportional to the similarity
between the tags corresponding to the ends. Formally, using the notations from (
5.9
)
to (
5.10
) from above, we initialize the values
e
ij
as:
m
e
ij
=
ij
sim
ij
sim
ij
(5.11)
∑
m
∑
ij
sim
ij
Where
is simply a normalization factor, which assures that
∑
ij
ei
ij
=
m
.
5.5.4
The Graph Partitioning Algorithm
Since we have established our framework, we can now formally define the graph
partitioning algorithm. As already shown, the number of possible partitions for this
problem is at least 2
n
−
1
(e.g. for our 50 tag setting 2
50
10
15
). Therefore, to explore
all these partitions exhaustively would be clearly unfeasible. The algorithm we use
to determine the optimal partition (Algorithm
1
) is based on Newman (2004), and it
falls into the category of 'greedy' clustering heuristics.
Informally described, the algorithm runs as follows. Initially, each of the vertexes
(in our case, the tags) are assigned to their own individual cluster. Then, at each
iteration of the algorithm, two clusters are selected which, if merged, lead to the
highest increase in the modularity
Q
of the partition. As can be seen from lines 5-6
of Algorithm
1
, because exactly two clusters are merged at each step, it is easy to
compute this increase in
Q
as:
>
(the
value of
e
ij
being symmetric). The algorithm stops when no further increase in
Q
is
possible by further merging.
Note that it is possible to specify another stopping criteria in Algorithm
1
, line
9, e.g. it is possible to ask the algorithm to return a minimum number of clusters
(subsets), by letting the algorithm run until
n
C
reaches this minimum value.
Δ
Q
=(
e
ij
+
e
ji
−
2
a
i
a
j
)
or
Δ
Q
=
2
∗
(
e
ij
−
a
i
a
j
)
Search WWH ::
Custom Search