Information Technology Reference
In-Depth Information
Figure 3. Illustration of the bottom-up cluster-merging procedure. The nodes at the same level are nodes
in a same graph. Some nodes at the lower level are merged to form a single node at the higher level.
The two nodes at the top level represent the two final clusters in this example.
v
1
v
2
v
3
v
4
v
5
v
6
v
7
v
8
initial sparse graph
G
0
into a sequence of smaller graphs
G
1
,
G
2
, …,
G
t
such that the number of nodes
|
V
0
|>|
V
1
|>|
V
2
|> … >|
V
t
|, where a stopping criteria is met at
G
t
. The nodes in the smallest graph
G
t
represent
the final clusters for a dataset.
We first define the most similar neighborhood of a node
v
,
N
v
(ᆳ
i
)
, to be a set of nodes fulfilling the
following condition:
N ( ) {u | sim(v,u)
=
>
}
(13)
i
i
where
sim
(
v, u
) is the similarity between node
v
and node
u
(see Equation 12), and
ᆳ
i
is an adaptive
threshold (e.g.,
ᆳ
i
=0.543) and is associated with graph
G
i
. The nodes within
N
v
(
ᆳ
i
) of node
v
in
G
i
are
merged together to become a new node in the smaller graph
G
i+1
(illustrated in Figure 3). The number
of nodes and the number of edges in the smaller graph are reduced, and the number of Web pages in
a node in the smaller graph
G
i+1
is increased, resulting in grouping similar Web pages into nodes (or
clusters).
After new nodes in the smaller graph
G
i+1
are formed, the edges between nodes are built under two
conditions: (1) similarity between two nodes is greater than zero and (2) a new node is connected to at
most
k
most similar nodes. Furthermore, since
v
whenever
i
≥
, we design
N ( ) N (
⊆
)
i
i
v
i 1
+
ᆳ
i+1
=
ᆳ
i
/
β
(14)
where
β>
1 is a decay factor (Rajaraman & Pan, 2000), which defines a weaker neighborhood for the
smaller graph
G
i+1
in order to continue to transfer
G
i+1
into another smaller graph. Therefore this is an
iterative procedure to transfer the initial graph
G
0
to the sequence of smaller graph
G
1
,
G
2
, …,
G
t
such
that |
V
0
|>|
V
1
|>|
V
2
|> … >|
V
t
|. The decay factor
β
controls the speed of reducing the value of threshold
ᆳ
in
a way that
ᆳ
0
=1/
β
,
ᆳ
1
=
ᆳ
0
/
β
, …,
ᆳ
t
=
ᆳ
t-1
/
β
. The faster the value of
ᆳ
is reduced, the more nodes in the
current graph
G
i
may be grouped to be a new node in the next smaller graph
G
i+1
, producing less new
nodes in
G
i+1
. Therefore the decay factor
β
determines the speed of reducing the number of the sequence
of smaller graphs. A larger
β
will result in a fewer number of levels in the hierarchical structure.
A stopping factor is required to terminate this bottom-up cluster-merging procedure. The details for
the discovery of a stopping factor for Web page datasets are provided in the fifth section. This bottom-
Search WWH ::
Custom Search