Information Technology Reference
In-Depth Information
Figure 3. Illustration of the bottom-up cluster-merging procedure. The nodes at the same level are nodes
in a same graph. Some nodes at the lower level are merged to form a single node at the higher level.
The two nodes at the top level represent the two final clusters in this example.
v 1 v 2 v 3 v 4 v 5 v 6 v 7 v 8
initial sparse graph G 0 into a sequence of smaller graphs G 1 , G 2 , …, G t such that the number of nodes
| V 0 |>| V 1 |>| V 2 |> … >| V t |, where a stopping criteria is met at G t . The nodes in the smallest graph G t represent
the final clusters for a dataset.
We first define the most similar neighborhood of a node v , N v (ᆳ i ) , to be a set of nodes fulfilling the
following condition:
N ( ) {u | sim(v,u)
=
>
}
(13)
i
i
where sim ( v, u ) is the similarity between node v and node u (see Equation 12), and i is an adaptive
threshold (e.g., i =0.543) and is associated with graph G i . The nodes within N v ( i ) of node v in G i are
merged together to become a new node in the smaller graph G i+1 (illustrated in Figure 3). The number
of nodes and the number of edges in the smaller graph are reduced, and the number of Web pages in
a node in the smaller graph G i+1 is increased, resulting in grouping similar Web pages into nodes (or
clusters).
After new nodes in the smaller graph G i+1 are formed, the edges between nodes are built under two
conditions: (1) similarity between two nodes is greater than zero and (2) a new node is connected to at
most k most similar nodes. Furthermore, since v
whenever i
, we design
N ( ) N (
)
i
i
v
i 1
+
i+1 = i / β
(14)
where β> 1 is a decay factor (Rajaraman & Pan, 2000), which defines a weaker neighborhood for the
smaller graph G i+1 in order to continue to transfer G i+1 into another smaller graph. Therefore this is an
iterative procedure to transfer the initial graph G 0 to the sequence of smaller graph G 1 , G 2 , …, G t such
that | V 0 |>| V 1 |>| V 2 |> … >| V t |. The decay factor β controls the speed of reducing the value of threshold in
a way that 0 =1/ β , 1 = 0 / β , …, t = t-1 / β . The faster the value of is reduced, the more nodes in the
current graph G i may be grouped to be a new node in the next smaller graph G i+1 , producing less new
nodes in G i+1 . Therefore the decay factor β determines the speed of reducing the number of the sequence
of smaller graphs. A larger β will result in a fewer number of levels in the hierarchical structure.
A stopping factor is required to terminate this bottom-up cluster-merging procedure. The details for
the discovery of a stopping factor for Web page datasets are provided in the fifth section. This bottom-
Search WWH ::




Custom Search