Information Technology Reference
In-Depth Information
up cluster-merging phase is a greedy procedure, which may contain errors or fall into local minima. To
address this problem, we apply a top-down refinement procedure.
Top-Down Reinement Phase
The top-down refinement phase refines the greedy clustering results produced by the bottom-up cluster-
merging phase (see Figure 4 for example). The objective in this phase is to make clusters more distinct
from each other.
We first define the term sub-node: a sub-node s of a node u in a graph G i+1 is a node s in graph G i .
For instance in Figure 5, node x is a sub-node of node u . The top-down refinement procedure operates
on the following rule: If a sub-node x of a node u is moved into another node v and this movement results
in reduction of the inter-similarity between the two nodes, then the sub-node x should be moved into
the node v . The reduction of the inter-similarity between two nodes, u and v , by moving a sub-node x
from node u to node v can be expressed by a gain function which is defined as:
+
(15)
gain u v sim u v sim u x v x
( , )
=
( , )
((
),(
))
x
where u-x means the node after removing sub-node x out of u , and v+x means the node after adding
sub-node x into v . Although a sub-node is considered to be moved into any of its connected nodes, it is
moved only to its connected node that results in the greatest positive gain. To keep track of the gains, a
gain list is used and its implementation can be found in, e.g., Fiduccia and Mattheyses (1982).
Our refinement procedure refines clustering solution from the smallest graph, G t , at the top level to
the initial graph, G 0 , at the lowest level (see Figure 4). Sub-nodes are moved until no more positive gain
will is obtained. For the example shown in Figure 4, two sub-nodes are moved to different clusters.
This refinement procedure is very effective in climbing out of local minima (Hendrickson & Leland,
1993; Karypis & Kumar, 1999). It not only finds early errors produced by the greedy cluster-merging
procedure, but also can move groups of Web pages of different sizes from one cluster into another cluster
so that the inter-cluster similarity is reduced.
The nodes in graph G t at the top level in the hierarchical structure (see Figure 4) generated after the
top-down refinement procedure represent final clusters for a dataset. The resultant hierarchical structure
Figure 4. Illustration of top-down refinement procedure. (a) Shows the bottom-up clustering solution,
which is used to compare the improvement produced by the top-down refinement procedure. (b) Shows
the final clustering solution after the top-down refinement procedure. The dashed lines in (b) indicate
the error correction. The hierarchical structure in (b) can be used for users to browse Web pages.
(a)
(b)
Search WWH ::




Custom Search