Database Reference
In-Depth Information
The basic idea in KL [3.16] to dealing with graph partitioning is to con-
struct an initial partition of the vertices either randomly or according to some
problem-specific strategy. Then the algorithm sweeps through the vertices,
deciding whether the size of the cut would increase or decrease if we moved
this vertex x to another partition. The decision to move x can be made in
time proportional to its degree by simply counting whether more of x 's neigh-
bors are on the same partition as x . Of course, the desirable side for x will
change if many of its neighbors switch, so multiple passes are likely to be
needed before the process converges to a local optimum.
In recursive bisection, a k-way split is obtained by recursively partitioning
the graph into two subgraphs. Spectral bisection [3.17], [3.18] uses the eigen-
vector associated with the second smallest eigenvalue of the graph's Laplacian
(Fiedler vector) [3.19] for splitting.
Metis by Karypis et al. [3.11] handles multiconstraint multiobjective
graph partitioning in three phases: (i) coarsening, (ii) initial partitioning, and
(iii) refining. First a sequence of successively smaller and therefore coarser
graphs is constructed through heavy-edge matching. Second, the initial parti-
tioning is constructed using one out of four heuristic algorithms (three based
on graph growing and one based on spectral bisection). In the third phase the
coarsened partitioned graph undergoes boundary Kernighan-Lin refinement.
In this last phase vertices are only swapped among neighboring partitions
(boundaries). This ensures that neighboring clusters are more related than
nonneighboring clusters. This ordering property is beneficial for visualization,
as explained in Section 3.6.1. In contrast, because recursive bisection com-
putes a bisection of a subgraph at a time, its view is limited. Thus, it cannot
fully optimize the partition ordering and global constraints. This renders it
less effective for our purposes. Also, we found the multilevel partitioning to
deliver the best partitioning and to be the fastest and most scalable of the
three choices we investigated. Hence, Metis is used as the graph partitioner
in Opossum.
3.3.3 Determining the Number of Clusters
Finding the “right” number of clusters k for a data set is a di cult and often
ill-posed problem, because even for the same data set, there can be several
answers depending on the scale or granularity one is interested in. In market
basket clustering, the number of clusters is often prespecified due to business
constraints (such as marketing personalization, human understandability) in
the range from 5 to 20. Alternatively, heuristics for finding the right k can
be used [3.2].
Search WWH ::




Custom Search