Database Reference
In-Depth Information
class labels but also supervision for dimensions.
This type of supervision is modeled by specifying
dimensions which are relevant for certain classes.
The algorithmic paradigm of SSPC is similar to the
algorithms PROCLUS and ORCLUS with some
modifications. Most importantly, labeled objects
and labeled dimensions are used for initialization
and the parameter l specifying the average cluster
dimensionality is replaced by a more intuitive
parameter specifying the maximum variance for
relevant dimensions.
cluster density which is expressed by the sum of
weights of the edges within the clusters. Several
further objective functions for balanced graph
partitioning have been proposed, for example
minMaxCut (Ding et al. 2001) which sums up
the weights within each cluster separately and
thus strikes to obtain individual clusters of high
object density. Introducing balancing constraints
however makes the graph partitioning problem
NP-hard (Wagner & Wagner 1993).
Spectral clustering proposes algorithms to solve
relaxed forms of the balanced graph partitioning
problem. Most algorithms for spectral clustering
such as (Shi & Malik 2000, Ng et al. 2002) fol-
low a similar paradigm. After creating a weighted
similarity graph from data, the Laplacian of this
graph is constructed. The similarity graph is rep-
resented by a symmetric adjacency matrix A. The
unnormalized Laplacian of a graph is obtained by
subtracting the adjacency matrix from the degree
matrix, i.e. L = D-A. The spectrum of the Lapla-
cian obtained by Eigenvalue decomposition has
interesting properties for clustering, for example
the number of constant Eigenvectors coincides with
the number of connected components of a graph.
Usually, we have a fully connected graph in clus-
tering. In this case, the clusters can be detected by
mapping the data objects to the space spanned by
the first K Eigenvectors and performing standard
K-means. It can be proven that this procedure yields
an approximation of balanced graph partitioning.
The algorithms differ in the ways if and how
the Laplacian is normalized. Thereby, different
objective functions can be optimized. Recall that
the objectives of clustering are two-fold: First,
the objects in different clusters should be as dis-
similar as possible and secondly the objects within
one cluster should be as similar as possible. The
simplest case of no normalization of the Lapla-
cian addresses only the first goal. Performing
K-means in the Eigenvector space approximates
the rCut objective function which only considers
balance in the number of objects which is achieved
by K-means. To explicitly require within cluster
SPectrAl cluSterIng
Spectral clustering considers the clustering prob-
lem from the perspective of graph theory. The data
is provided by a similarity matrix from which a
weighted graph is constructed. For many types of
data the graph representation is very natural, for
example for social networks where the nodes are
different people and the edges represent friendship.
The clustering problem is to find a partitioning of
the graph such that the edges between different
clusters have a very low weight. Different objec-
tive functions for a good partitioning have been
proposed. If for example a partitioning into two
clusters is desired, the Minimum Cut objective
function (minCut) just removes the edge having
the lowest weight. The minCut problem can eas-
ily be solved (Stoer & Wagner 1997) but leads
to an undesired partitioning in many cases for
example by simply removing a single outlying
vertex from the graph. One option to circumvent
this problem is to request that the clusters should
be reasonably large. This is implemented in the
objective function Ratio Cut (rCut) first introduced
by (Hagen & Kahng 1992) which considers the
ratio between the weight of the cut edges and the
size of the resulting clusters. As an alternative,
the objective function Normalized Cut (nCut)
(Shi & Malik 2000) considers the connectivity
between clusters, expressed by the weight of the
cut edges as in rCut, but in relation to the within
Search WWH ::




Custom Search