during clustering. The result of the CCL algorithm is of a different type (a dendrogram) but otherwise very similar to the result of COP-K-means. Both algorithms perform clustering without violating any of the constraints. This rigid preservation of constraints can lead to very unnatural clustering results if the constraints do not agree well with the data distribution. Consider, for example, the data set displayed in Figure 6. The labeled objects are visualized by larger symbols, and the clustering results are marked by different colors and additionally annotated. The result of COP-K-means in Figure 6(a) respects only the constraints, not the data distribution.
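To make the rigidity of constraint enforcement concrete, the following minimal Python sketch shows an assignment step in the spirit of COP-K-means: each object is assigned to the nearest centroid that violates no must-link or cannot-link constraint with the objects assigned so far, and the pass fails if no feasible cluster exists. The function names and data structures are our own illustration, not the original implementation.

import numpy as np

def violates(i, c, assign, must_link, cannot_link):
    # True if putting object i into cluster c breaks a constraint
    # with respect to the objects assigned so far.
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and assign.get(j, c) != c:
            return True        # must-link partner sits in another cluster
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and assign.get(j) == c:
            return True        # cannot-link partner already in cluster c
    return False

def constrained_assignment(X, centroids, must_link, cannot_link):
    # One COP-K-means-style pass: each object goes to the nearest
    # feasible centroid; if none is feasible, the algorithm gives up.
    assign = {}
    for i, x in enumerate(X):
        for c in np.argsort(((centroids - x) ** 2).sum(axis=1)):
            if not violates(i, int(c), assign, must_link, cannot_link):
                assign[i] = int(c)
                break
        else:
            return None        # no feasible cluster exists for object i
    return assign

The early abort in constrained_assignment is exactly the behavior that produces the unnatural partitions discussed above: feasibility with respect to the constraints always takes precedence over the data distribution.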
The recently proposed hierarchical density-based algorithm HISSCLU (Böhm & Plant 2008) alleviates this problem by incorporating supervision into density-based clustering in a different way. Instead of formulating pair-wise must-link and cannot-link constraints, the labeled objects are utilized as seeds for cluster expansion. Density-based clusters are expanded starting at each labeled object simultaneously. During this cluster expansion process, labels are propagated to the unlabeled objects.
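A rough sketch of this idea, with names of our own choosing: all labeled seeds share one priority queue, so their expansions proceed simultaneously, and each unlabeled object is claimed by whichever expansion reaches it first, inheriting that seed's label. The real algorithm additionally uses core distances and the local distance weighting described below; this simplification only illustrates the simultaneous, seed-driven expansion.

import heapq
import numpy as np

def expand_from_seeds(X, seeds, eps):
    # seeds: dict mapping object index -> class label of a labeled object.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    labels, done = {}, set()
    heap = [(0.0, i, lab) for i, lab in seeds.items()]
    heapq.heapify(heap)
    while heap:
        d, i, lab = heapq.heappop(heap)
        if i in done:
            continue                  # already claimed by a closer seed
        done.add(i)
        labels[i] = lab               # label propagation to object i
        for j in np.where(D[i] <= eps)[0]:
            if j not in done:
                heapq.heappush(heap, (float(D[i, j]), int(j), lab))
    return labels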
In areas of the feature space where conflicts caused by differently labeled objects exist, a local distance weighting is applied.
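The effect of such a weighting can be pictured as follows; the concrete weight function below is an assumption of our own for illustration, not the formula from Böhm & Plant (2008). Distances are stretched where labeled objects of conflicting classes lie close together, so that expansions slow down at class borders instead of leaking across them.

import numpy as np

def weighted_distance(x, y, labeled_objects, stretch=3.0, radius=1.0):
    # labeled_objects: list of (position, class label) pairs.
    # Hypothetical weight: stretch the distance if labeled objects of
    # more than one class lie near the midpoint, i.e., at a class border.
    mid = (x + y) / 2.0
    nearby = {lab for p, lab in labeled_objects
              if np.linalg.norm(mid - p) <= radius}
    factor = stretch if len(nearby) > 1 else 1.0
    return factor * np.linalg.norm(x - y)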
Like MPCK-means, HISSCLU achieves more natural results in the case of inconsistent information provided by the data and the supervision (for comparison, Figure 6(b) displays the result of MPCK-means and Figure 6(c) the result of HISSCLU). Experiments demonstrate that, especially in the presence of wrongly labeled objects (originating, for example, from erroneous user ratings), the local metric
adaptation applied in HISSCLU outperforms the
global metric learning scheme of MPCK-means.
In addition, HISSCLU provides a visualization of
the hierarchical cluster structure which displays
how consistent both sources of information actu-
ally are.
In this section, our major focus is on algorithms that closely interrelate supervision and clustering. It should be mentioned that there are also several recent approaches to global (Xing et al. 2003, Bar-Hillel et al. 2003) or local (Chang & Yeung 2004) metric learning from supervision information, which can be applied prior to an arbitrary clustering algorithm.
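As one concrete instance, the core of RCA (Bar-Hillel et al. 2003) can be sketched in a few lines: the within-chunklet covariance of groups of objects known to belong together is estimated, and the data is whitened by its inverse square root before clustering. The sketch below follows this idea under simplifying assumptions (full-rank covariance, one global transform); it is not the authors' implementation.

import numpy as np

def rca_transform(X, chunklets):
    # chunklets: lists of indices of objects known to belong together.
    centered = [X[idx] - X[idx].mean(axis=0) for idx in chunklets]
    C = np.cov(np.vstack(centered).T)        # within-chunklet covariance
    evals, evecs = np.linalg.eigh(C)
    W = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-12))) @ evecs.T
    return X @ W                             # whitened data for clustering

Running a standard algorithm such as K-means on the returned data is then equivalent to clustering the original data under the learned Mahalanobis metric.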
Yip et al. (2005) propose a semi-supervised algorithm for projected clustering, which is an interesting option for coping with the curse of dimensionality. Their algorithm SSPC (Semi-Supervised Projected Clustering) considers not only supervision for objects in the form of class labels, but also supervision for dimensions, i.e., attributes known to be relevant for specific clusters.
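How labeled objects can inform the choice of relevant dimensions is illustrated below; the selection criterion used here (low per-dimension variance among a cluster's labeled objects) is a simplification of our own and not SSPC's actual objective function.

import numpy as np

def relevant_dimensions(X, labeled_idx, max_dims=2):
    # Illustration only: rank dimensions for one cluster by how tightly
    # the cluster's labeled objects agree on them (low variance).
    variances = X[labeled_idx].var(axis=0)
    return np.argsort(variances)[:max_dims]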
Figure 6. Results of semi-supervised clustering. The example consists of three clusters containing six labeled objects of two different classes. (a) COP-K-means: unnatural clustering respecting the constraints; (b) MPCK-means: although it contains uniformly labeled objects, the cluster on top is split into two parts (c3 and c4) due to the limitations of K-means; (c) HISSCLU: most natural result by density-based clustering and local distance weighting (figure from Böhm & Plant 2008).