Database Reference
In-Depth Information
FIGURE 7.2 : Constraint-based clustering.
Euclidean distance modified by a shortest-path algorithm (31) and
Mahalanobis distances trained using convex optimization (4; 48)
Several clustering algorithms using trained distance measures have been
employed for constrained clustering, including single-link (8) and complete-
link (31) agglomerative clustering, EM (12; 4), and KMeans (4; 48). Recent
techniques in distance-metric learning for clustering include learning a margin-
based clustering distance measure using boosting (27), and learning a distance
metric transformation that is globally non-linear but locally linear (11). Fig-
ure 7.4 shows an example of learning a distance function from the constraints
given in Figure 7.3 and then clustering. Notice that in Figure 7.4 the input
data space has been stretched in the horizontal dimension and compressed
in the vertical dimension, to draw the must-linked instances closer and put
the cannot-linked instances farther apart. Section 7.5 outlines methods of
learning distance functions from constraints.
There have been some algorithms that try to both enforce constraints and
learn distance functions from constraints — details of these algorithms will
be presented in Section 7.6.
7.3 Text Clustering
In this section, we outline some of the specific steps of pre-processing and
distance function selection that are necessary for both unsupervised and con-
strained text clustering.
 
Search WWH ::




Custom Search