Constrained Partitional Clustering of Text Data: An Overview - Text Mining: Classification, Clustering, and Applications

Database Reference

In-Depth Information

FIGURE 7.2 : Constraint-based clustering.

•

Euclidean distance modified by a shortest-path algorithm (31) and

•

Mahalanobis distances trained using convex optimization (4; 48)

Several clustering algorithms using trained distance measures have been

employed for constrained clustering, including single-link (8) and complete-

link (31) agglomerative clustering, EM (12; 4), and KMeans (4; 48). Recent

techniques in distance-metric learning for clustering include learning a margin-

based clustering distance measure using boosting (27), and learning a distance

metric transformation that is globally non-linear but locally linear (11). Fig-

ure 7.4 shows an example of learning a distance function from the constraints

given in Figure 7.3 and then clustering. Notice that in Figure 7.4 the input

data space has been stretched in the horizontal dimension and compressed

in the vertical dimension, to draw the must-linked instances closer and put

the cannot-linked instances farther apart. Section 7.5 outlines methods of

learning distance functions from constraints.

There have been some algorithms that try to both enforce constraints and

learn distance functions from constraints — details of these algorithms will

be presented in Section 7.6.

7.3 Text Clustering

In this section, we outline some of the specific steps of pre-processing and

distance function selection that are necessary for both unsupervised and con-

strained text clustering.

Search WWH ::

Custom Search

Home