Constrained Partitional Clustering of Text Data: An Overview - Text Mining: Classification, Clustering, and Applications

Database Reference

In-Depth Information

•

initializing clusters and inferring clustering constraints based on neigh-

borhoods derived from labeled examples (5).

Constraint-based clustering techniques have been an active topic of re-

search, where recent techniques include variational techniques (28) or sam-

pling methods (36) for constrained clustering using a graphical model, and fea-

sibility studies for clustering under different types of constraints (16). There

have typically been two types of constraint-based approaches: (1) ones with

strict enforcement, which find the best feasible clustering respecting all the

given constraints (47; 15), and (2) ones with partial enforcement, which find

the best clustering while maximally respecting constraints (6; 43; 16; 28).

Figure 7.2 shows an example of a clustering which respects all the given con-

straints in Figure 7.1. Details of these algorithms are outlined in later sections.

FIGURE 7.1 : Input instances and constraints.

7.2.2 Distance-Based Methods

In distance-based approaches, an existing clustering algorithm that uses

a distance measure is employed. However, rather than use a given distance

metric, the distance measure is first trained to “satisfy” the given constraints.

In this context, satisfying the constraints means that must-linked (similar)

instances are close together and cannot-linked (different) instances are far

apart in the learned distance space. Several distance measures have been

used for distance-based constrained clustering:

•

string-edit distance trained using EM (8),

•

Jensen-Shannon divergence trained using gradient descent (12),

Search WWH ::

Custom Search

Home