Database Reference
In-Depth Information
7.2 Uses of Constraints
The typical supervised learning situations involves having a label associ-
ated with each instance. The semi-supervised learning situation is when only
a small subset of instances have labels. If the available labeled data repre-
sent all the relevant categories, then semi-supervised classification algorithms
can be readily used for data categorization. For details see the various algo-
rithms in the surveys (42; 49). However in many domains, knowledge of the
relevant categories is incomplete. Moreover, pairwise constraints are often a
more naturally available form of supervision than labels in certain cluster-
ing tasks. Moreover, in an interactive learning setting, a user who is not a
domain expert can sometimes provide feedback in the form of must-link and
cannot-link constraints (12; 14) more easily than class labels, since providing
constraints does not require the user to have significant prior knowledge about
the categories in the dataset.
Constraints have typically been used in clustering algorithms in two ways.
Constraints can be used to modify the cluster assignment stage of the cluster
algorithm, to enforce satisfaction of the constraints as much as possible. Alter-
natively, the distance function of the clustering algorithm can also be trained
either before or after the clustering actually occurs using the constraints. In
all of these cases, constraints can also be used in the initialization phase,
where the initial clusters are formed such that must-linked instances are in
the same clusters and cannot-linked instances are in different clusters. Based
on this categorization, existing methods for constrained clustering can be put
into two general approaches that we call constraint-based and distance-based
methods.
7.2.1 Constraint-Based Methods
In constraint-based approaches, the clustering algorithm itself is modified
so that the available labels or constraints are used to bias the search for an
appropriate clustering of the data. The pairwise constraints specify whether
two instances should be in the same cluster (must-link) or in different clus-
ters (cannot-link). Constraint-based clustering has been done using several
techniques:
modifying the clustering objective function so that it includes a term
for satisfying specified constraints (17)
clustering using side-information from conditional distributions in an
auxiliary space (44)
enforcing constraints to be satisfied during the cluster assignment in the
clustering process (47)
 
Search WWH ::




Custom Search