FIGURE 7.7: DistBoost algorithm.
7.5.2 Kernel Distance Functions Using AdaBoost
Hertz et al. (27) proposed a method for distance metric learning that applies
boosting in the product space of the input data space X. They posed the
constrained metric learning problem as that of learning a function that takes
as input pairs of instances from the product space X × X and outputs binary
labels corresponding to must-link (1) and cannot-link (0) constraints. They
used boosting on the product space to learn this function, where boosting is
a standard machine learning tool that combines the strength of an ensemble
of “weak” learners (with low prediction accuracy) to create a “strong”
learner (with high prediction accuracy) (24). The overall flow of the
DistBoost algorithm of Hertz et al. (27) is outlined in Figure 7.7. In the
first step, a constrained weighted EM algorithm is run on the dataset and the
constraints to fit a Gaussian Mixture Model (GMM) over the weighted unlabeled
data and the given constraints. The key difference of constrained EM from
ordinary EM lies in the E-step, which sums the assignment probabilities only
over assignments that comply with the constraints. The resulting GMM is
treated as a “weak” learner and is used to define a “weak” distance function,
where the distance h(x1, x2) between two instances x1 and x2 is computed from
their MAP component assignments in the GMM.
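The core idea above can be sketched in a few lines: fit a GMM as the “weak” learner, then derive a “weak” pairwise distance from the MAP component assignments of the two instances. This is a minimal illustration, not the authors' exact method: it uses an ordinary (unconstrained) GMM in place of the constrained weighted EM step, and assumes a simple same-component rule (distance 0 if both points are MAP-assigned to the same mixture component, 1 otherwise); the function name `weak_distance` is illustrative.

```python
# Sketch of a "weak" distance function derived from a GMM's MAP assignments.
# Assumptions (not from the text): an unconstrained GMM stands in for the
# constrained weighted EM step, and the weak distance is a same-component rule.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 2-D data: two well-separated clusters.
a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
X = np.vstack([a, b])

# "Weak" learner: a GMM fit to the data.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

def weak_distance(x1, x2, gmm):
    """Weak distance from MAP assignments: 0 (must-link-like) if both
    points fall in the same mixture component, else 1 (cannot-link-like)."""
    k1, k2 = gmm.predict(np.vstack([x1, x2]))
    return 0.0 if k1 == k2 else 1.0

print(weak_distance(a[0], a[1], gmm))  # pair drawn from the same cluster
print(weak_distance(a[0], b[0], gmm))  # pair drawn from different clusters
```

In DistBoost proper, many such weak distance functions are learned on reweighted data across boosting rounds and combined with AdaBoost-style weights into a single strong distance function.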