Database Reference
In-Depth Information
Input: A dataset X =
{
x 1 ,...,x n }
to cluster, k : the number of clusters to
find.
Output: A partition of X, Π k =
{
π 1 ,...,π k }
into k clusters that is a local
optima of the VQE (Equation 7.2).
1. Randomly generate cluster centroids μ 1 ,..., μ k .
2. loop until convergence do
(a) for i =1 to
do
(a.1) Assign x i to the nearest cluster π j , where nearness is
measured in terms of distance from x i to centroid μ j .
(b) Recalculate centroids μ 1 ,...,μ k according to Equation 7.5
|
X
|
3. Return the final partitions.
FIGURE 7.5 : Clustering using KMeans.
super-instance, whose co-ordinates are the average of the connected compo-
nent's and whose weight is equal to the number of instances within it (lines 1
and 2). Secondly, rather than performing a nearest centroid assignment (step
2a.1) in Figure 7.4 ), a nearest feasible centroid assignment is performed (lines
4a.1), where an assignment is feasible if it does not violate any cannot-link
constraints. When performing the nearest feasible centroid assignment step
the previous set partition is forgotten and the new partition built up incre-
mentally. Therefore, the first instance assigned to a cluster can never violate
any constraints, even if it is involved in many. Similarly if there is only one
constraint, c = ( x, y ), if x is assigned first then y is assigned to its closest feasi-
ble centroid and the assignment of x is not revisited. In this way, we can view
this algorithm as greedily trying to attempt constructing a feasible clustering
with no backtracking of previous instance assignments.
Natural variations of trying to satisfy all constraints are: a) attempting to
satisfy as many constraints as possible while ignoring noisy or inappropriate
constraints and b) having degrees of belief/importance associated with each
constraint. Both can be viewed as frameworks that allow trading of satisfying
the lesser important constraints.
7.4.2 Algorithms with Penalties - PKM, CVQE
The COP-KMeans algorithm can (see Section 7.4.1 ) improve the accuracy
at predicting an extrinsic label and also shape clusters into desirable forms.
However, when constraints are generated from labeled data there is the pos-
sibility of class label noise, thereby generating incorrect cannot-link or must-
Search WWH ::




Custom Search