Improved KL (K-means-Laplacian) Clustering with Sparse Graphs - Multimedia and Signal Processing

Digital Signal Processing Reference

In-Depth Information

However, there is still a drawback existing in these improved K-means algorithms

which is that they group examples just based on their attribute values. It is shown that

relation information of data sets provides the hidden structure information of data sets

and is very useful for clustering [13-14]. Hence, it is reasonable to combine the

attribute values and relation information of data sets together for handling clustering

tasks. Wang. et al recently proposed an integrated KL (K-means - Laplacian) cluster-

ing algorithm to use both of the attribute values and pairwise relations of data sets

[15]. In the applications, this approach seems an effective way to make use of mul-

tiple information sources of data sets. However, it should be pointed that a full con-

nected graph is used in KL clustering algorithm. A full connected graph may not be

appropriate to represent the similarity relation between data points, and the pairwise

similarity of distant points is not reliable [16-17]. In addition, the abundant super-

fluous edges in the graph will cause algorithms based on it less efficient. Hence,

graph sparsification is certainly required in KL clustering.

The most frequently used graph sparsification method is K-nearest neighbor.

However, as many reports claimed [18-19], KNN adjacency graph may not be a finer

approach for approximating the structure of data sets. So this paper will improve the

KL clustering algorithm with graphs constructed by other graph sparsification tech-

niques such as b-matching and sparse representation. B-matching algorithm ensures

the graph is exactly regular, namely every node has b neighbors at termination.

Graphs constructed by sparse representation ( l graph) also have many merits such as

sparsity and datum-adaptive neighborhoods. Therefore, based on the new constructed

graphs, the integrated KL clustering will get great improvements.

2

Background

2.1 Integrated KL Clustering

(, , , )

xx



x

=∈

X R

Given a data matrix

and their pairwise relationship

12

n

dn

×

WR such that i w represents the similarity between x and x . Sup-

pose the data set has C classes. The objective function of integrated KL algorithm is

[15]

∈

matrix

nn

×

J

=

min(

α

J

+

(1

−

α

)

J

)

(1)

k

s

where 0

is a tradeoff parameter, J and J are the objective functions of

K-means clustering and spectral clustering respectively. The inputs of the spectral

clustering method is the relationship matrix W . For free the tradeoff parameter, a

trace quotient formulation of the objective function has been proposed, namely

<<

1

J

=

min(

s

)

(2)

J

k

Multimedia and Signal Processing

Search WWH ::

Custom Search

Home