Improved KL (K-means-Laplacian) Clustering
with Sparse Graphs
Lai Wei 1 and Feifei Xu 2
1 Department of Computer Science, Shanghai Maritime University, Haigang Avenue 1550,
Shanghai, China
weilai@shmtu.edu.cn
2 Department of Computer Science and Technology, Shanghai University of Electric Power,
Pingliang Road 2103, Shanghai, China
xufeifei1983@hotmail.com
Abstract. KL (K-means-Laplacian) clustering combines the K-means and spectral clustering algorithms to exploit both the attribute values of the data and the pairwise relations between data points. However, the fully connected graph used in KL clustering may not be appropriate for representing the similarity relations between data points. This paper improves KL clustering with sparse graphs constructed by the b-matching method and by sparse representation. The b-matching graph, in which each node has exactly b neighbors, is more regular than the KNN (k-nearest-neighbors) graph. The graph constructed by sparse representation (the l1 graph) also has merits such as sparsity and datum-adaptive neighborhoods. Hence the improved KL clustering with these constructed graphs has more attractive properties than classic KL clustering. Experiments on the benchmark datasets COIL-20 and MNIST show the effectiveness of the improved KL clustering, with promising results.
Keywords: B-matching, Sparse representation, K-means, Spectral clustering.
1 Introduction
Clustering has been among the most active research topics in the machine learning and
pattern recognition communities [1-2]. Intuitively, clustering finds a partition of the
data points into clusters such that the points within a cluster are more similar to each
other than to points in other clusters. Many clustering algorithms have been developed
over the past few decades [3], including spatial clustering [4-5], subspace clustering
[6-7], density-based clustering [8], and so on. Among them, K-means clustering [9] is
one of the most popular and efficient methods; it aims to minimize the sum of squared
distances between the data points and their corresponding cluster centers. However,
research also indicates that the K-means algorithm suffers from some problems [10].
First, the predefined criterion is usually non-convex, which causes many locally
optimal solutions; second, the iterative procedure for optimizing the criterion makes
the final solutions depend heavily on the initialization. Therefore, effective methods
using the spectral relaxation technique have been proposed to alleviate these
problems [11-12].
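The K-means objective and its sensitivity to initialization can be illustrated with a minimal NumPy sketch of Lloyd's algorithm (this code is illustrative only and not from the paper; the data, function names, and seed values are assumptions for the example):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: locally minimize the sum of squared distances
    between points and their assigned cluster centers."""
    rng = np.random.default_rng(seed)
    # Random initialization: the final local optimum depends on this choice.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # The (non-convex) K-means criterion being minimized.
    objective = ((X - centers[labels]) ** 2).sum()
    return labels, centers, objective

# Two well-separated Gaussian blobs as toy data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centers, obj = kmeans(X, k=2)
```

Because the criterion is non-convex, rerunning with different seeds can yield different partitions and objective values, which is exactly the initialization sensitivity the spectral relaxation methods cited above aim to alleviate.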