Improved KL (K-means-Laplacian) Clustering
with Sparse Graphs
Lai Wei 1 and Feifei Xu 2
1 Department of Computer Science, Shanghai Maritime University, Haigang Avenue 1550,
Shanghai, China
weilai@shmtu.edu.cn
2 Department of Computer Science and Technology, Shanghai University of Electric Power,
Pingliang Road 2103, Shanghai, China
xufeifei1983@hotmail.com
Abstract. KL (K-means-Laplacian) clustering combines the K-means and spectral clustering algorithms to exploit both the attribute values of the data and the pairwise relations between data points. However, the fully connected graph used in KL clustering may not be appropriate for representing the similarity relations between data points. This paper improves KL clustering with sparse graphs constructed by the b-matching method and by sparse representation. The b-matching graph, in which each node has exactly b neighbors, is more regular than the KNN (k-nearest-neighbors) graph. The graph constructed by sparse representation (the l1 graph) also has merits such as sparsity and datum-adaptive neighborhoods. Hence the improved KL clustering with these constructed graphs has more attractive properties than classic KL clustering. Experiments on the benchmark datasets COIL-20 and MNIST show the effectiveness of the improved KL clustering, with promising results.
Keywords: B-matching, Sparse representation, K-means, Spectral clustering.
1 Introduction
Clustering has been among the most active research topics in the machine learning and
pattern recognition communities [1-2]. Intuitively, clustering finds a partition of the
data points into clusters such that the points within a cluster are more similar to each
other than to points in other clusters. Many clustering algorithms have been developed
over the past few decades [3], including spatial clustering [4-5], subspace clustering
[6-7], density-based clustering [8], and so on. Among them, K-means clustering [9] is
one of the most popular and efficient methods; it aims to minimize the sum of squared
distances between the data points and their corresponding cluster centers. However,
research also indicates that the K-means algorithm suffers from some problems [10].
First, the predefined criterion is usually non-convex, which causes many locally
optimal solutions; second, the iterative procedure for optimizing the criterion makes
the final solutions depend heavily on the initialization. Therefore, effective methods
using the spectral relaxation technique have been proposed to alleviate these
problems [11-12].
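The K-means objective and its sensitivity to initialization can be illustrated with a minimal NumPy sketch of Lloyd's algorithm (this code is illustrative only and not from the paper; the data, function names, and seed values are assumptions for the example):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: locally minimize the sum of squared distances
    between points and their assigned cluster centers."""
    rng = np.random.default_rng(seed)
    # Random initialization: the final local optimum depends on this choice.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # The (non-convex) K-means criterion being minimized.
    objective = ((X - centers[labels]) ** 2).sum()
    return labels, centers, objective

# Two well-separated Gaussian blobs as toy data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centers, obj = kmeans(X, k=2)
```

Because the criterion is non-convex, rerunning with different seeds can yield different partitions and objective values, which is exactly the initialization sensitivity the spectral relaxation methods cited above aim to alleviate.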