A Projected Clustering Algorithm and Its Biomedical Application - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

we propose a dimension tuning process. Compared to PROCLUS, we propose a

simplified replacing logic as well. We compared the performance of PROCLUS

and IPROCLUS on randomly generated synthetic data and real biomedical data.

The rest of this chapter is organized as follows. Related works on projected

clustering are discussed in Section 9.2. We then present our algorithm, IPRO-

CLUS (Improved PROCLUS) in Section 9.3. The experimental evaluation of our

algorithm is given in Section 9.4. Section 9.5 concludes this chapter.

9.2. Related Works

We classify projected clustering algorithms into two categories. One is density-

based algorithms [3, 13]. The other is distance-based algorithms [1, 2]. In the

following sections, we introduce two typical algorithms for each category.

9.2.1. Density-based Algorithms

Density-based algorithms define a cluster as a region that has a higher density

of data points than its surrounding regions. From the point of view of projected

clustering, we want to find dense regions only in their corresponding subspaces.

Two algorithms are discussed in this category.

CLIQUE (CLustering In QUEst) [3] is based on the following property. Dense

regions in a particular subspace still create dense regions when they are projected

onto lower dimensional subspaces. It depends on two parameters, the partition

threshold ξ and the density threshold τ . The algorithm consists of three steps:

identification of dense units, identification of clusters,and generation of minimal

descriptions. While this algorithm is insensitive to the order of records and scales

linearly with the size of inputs, like many density-based algorithms, it has expo-

nential dependence on the number of dimensions. Since there are large overlaps

among dense regions, it doesn't return disjoint clusters that are required in many

applications. In the density-units prune procedure, some potential clusters are

likely to be lost. Furthermore, the user parameters, ξ and τ , are hard to pre-select

and partitioning each dimension into ξ intervals is not flexible.

DOC (Density-based Optimal projective Clustering) [13] gives a mathematical

definition for the concept of optimal projective cluster and uses a Monte Carlo

algorithm to search for a good approximation of the optimal projective cluster.

It has one user parameter, w , which is the width of the bounding hyper-cubes.

This algorithm doesn't miss small clusters. However, it still suffers from a critical

user parameter that is the fixed global interval width w for each dimension and

Search WWH ::

Custom Search

Home