Biology Reference
In-Depth Information
we propose a dimension tuning process. Compared to PROCLUS, we propose a
simplified replacing logic as well. We compared the performance of PROCLUS
and IPROCLUS on randomly generated synthetic data and real biomedical data.
The rest of this chapter is organized as follows. Related works on projected
clustering are discussed in Section 9.2. We then present our algorithm, IPRO-
CLUS (Improved PROCLUS) in Section 9.3. The experimental evaluation of our
algorithm is given in Section 9.4. Section 9.5 concludes this chapter.
9.2. Related Works
We classify projected clustering algorithms into two categories. One is density-
based algorithms [3, 13]. The other is distance-based algorithms [1, 2]. In the
following sections, we introduce two typical algorithms for each category.
9.2.1. Density-based Algorithms
Density-based algorithms define a cluster as a region that has a higher density
of data points than its surrounding regions. From the point of view of projected
clustering, we want to find dense regions only in their corresponding subspaces.
Two algorithms are discussed in this category.
CLIQUE (CLustering In QUEst) [3] is based on the following property. Dense
regions in a particular subspace still create dense regions when they are projected
onto lower dimensional subspaces. It depends on two parameters, the partition
threshold ξ and the density threshold τ . The algorithm consists of three steps:
identification of dense units, identification of clusters,and generation of minimal
descriptions. While this algorithm is insensitive to the order of records and scales
linearly with the size of inputs, like many density-based algorithms, it has expo-
nential dependence on the number of dimensions. Since there are large overlaps
among dense regions, it doesn't return disjoint clusters that are required in many
applications. In the density-units prune procedure, some potential clusters are
likely to be lost. Furthermore, the user parameters, ξ and τ , are hard to pre-select
and partitioning each dimension into ξ intervals is not flexible.
DOC (Density-based Optimal projective Clustering) [13] gives a mathematical
definition for the concept of optimal projective cluster and uses a Monte Carlo
algorithm to search for a good approximation of the optimal projective cluster.
It has one user parameter, w , which is the width of the bounding hyper-cubes.
This algorithm doesn't miss small clusters. However, it still suffers from a critical
user parameter that is the fixed global interval width w for each dimension and
Search WWH ::




Custom Search