A Projected Clustering Algorithm and Its Biomedical Application - Clustering Challenges in Biological Network


Fig. 9.3. The Flow Chart of the Experiment. [Flow chart not reproduced. Recoverable box labels: data points; possible medoids set; current medoids set; best medoids set and corresponding dimensions; clusters and outlier. Recoverable stage labels: Greedy, Sampling, Random, Selection, Iterative, Dimension Tuning.]

9.4.1. Synthetic Data Generation

We generate one random case and two extreme cases. In each case, we consider two datasets: an unscaled dataset and a scaled dataset. The unscaled datasets are generated in exactly the same way as described in [1]. To generate the scaled datasets, we assign a dimensional scale factor s_j to each dimension j. The variance of the normal distribution on dimension j is s_j times the variance used in [1].
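The scaling rule above can be illustrated with a minimal sketch. It covers only the step of drawing a point's coordinates on the cluster dimensions, with the variance multiplied by s_j. The function name, the anchor point, the base variances, and the scale factor values are all hypothetical and chosen for illustration; the rest of the generator in [1] is not reproduced here.

```python
import math
import random


def sample_cluster_point(anchor, base_variances, scale_factors, rng):
    """Draw one point around `anchor` on the cluster's dimensions.

    base_variances[j] is the variance an unscaled generator would use on
    dimension j; scale_factors[j] is the dimensional scale factor s_j.
    All three mappings are keyed by dimension index (an assumption of this sketch).
    """
    point = {}
    for j, center in anchor.items():
        # Scaled variance on dimension j: s_j times the unscaled variance.
        variance = scale_factors[j] * base_variances[j]
        # random.gauss expects a standard deviation, not a variance.
        point[j] = rng.gauss(center, math.sqrt(variance))
    return point


rng = random.Random(0)
anchor = {0: 50.0, 3: 20.0}          # hypothetical cluster center on dimensions 0 and 3
base_variances = {0: 1.0, 3: 1.0}    # hypothetical unscaled variances
scale_factors = {0: 1.0, 3: 25.0}    # hypothetical s_j values
points = [
    sample_cluster_point(anchor, base_variances, scale_factors, rng)
    for _ in range(2000)
]
```

With these illustrative values, the spread of the points on dimension 3 is much larger than on dimension 0, which is the effect the scaled datasets are meant to introduce.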

The only difference between the random datasets and the two extreme datasets is the way the cluster dimensions are generated. In extreme case 1, all clusters have exactly the same dimensions. In our experimental study, all clusters share the same 10 dimensions, which are chosen randomly from all the dimensions. We call this the all-same-dimensions case. In extreme case 2, we consider clusters with no common dimensions. The dimensions are chosen randomly, subject only to the constraint that no dimension is shared between clusters. Each cluster has 4 dimensions. We call this the no-common-dimensions case.
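One possible way to pick the cluster dimensions for the two extreme cases is sketched below. The function names, the number of clusters (5), and the total number of dimensions (20) are assumptions made for illustration; only the 10 shared dimensions and the 4 dimensions per cluster come from the text. Slicing a single sample drawn without replacement is just one way to guarantee that no dimensions are shared, and is not necessarily how the authors did it.

```python
import random


def all_same_dimensions(num_clusters, total_dims, dims_per_cluster, rng):
    """Extreme case 1: every cluster gets the same randomly chosen dimensions."""
    shared = sorted(rng.sample(range(total_dims), dims_per_cluster))
    return [list(shared) for _ in range(num_clusters)]


def no_common_dimensions(num_clusters, total_dims, dims_per_cluster, rng):
    """Extreme case 2: clusters get random, pairwise disjoint dimension sets."""
    needed = num_clusters * dims_per_cluster
    if needed > total_dims:
        raise ValueError("not enough dimensions for disjoint clusters")
    # Sampling without replacement guarantees that no dimension repeats,
    # so consecutive slices of the sample are disjoint.
    pool = rng.sample(range(total_dims), needed)
    return [
        sorted(pool[i * dims_per_cluster:(i + 1) * dims_per_cluster])
        for i in range(num_clusters)
    ]


rng = random.Random(1)
# Hypothetical sizes: 5 clusters in a 20-dimensional space.
case1 = all_same_dimensions(num_clusters=5, total_dims=20, dims_per_cluster=10, rng=rng)
case2 = no_common_dimensions(num_clusters=5, total_dims=20, dims_per_cluster=4, rng=rng)
```

The explicit check in `no_common_dimensions` makes the constraint visible: disjoint dimension sets are only possible when the number of clusters times the dimensions per cluster does not exceed the total number of dimensions.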

9.4.2. Results on Synthetic Datasets

We compare the performance of IPROCLUS and PROCLUS on synthetic datasets in three aspects: accuracy, dependence on l, and running time.

We first discuss accuracy. In the PROCLUS paper, the scale factors in the generated synthetic data are random numbers in the range [1, 2] for all dimensions. That is not necessarily the case for real data. To simulate real data, we use the dimensional scale factor introduced in the previous section. First, we consider a scaled dataset for the random case, in which different dimensions have different dimensional scale factors. Table 9.1 shows the generated dimensions of the input clusters for the random case. Each cluster has 7 dimensions, which are generated randomly. The dimensional scale
