Biology Reference
In-Depth Information
Greedy
Sampling
Iterative
Dimension
Tuning
Random
Selection
Best medoids
set and
corresponding
dimensions
Clusters and
outlier
Possible
medoids
set
Current
medoids set
data
points
Fig. 9.3.
The Flow Chart of the Experiment
9.4.1. Synthetic Data Generation
We generate one random case and two extreme cases. In each case, we consider
two datasets: unscaled datasets and scaled datasets. The unscaled datasets are
generated in exactly the same way as described in [1]. In order to generate the
scaled datasets, we assign a dimensional scale factor s j to each dimension j .The
variance of the normal distribution on dimension j is s j times the variance used
in [1].
The only difference between the random datasets and the two extreme datasets
is the way of generating dimensions. In the extreme case 1, all the clusters have
exactly the same dimensions. In our experimental study, we choose to let all
clusters have 10 exactly the same dimensions that are randomly chosen from all
the dimensions. We also call it all-same-dimensions case. In the extreme case 2,
we consider clusters with no common dimensions. The dimensions are chosen
randomly and we just make sure that there are no shared dimensions between
clusters. We choose to let each cluster have 4 dimensions. We also call it no-
common-dimensions case.
9.4.2. Results on Synthetic Datasets
We compare the performance of IPROCLUS and PROCLUS on synthetic datasets
in three aspects: the accuracy, the dependence on l , and the running time.
We first discuss the accuracy performance. In PROCLUS paper, the scale fac-
tors in the generated synthetic data are random numbers in the range [1, 2] for all
dimensions. That's not necessarily the case for real data. In order to simulate real
data, we use the dimensional scale factor introduced in the previous section. First,
we consider a scaled dataset for the random case. We conduct our experiment
when different dimensions have different dimensional scale factors. Table 9.1
shows the generated dimensions of the input clusters for the random case. Each
cluster has 7 dimensions which are generated randomly. The dimensional scale
Search WWH ::




Custom Search