Table 15.1. Parameters of experimental data
1) the number of sample orders: N = 1000
2) the length of the orders: L_i = 10
3) the total number of objects: L = 10, 100
4) the number of clusters: K = {2, 5, 10}
5) the inter-cluster isolation: {0.5, 0.2, 0.1, 0.001}
6) the intra-cluster cohesion: {1.0, 0.999, 0.99, 0.9}
transforming this pivot. Two adjacent objects in the pivot were randomly selected and
exchanged. This exchange was repeated a specified number of times. By changing the
number of exchanges, the inter-cluster isolation could be controlled.
In the second step, the constituent orders of each cluster were generated. From the
central order, L_i objects were randomly selected. These objects were sorted so as to
be concordant with the central order. Again, pairs of adjacent objects were randomly
selected and exchanged. By changing the number of exchanges, the intra-cluster
cohesion could be controlled. Note that the sizes of the clusters are equal.
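The two generation steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' generator: the function names, the argument names, and the use of the pivot itself as one of the K central orders are assumptions made for illustration, and the swap counts are free parameters because the text does not state which counts correspond to which isolation or cohesion values.

```python
import random


def adjacent_swaps(order, n_swaps, rng):
    """Return a copy of `order` after n_swaps random exchanges of adjacent objects."""
    out = list(order)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def generate_orders(n_objects, n_clusters, n_samples, order_len,
                    inter_swaps, intra_swaps, seed=0):
    """Generate central orders and equally sized clusters of sample orders."""
    rng = random.Random(seed)

    # Step 1: a random pivot over all objects; the other central orders are
    # obtained by adjacent exchanges (assumption: the pivot is itself a center).
    pivot = list(range(n_objects))
    rng.shuffle(pivot)
    centers = [pivot] + [adjacent_swaps(pivot, inter_swaps, rng)
                         for _ in range(n_clusters - 1)]

    # Step 2: for each cluster, pick order_len objects, keep them in the
    # relative order of the central order, then perturb by adjacent exchanges.
    per_cluster = n_samples // n_clusters
    samples, labels = [], []
    for k, center in enumerate(centers):
        for _ in range(per_cluster):
            positions = sorted(rng.sample(range(n_objects), order_len))
            concordant = [center[p] for p in positions]
            samples.append(adjacent_swaps(concordant, intra_swaps, rng))
            labels.append(k)
    return centers, samples, labels
```

For example, `generate_orders(100, 5, 1000, 10, inter_swaps=50, intra_swaps=3)` yields 1000 orders of length 10 over 100 objects in 5 equal clusters; the swap counts here are illustrative only.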
The parameters of the data generator are summarized in Table 15.1. The differences
between orders cannot be statistically tested if L_i is too short; on the other hand,
respondents cannot sort too many objects. Therefore, we set the order length to
L_i = 10. Params 1-2 are common to all the data. The total number of objects
(Param 3) is set to 10 or 100. All the sample orders are complete if L = 10, and
these are examined in Section 15.4.3. We examine the incomplete case (L = 100) in
Section 15.4.4. Param 4
was the number of clusters. It is difficult to partition if this number is large, since the
sizes of the clusters then decrease. Param 5 was the inter-cluster isolation that could
be tuned by the number of times that objects are exchanged in the first step of the data
generation process. This isolation is measured by the probability that the ρ between a
pivot and another central order is smaller than that between a pivot and a random order.
The larger the isolation, the more easily clusters are separated. Param 6 was the
intra-cluster cohesion indicating the number of times that objects are exchanged in the
second step of the data generation process. This cohesion is measured by the probability
that the ρ between the central order and a sample one is larger than that between the
central order and a random one. The larger the cohesion, the more easily a cluster could
be detected.
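The cohesion probability defined above can be estimated by simulation. The sketch below is an illustration for the complete-order case only (every order ranks all objects); the function names and the Monte Carlo procedure are assumptions, not taken from the source, and ρ is computed with the standard no-ties Spearman formula.

```python
import random


def spearman_rho(order_a, order_b):
    """Spearman's rho between two complete orders over the same objects (no ties)."""
    n = len(order_a)
    rank_b = {obj: r for r, obj in enumerate(order_b)}
    d2 = sum((r - rank_b[obj]) ** 2 for r, obj in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def estimate_cohesion(n_objects, n_swaps, n_trials=2000, seed=0):
    """Estimate P(rho(center, sample) > rho(center, random order))."""
    rng = random.Random(seed)
    center = list(range(n_objects))
    wins = 0
    for _ in range(n_trials):
        # A sample order: the central order perturbed by adjacent exchanges.
        sample = list(center)
        for _ in range(n_swaps):
            i = rng.randrange(n_objects - 1)
            sample[i], sample[i + 1] = sample[i + 1], sample[i]
        # A uniformly random order over the same objects.
        rand = list(center)
        rng.shuffle(rand)
        if spearman_rho(center, sample) > spearman_rho(center, rand):
            wins += 1
    return wins / n_trials
```

The isolation could be estimated in the same way by comparing the pivot with another central order and counting the trials in which that ρ is smaller than the ρ between the pivot and a random order.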
For each setting, we generated 100 sample sets. For each sample set, we ran the
algorithms five times using different initial partitions; then the best partition in terms of
Equation (15.8) was selected. Below, we show the means of RIL over these sets.
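The restart protocol described above can be sketched as follows. Equation (15.8) is not reproduced in this section, so the sketch takes the objective as a caller-supplied function and assumes that smaller values are better; `cluster_once`, `objective`, and `best_of_restarts` are hypothetical names, and drawing the initial partition uniformly at random is likewise an assumption.

```python
import random


def best_of_restarts(samples, n_clusters, cluster_once, objective,
                     n_restarts=5, seed=0):
    """Run a clustering routine from several random initial partitions and
    keep the partition with the best (assumed: smallest) objective value."""
    rng = random.Random(seed)
    best_partition, best_score = None, float("inf")
    for _ in range(n_restarts):
        # Random initial partition: one cluster label per sample order.
        init = [rng.randrange(n_clusters) for _ in samples]
        partition = cluster_once(samples, init)
        score = objective(samples, partition)
        if score < best_score:
            best_partition, best_score = partition, score
    return best_partition, best_score
```

If the objective in Equation (15.8) is to be maximized rather than minimized, the comparison and the initial value of `best_score` would need to be reversed.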
15.4.3 Complete Order Case
We analyzed the characteristics of the methods in Section 15.3 by applying them to
artificial data of complete orders. The two k-o'means methods were abbreviated to
TMSE and EBC, respectively. Additionally, a group average hierarchical clustering
method using dissimilarity as described in Section 15.2.1 was tested, and we denoted
 