Information Technology Reference
In-Depth Information
Further, if all the sample orders are first converted into the expected rank vectors,
E[ r 1 |
O i ] ,..., E[ r L |
, then an original k -means algorithm is applied to these vec-
tors. One might suppose that this k -means is equivalent to our k -o'means-EBC, but
this is not the case. A k -means is different from this k -o'means-EBC in terms of the
derivation of centers; In the k -means case, the mean vectors of the expected ranks are
directly used as cluster centers; in a k -o'means case, these means are sorted and con-
verted to rank values. Therefore, in the k -means case, the centers that correspond to
the same central orders are simultaneously kept during clustering. For example, two
mean rank vectors
O i ]
1 . 2 , 1 . 5 , 4 . 0
and
1 , 5 , 10
, correspond to the same central order
x 1
x 3 , but these two vectors are not differentiated. On the other hand, in
a k -o'means-EBC algorithm, they are considered as equivalent, and thus we suppose
that the k -o'means-EBC algorithm can find the cluster structure reflecting the ordinal
similarities among data.
x 2
15.4
Experiments on Artificial Data
We applied the algorithms in Section 15.3 to two types of data: artificially generated
data and real questionnaire survey data. In the the former experiment, we examined
the characteristics of each algorithm. In the latter experiment of the next section, we
analyzed a questionnaire survey data on preferences in sushi.
15.4.1
Evaluation Criteria
The evaluation criteria for partitions was as follows. The same object set was divided
into two different partitions: a true partition π and an estimated one π . To measure the
difference of π from π , we adopted the ratio of information loss (RIL) [18], which
is also called the uncertainty coefficient in numerical taxonomy literature. The RIL is
the ratio of the information that is not acquired to the total information required for
estimating a correct partition. This criterion is defined based on the contingency table
for indicator functions [11]. The indicator function I (( x a ,x b ) ) is 1 if an object pair
( x a ,x b ) are in the same cluster; otherwise, it is 0. The contingency table is a 2 × 2
matrix consisting of elements, a st , that are the number of object pairs satisfying the
condition I (( x a ,x b ) )= s and I (( x a ,x b ) , π )= t , among all the possible object pairs.
RIL is defined as
s =0 t =0
a st
a ··
a ·t
a st
log 2
RIL =
,
(15.17)
s =0
a
a ··
a ··
a
log 2
where a ·t = s a st , a = t a st ,and a ·· = s,t a st . The range of the RIL is [0 , 1];
it becomes 0 if two partitions are identical.
15.4.2
Data Generation Process
Test data were generated in the following two steps: In the first step, we generated the
K orders to be used as central orders. One permutation (we called it a pivot ) consist-
ing of all objects in X was generated. The other K
1 centers were generated by
 
Search WWH ::




Custom Search