Information Technology Reference
In-Depth Information
where
φ
(
) is a normal cumulative
distribution function. Under the minimum square error criterion of this model [14],
μ
l
,
which is a linearly transformed image of
μ
l
, is analytically derived as
·
) is a normal distribution density function, and
Φ
(
·
Φ
−
1
Pr[
x
l
x
]
,
1
μ
l
=
(15.7)
|
X
C
k
|
x∈X
C
k
where
X
C
k
=
O
i
∈C
k
X
i
.Thevalueof
μ
l
is derived for each object in
X
C
k
. Finally,
the central order
O
k
can be derived by sorting according to the corresponding
μ
l
.Be-
cause the resultant partition by
k
-o'means-TMSE is dependent on the initial cluster, this
algorithm is run multiple times, randomly changing the initial cluster; then, the partition
minimizing the following total error is selected:
d
(
O
i
, O
k
)
.
(15.8)
C
k
∈π
O
i
∈C
k
This
k
-o'means-TMSE could successfully find the cluster structure in a set of incom-
plete orders due to the following reason: Because the dissimilarity in Section 15.3.1
was measured between two orders, the precision of the dissimilarities was unstable. On
the other hand, in the case of
k
-o'means-TMSE, central orders are calculated based on
the
is generally much larger than two, and much more information
is available; thus, the central order can be stably calculated. The dissimilarity between
the central orders and each sample order can be stably measured, too, because all of
objects in a sample order always exist in the corresponding central order and so the full
information in the sample orders can be considered.
However, the
k
-o'means-TMSE is not so efficient in terms of time and memory com-
plexity. Time or memory complexity in
N
and
K
is linear, and these are efficient. How-
ever, complexity in terms of
L
∗
is quadratic, and further, the constant factor is rather
large due to the calculation of the inverse function of a normal distribution. Due to this
inefficiency, this algorithm cannot be used if
L
∗
is large. To overcome this inefficiency,
we propose a new method in the next section.
|
C
k
|
orders.
|
C
k
|
15.3.3
k
-o'means-EBC (Expected Borda Count)
To improve efficiency in computation time and memory requirement, though we used
the
k
-o'means framework in Figure 15.1 and the dissimilarity measure
d
ρ
of equa-
tion (15.4) in step 4 of Figure 15.1, we employed other types of derivation procedures
for the central orders.
Below, we describe this derivation method for a central order
O
k
of a cluster
C
k
in step 3 of Figure 15.1. We call this the
Expected Borda Count
(EBC) method, and
our new clustering method is called a
k
-o'means-EBC
algorithm. The Borda Count
method is used to derive central orders from complete orders; we modified this so as to
make it applicable to incomplete orders. The Borda Count method [15] was originally
developed for determining the order of candidates in an election from a set of ranking
votes. A set of complete orders,
C
k
, is given. First, for each object
x
j
in
X
∗
,thevote
count is calculated: