Information Technology Reference
In-Depth Information
where φ (
) is a normal cumulative
distribution function. Under the minimum square error criterion of this model [14], μ l ,
which is a linearly transformed image of μ l , is analytically derived as
·
) is a normal distribution density function, and Φ (
·
Φ 1 Pr[ x l
x ] ,
1
μ l =
(15.7)
|
X C k |
x∈X C k
where X C k = O i ∈C k X i .Thevalueof μ l is derived for each object in X C k . Finally,
the central order O k can be derived by sorting according to the corresponding μ l .Be-
cause the resultant partition by k -o'means-TMSE is dependent on the initial cluster, this
algorithm is run multiple times, randomly changing the initial cluster; then, the partition
minimizing the following total error is selected:
d ( O i , O k ) .
(15.8)
C k ∈π
O i ∈C k
This k -o'means-TMSE could successfully find the cluster structure in a set of incom-
plete orders due to the following reason: Because the dissimilarity in Section 15.3.1
was measured between two orders, the precision of the dissimilarities was unstable. On
the other hand, in the case of k -o'means-TMSE, central orders are calculated based on
the
is generally much larger than two, and much more information
is available; thus, the central order can be stably calculated. The dissimilarity between
the central orders and each sample order can be stably measured, too, because all of
objects in a sample order always exist in the corresponding central order and so the full
information in the sample orders can be considered.
However, the k -o'means-TMSE is not so efficient in terms of time and memory com-
plexity. Time or memory complexity in N and K is linear, and these are efficient. How-
ever, complexity in terms of L is quadratic, and further, the constant factor is rather
large due to the calculation of the inverse function of a normal distribution. Due to this
inefficiency, this algorithm cannot be used if L is large. To overcome this inefficiency,
we propose a new method in the next section.
|
C k |
orders.
|
C k |
15.3.3
k -o'means-EBC (Expected Borda Count)
To improve efficiency in computation time and memory requirement, though we used
the k -o'means framework in Figure 15.1 and the dissimilarity measure d ρ of equa-
tion (15.4) in step 4 of Figure 15.1, we employed other types of derivation procedures
for the central orders.
Below, we describe this derivation method for a central order O k of a cluster C k
in step 3 of Figure 15.1. We call this the Expected Borda Count (EBC) method, and
our new clustering method is called a k -o'means-EBC algorithm. The Borda Count
method is used to derive central orders from complete orders; we modified this so as to
make it applicable to incomplete orders. The Borda Count method [15] was originally
developed for determining the order of candidates in an election from a set of ranking
votes. A set of complete orders, C k , is given. First, for each object x j in X ,thevote
count is calculated:
 
Search WWH ::




Custom Search