Information Technology Reference
In-Depth Information
of real value vectors. Hence, the central orders of two clusters happen to agree, and one
of these clusters is diminished during execution of the
k
-o'means. As the increase of
K
, clusters are merged with higher probability. For example, in the
EBC
case, when
K
=2and
K
=5, clusters are merged in 7% and 35% of the trials, respectively.
Such occurrence of merging degrades the ability of recovering clusters. As the cohe-
sion increases, the performance of
AVE
became more drastically worse than the other
two methods. Furthermore, in terms of the inter-cluster isolation, the performance of
AVE
became drastically worse as
K
increased, except for the trivial case in which the
cohesion was 1.Inthe
AVE
method, the determination to merge clusters is based on
local information, that is, a pair of clusters. Hence, the chance that orders belonging to
different clusters would happen to be merged increases when orders are broadly dis-
tributed. When comparing
EBC
and
TMSE
, these two methods are almost completely
the same.
15.4.4
Incomplete Order Case
We move to the experiments on artificial data of incomplete orders (i.e,
L
∗
= 100).
The results are shown in Figure 15.3. The meanings of the charts are the same as in
Figure 15.2.
TMSE
was slightly better than
EBC
when
K
=2and
K
=5cases; but
EBC
overcame
TMSE
when
K
=10.
AVE
was clearly the worst. We suppose that this
advantage of the
k
-o'means is due to the fact that the dissimilarities between order
pairs could not be measured precisely if the number of objects commonly included in
these two orders is few. Furthermore, the time complexity of
AVE
is
O
(
N
2
log
N
),
while the
k
-o'means algorithms are computationally more inexpensive as in Equa-
tion (15.16). When comparing
TMSE
and
EBC
,
TMSE
would be slightly better. How-
ever, in terms of time complexity,
TMSE
's
O
(
NL
∗
max(
L
∗
,K
)) is much worse than
EBC
's
O
(
K
max(
N L
log(
L
)
,L
∗
log
L
∗
) if
L
∗
is large. In addition, while the required
memory for
TMSE
is
O
(
L
∗
2
),
EBC
demands far less
O
(
KL
∗
). Therefore, it is reason-
able to conclude that
k
-o'means-EBC is an efficient and effective method for clustering
orders.
15.5
Experiments on Real Data
We applied our two
k
-o'means to questionnaire survey data, and proposed a method to
interpret the acquired clusters of orders.
15.5.1
Data Sets
Since the notion of true clusters is meaningless for real data sets, we used the
k
-o'means
as tools for exploratory analysis of a questionnaire survey of preference in sushi (a
Japanese food). This data set was collected by the procedure in our previous works
[19, 8]. In this data set,
N
= 5000,
L
i
=10,and
L
∗
= 100; in the survey, the
probability distribution of sampling objects was not uniform as in equation (15.11).
We designed it so that the more frequently supplied sushi in restaurants were more