Information Technology Reference
In-Depth Information
of real value vectors. Hence, the central orders of two clusters happen to agree, and one
of these clusters is diminished during execution of the k -o'means. As the increase of
K , clusters are merged with higher probability. For example, in the EBC case, when
K =2and K =5, clusters are merged in 7% and 35% of the trials, respectively.
Such occurrence of merging degrades the ability of recovering clusters. As the cohe-
sion increases, the performance of AVE became more drastically worse than the other
two methods. Furthermore, in terms of the inter-cluster isolation, the performance of
AVE became drastically worse as K increased, except for the trivial case in which the
cohesion was 1.Inthe AVE method, the determination to merge clusters is based on
local information, that is, a pair of clusters. Hence, the chance that orders belonging to
different clusters would happen to be merged increases when orders are broadly dis-
tributed. When comparing EBC and TMSE , these two methods are almost completely
the same.
15.4.4
Incomplete Order Case
We move to the experiments on artificial data of incomplete orders (i.e, L = 100).
The results are shown in Figure 15.3. The meanings of the charts are the same as in
Figure 15.2.
TMSE was slightly better than EBC when K =2and K =5cases; but EBC
overcame TMSE when K =10. AVE was clearly the worst. We suppose that this
advantage of the k -o'means is due to the fact that the dissimilarities between order
pairs could not be measured precisely if the number of objects commonly included in
these two orders is few. Furthermore, the time complexity of AVE is O ( N 2 log N ),
while the k -o'means algorithms are computationally more inexpensive as in Equa-
tion (15.16). When comparing TMSE and EBC , TMSE would be slightly better. How-
ever, in terms of time complexity, TMSE 's O ( NL max( L ,K )) is much worse than
EBC 's O ( K max( N L log( L ) ,L log L ) if L is large. In addition, while the required
memory for TMSE is O ( L 2 ), EBC demands far less O ( KL ). Therefore, it is reason-
able to conclude that k -o'means-EBC is an efficient and effective method for clustering
orders.
15.5
Experiments on Real Data
We applied our two k -o'means to questionnaire survey data, and proposed a method to
interpret the acquired clusters of orders.
15.5.1
Data Sets
Since the notion of true clusters is meaningless for real data sets, we used the k -o'means
as tools for exploratory analysis of a questionnaire survey of preference in sushi (a
Japanese food). This data set was collected by the procedure in our previous works
[19, 8]. In this data set, N = 5000, L i =10,and L = 100; in the survey, the
probability distribution of sampling objects was not uniform as in equation (15.11).
We designed it so that the more frequently supplied sushi in restaurants were more
 
Search WWH ::




Custom Search