Efficient Clustering for Orders - Mining Complex Data

Information Technology Reference

In-Depth Information

of real value vectors. Hence, the central orders of two clusters happen to agree, and one

of these clusters is diminished during execution of the k -o'means. As the increase of

K , clusters are merged with higher probability. For example, in the EBC case, when

K =2and K =5, clusters are merged in 7% and 35% of the trials, respectively.

Such occurrence of merging degrades the ability of recovering clusters. As the cohe-

sion increases, the performance of AVE became more drastically worse than the other

two methods. Furthermore, in terms of the inter-cluster isolation, the performance of

AVE became drastically worse as K increased, except for the trivial case in which the

cohesion was 1.Inthe AVE method, the determination to merge clusters is based on

local information, that is, a pair of clusters. Hence, the chance that orders belonging to

different clusters would happen to be merged increases when orders are broadly dis-

tributed. When comparing EBC and TMSE , these two methods are almost completely

the same.

15.4.4

Incomplete Order Case

We move to the experiments on artificial data of incomplete orders (i.e, L ∗ = 100).

The results are shown in Figure 15.3. The meanings of the charts are the same as in

Figure 15.2.

TMSE was slightly better than EBC when K =2and K =5cases; but EBC

overcame TMSE when K =10. AVE was clearly the worst. We suppose that this

advantage of the k -o'means is due to the fact that the dissimilarities between order

pairs could not be measured precisely if the number of objects commonly included in

these two orders is few. Furthermore, the time complexity of AVE is O ( N 2 log N ),

while the k -o'means algorithms are computationally more inexpensive as in Equa-

tion (15.16). When comparing TMSE and EBC , TMSE would be slightly better. How-

ever, in terms of time complexity, TMSE 's O ( NL ∗ max( L ∗ ,K )) is much worse than

EBC 's O ( K max( N L log( L ) ,L ∗ log L ∗ ) if L ∗ is large. In addition, while the required

memory for TMSE is O ( L ∗ 2 ), EBC demands far less O ( KL ∗ ). Therefore, it is reason-

able to conclude that k -o'means-EBC is an efficient and effective method for clustering

orders.

15.5

Experiments on Real Data

We applied our two k -o'means to questionnaire survey data, and proposed a method to

interpret the acquired clusters of orders.

15.5.1

Data Sets

Since the notion of true clusters is meaningless for real data sets, we used the k -o'means

as tools for exploratory analysis of a questionnaire survey of preference in sushi (a

Japanese food). This data set was collected by the procedure in our previous works

[19, 8]. In this data set, N = 5000, L i =10,and L ∗ = 100; in the survey, the

probability distribution of sampling objects was not uniform as in equation (15.11).

We designed it so that the more frequently supplied sushi in restaurants were more

Search WWH ::

Custom Search

Home