Efficient Clustering for Orders - Mining Complex Data

Information Technology Reference

In-Depth Information

Next, we turn to the case where a set of orders, C k , consists of orders independently

generated through the above process. Each O i ∈

C k is first converted to a set of all com-

plete orders; thus, the total number of complete orders is L ∗ !

. For each complete

order, we assign weights that follow equation (15.11). By the Borda Count method, an

optimal central order for these weighted complete orders can be calculated. The mean

rank of x j (equation (15.9)) for these weighted complete orders is

C k |

Pr[ O h |

O i ] r ( O h ,x j )

E[ r j ]=

C k |

O i ∈

C k

O h ∈S ( L ∗ )

E[ r j |

O i ] ,

(15.14)

C k |

O i ∈

C k

( L ∗ ) 2

where

is a set of all complete orders. A central order is derived by sorting

objects x j

X C k in ascending order of the corresponding E[ r j ]. Since objects are

sorted according to the means of expectation of ranks, we call this method an Expected

Borda Count (EBC).

A central order derived by an EBC method is optimal if the distance d ( O i , O k ) is

measured by

∈

O i ] d S ( O h , O k ) .

Pr[ O h |

(15.15)

O h ∈S ( L ∗ )

Hence, in step 4 of Figure 15.1, not d ρ , but this equation (15.15) should be used. How-

ever, it is intractable to compute equation (15.15), because its computational complex-

ity is O ( L ∗ ( L ∗ ! /L i !)). Therefore, we adopt d ρ , and it empirically performed well, as

is shown later. Furthermore, if all sample orders are complete, d ρ is compatible with

equation (15.15). Note that we also tried

d ( O, O i )

x j ∈

( r ( O k ,x j )

O i ]) 2 ,

E[ r j |

−

X ∗

but empirically, it performed poorly.

The time complexity of a k -o'means-EBC is

O K max( N L log( L ) ,L ∗ log L ∗ ) ,

(15.16)

where L is the mean of L i over S . First, in step 3 of Figure 15.1, the K central orders

are derived. For each cluster, O (( N/K ) L ) time is required for the means of expected

ranks and O ( L ∗ log L ∗ ) time for sorting objects. Hence, the total time required for

deriving K central orders is O (max( N L, KL ∗ log L ∗ )). Second, in step 4, N orders

are classified into K clusters. Because O ( L log L ) time is required for calculating one

dissimilarity, O ( N L log( L ) K ) time is required in total. The number of iterations is

constant. Consequently, the total complexity becomes equation (15.16).

Note that the uniformity assumption of missing objects might look too strong. How-

ever, in the case of a questionnaire survey by ranking methods, the objects to be ranked

by respondents can be controlled by surveyors.

S ( L ∗ ) is equivalent to a permutation group of order L ∗ .

Mining Complex Data

Search WWH ::

Custom Search

Home