Efficient Clustering for Orders - Mining Complex Data

Information Technology Reference

In-Depth Information

O k is

orders C k and a dissimilarity measure between orders d ( O a ,O b ), a central order

defined as the order that minimizes the sum of dissimilarities:

O k =argmi O

d ( O, O i ) .

(15.5)

O i ∈

C k

O k consists of all the objects in C k , i.e., X C k =

Note that the order

∪ O i ∈C k X ( O i ).

The dissimilarity d ( O k ,O i ) is calculated over common objects as in Section 15.3.1.

However, because X i

X ( O k ), the dissimilarity can always be calculated over X i .

Unfortunately, the optimal central order is not tractable except for a special cases. For

example, if using a Kendall distance, the derivation of central orders is NP-hard even if

all sample orders are complete [10].

Therefore, many approximation methods have been developed. However, to use as a

sub-routine in a k -o'means algorithm, the following two constraints must be satisfied.

First, the method must deal with incomplete orders that consist of objects randomly

sampled from X ∗ . In [12], they proposed a method to derive a central order of top k

lists, which are special kinds of incomplete orders. Top k list is an order that consists

of the most preferred k objects, and the objects that are not among the top k list are

implicitly ranked lower than these k objects. That is to say, the top k objects of a hidden

complete order are observed. In our case, objects are randomly sampled, and such a

restriction is not allowed. Second, the method should be executed without using iterative

optimization techniques. Since central orders are derived K times in each loop of the

k -o'means algorithm, the derivation method of central orders would seriously affect

efficiency if it adopts the iterative optimization.

To our knowledge, the method satisfying these two constraints is the following one

to derive the minimum square error solution under a generative model of Thurstone's

law of comparative judgment [13]. Because we used this method to derive central or-

ders, we call this clustering algorithm by the k -o'means-TMSE (Thurstone Minimum

Square Error) algorithm. We describe this method for deriving central orders. First, the

probability Pr[ x a

⊆

x b ] is estimated. The pair set of Pair( C k ) in Section 15.2 is gen-

erated from C k instep3of k -o'means-TMSE. Next, we calculate the probabilities for

every pair of objects in C k :

x a

x b |

+0 . 5

Pr[ x a

x b ]=

+1 ,

x a

x b |

x b

x a |

where

x b ,inthePair( C k ).These

probabilities are applied to a model of Thurstone's law of comparative judgment. This

model assumes that scores are assigned to each object x l , and an order is derived by

sorting according to these scores. Scores follow a normal distribution; i.e., N ( μ l ,σ ),

where μ l is the mean score of the object x l ,and σ is a common constant standard

deviation. Based on this model, the probability that object x a precedes the x b is

x a

x b |

is the number of the object pairs, x a

x b ]= ∞

−∞

)

φ ( t

−

μ a

φ ( u

−

μ b

Pr[ x a

) du dt

−∞

= Φ μ a −

μ b

√ 2 σ

(15.6)

Mining Complex Data

Search WWH ::

Custom Search

Home