Efficient Clustering for Orders - Mining Complex Data

Information Technology Reference

In-Depth Information

15

Efficient Clustering for Orders

Toshihiro Kamishima and Shotaro Akaho

National Institute of Advanced Industrial Science and Technology (AIST),

AIST Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki, 305-8568 Japan

mail@kamishima.net

http://www.kamishima.net/

s.akaho@aist.go.jp

Abstract. Lists of ordered objects are widely used as representational forms. Such ordered ob-

jects include Web search results or best-seller lists. Clustering is a useful data analysis technique

for grouping mutually similar objects. To cluster orders, hierarchical clustering methods have

been used together with dissimilarities defined between pairs of orders. However, hierarchical

clustering methods cannot be applied to large-scale data due to their computational cost in terms

of the number of orders. To avoid this problem, we developed an k -o'means algorithm. This al-

gorithm successfully extracted grouping structures in orders, and was computationally efficient

with respect to the number of orders. However, it was not efficient in cases where there are too

many possible objects yet. We therefore propose a new method ( k -o'means-EBC), grounded on

a theory of order statistics. We further propose several techniques to analyze acquired clusters of

orders.

15.1

Introduction

The term order indicates a sequence of objects sorted according to some property. Such

orders are widely used as representational forms. For example, the responses from Web

search engines are lists of pages sorted according to their relevance to queries. Best-

seller lists, which are item-sequence sorted according to sales volume, are used on many

E-commerce sites.

Orders have also been exploited for sensory test of human respondents' sensations,

impressions, or preference. For such a kind of surveys, it is typical to adopt a scoring

method. In this method, a respondents' sensation is measured using a scale on which

extremes are represented by antonymous words. One example is a five-point-scale on

which 1 and 5 indicate don't prefer and prefer , respectively. If one very much prefers

an apple, he/she rates the apple as 5 . Though this scoring method is widely used, it

is not the best way for all types of sensory test. For example, as pointed out in [1],

a trained expert, e.g., a wine taster, can maintain a consistent mapping from his/her

sensation level to rating score throughout a given session. However, users' mappings

generally change for each response, especially if the intervals between responses are

long. Hence, even if two respondents rate the same item at the same score, their true

degrees of sensation may not be the same. When effects of such demerits cannot be

ignored, a ranking method is used. In this method, respondents show their degree of

sensation by orders, i.e., object sequences according to the degree of a target sensation.

Search WWH ::

Custom Search

Home