Information Technology Reference
In-Depth Information
15
Efficient Clustering for Orders
Toshihiro Kamishima and Shotaro Akaho
National Institute of Advanced Industrial Science and Technology (AIST),
AIST Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki, 305-8568 Japan
mail@kamishima.net
http://www.kamishima.net/
s.akaho@aist.go.jp
Abstract. Lists of ordered objects are widely used as representational forms. Such ordered ob-
jects include Web search results or best-seller lists. Clustering is a useful data analysis technique
for grouping mutually similar objects. To cluster orders, hierarchical clustering methods have
been used together with dissimilarities defined between pairs of orders. However, hierarchical
clustering methods cannot be applied to large-scale data due to their computational cost in terms
of the number of orders. To avoid this problem, we developed an k -o'means algorithm. This al-
gorithm successfully extracted grouping structures in orders, and was computationally efficient
with respect to the number of orders. However, it was not efficient in cases where there are too
many possible objects yet. We therefore propose a new method ( k -o'means-EBC), grounded on
a theory of order statistics. We further propose several techniques to analyze acquired clusters of
orders.
15.1
Introduction
The term order indicates a sequence of objects sorted according to some property. Such
orders are widely used as representational forms. For example, the responses from Web
search engines are lists of pages sorted according to their relevance to queries. Best-
seller lists, which are item-sequence sorted according to sales volume, are used on many
E-commerce sites.
Orders have also been exploited for sensory test of human respondents' sensations,
impressions, or preference. For such a kind of surveys, it is typical to adopt a scoring
method. In this method, a respondents' sensation is measured using a scale on which
extremes are represented by antonymous words. One example is a five-point-scale on
which 1 and 5 indicate don't prefer and prefer , respectively. If one very much prefers
an apple, he/she rates the apple as 5 . Though this scoring method is widely used, it
is not the best way for all types of sensory test. For example, as pointed out in [1],
a trained expert, e.g., a wine taster, can maintain a consistent mapping from his/her
sensation level to rating score throughout a given session. However, users' mappings
generally change for each response, especially if the intervals between responses are
long. Hence, even if two respondents rate the same item at the same score, their true
degrees of sensation may not be the same. When effects of such demerits cannot be
ignored, a ranking method is used. In this method, respondents show their degree of
sensation by orders, i.e., object sequences according to the degree of a target sensation.
 
Search WWH ::




Custom Search