Information Technology Reference
In-Depth Information
d ρ ( O 1 ,O 2 )=1
ρ ( O 1 ,O 2 ) .
(15.4)
Since the range of ρ is [
1 , 1], this dissimilarity ranges [0 , 2]. This dissimilarity
becomes 0 if the two orders are concordant.
15.3
Methods
Here, we describe exiting clustering methods and our new clustering method.
15.3.1
Hierarchical Clustering Methods
In the literature of psychometrics, questionnaire data obtained by a ranking method have
been processed by traditional clustering techniques [4]. First, for all pairs of orders in
S , the dissimilarities in Section 15.2.1 are calculated, and a dissimilarity matrix for S is
obtained. Next, this matrix can be clustered by standard hierarchical clustering methods,
such as the group average method. In these survey researches, the size of the processed
data set is rather small ( N< 1000 ,L < 10, L i < 10). Therefore, hierarchical
clustering methods could cluster order sets, even though the time complexity of these
methods is O ( N 2 log( N )) under non-Euclidean metric [7] and is costly. However, these
method cannot be applied to a large-scale data, due to their computational cost.
Additionally, when the number of objects, L , is large, it is hard for respondents
to sort all objects in X . Therefore, sample orders are generally incomplete, i.e.,
X ( O i )
X , the dissimilarities cannot be calculated because the dissimilarity mea-
sures are defined between two orders consisting of the same objects. One way to deal
with incomplete orders is to introduce the notion of an Incomplete Order Set (IOS) 1 [4],
which is defined as a set of all possible complete orders that are concordant with the
given incomplete order. Given the incomplete order O that consists of the object set X ,
an IOS is defined as
O i |
O i
is concordant with O, X ( O i )= X }
ios( O )=
{
.
This idea is not fit for large-scale data sets because the size of the set is ( L ! /L !),which
grows exponentially in accordance with L . Additionally, there are some difficulties in
defining the distances between the two sets of orders. One possible definition is to adopt
the arithmetic mean of the distances between orders in each of the two sets. However,
this is not distance because d (ios a , ios a ) may not be 0. Therefore, more complicated
distance, i.e., Hausdorff distance, has to be adopted.
Since the above IOS cannot be derived for a large-scale data set, we adopted the
following heuristics in this paper. In such cases, the dissimilarity between the orders is
determined based on the the objects included in both. Take, for example, the following
two orders:
O 1 = x 1
x 3
x 4
x 6 ,
O 2 = x 5
x 4
x 3
x 2
x 6 .
1 In [4], this notion is referred by the term incomplete ranking , but we have adopted IOS to insist
that this is a set of orders.
 
Search WWH ::




Custom Search