Information Technology Reference
In-Depth Information
correlation between ranks of objects. The ρ between two orders, O 1 and O 2 , consisting
of the same objects (i.e., X
X ( O 1 )= X ( O 2 ))isdefinedas:
x j ∈X r 1 j
r 1 r 2 j
r 2
x j ∈X r 1 j
r 1 2 x j ∈X r 2 j
ρ =
r 2 2 ,
where r i =(1 /L ) x j ∈X r ij , L =
|
X
|
. If no tie in rank is allowed, this can be calculated
by the simple formula:
6 x j ∈X r 1 j
r 2 j 2
ρ =1
.
(15.1)
L 3
L
The ρ becomes 1 if the two orders are concordant, and
1 if one order is the reverse of
the other order. Observing Equation (15.1), this similarity depends only on the term
d S ( O 1 ,O 2 )=
x j ∈X
r 2 j ) 2 .
( r 1 j
(15.2)
This is called Spearman's distance . If two or more objects are tied, we give the same
midrank to these objects [4]. For example, consider an order x 5
” denotes
a tie in rank), in which x 2 and x 3 are ranked at the 2nd or 3rd positions. In this case, the
midrank 2 . 5 is assigned to both objects.
Another widely used measure of the similarity of orders is Kendall's τ . Intuitively,
this is defined as the number of concordant object pairs subtracted by that of discordant
pairs, and then it is normalized. Formally, Kendall's τ is defined as
x 2
x 3 (“
sgn ( r 1 a
r 2 b ) ,
1
τ =
r 1 b )( r 2 a
(15.3)
L ( L
1) / 2
x a x b Pair( O 1 )
where sgn( x ) is a sign function that takes 1 if x> 0, 0 if x =0,and
1 otherwise. Many
other types of similarities between orders have been proposed (see [4, chapter 2]), but
the above two are widely used and have been well studied.
In this paper, we adopt Spearman's ρ rather than Kendall's τ because of the following
reasons: First, these two measures have similar properties. Both measures of similarities
between two random orders asymptotically follow normal distribution as the length
of the orders grows. Additionally, these are highly correlated, because the difference
between the two measures is bounded by Daniels' inequality [9]:
3( L +2)
L
2( L +1)
L
1
τ
ρ
1 .
2
2
Second, Spearman's ρ can be calculated more quickly. All of the object pairs have to
be checked to derive Kendall's τ ,so O ( L 2 ) time is required. In the case of Spear-
man's ρ , the most time consuming task is sorting objects to decide their ranks; thus, the
time complexity is O ( L log L ). Further, the central orders under Spearman distance is
tractable, but the derivation under Kendall's distance is NP-hard [10].
For the clustering task, distance or dissimilarity is more useful than similarity. We
defined a dissimilarity between two orders based on ρ :
 
Search WWH ::




Custom Search