Fig. 5. Peak similarity of numbers a and b, where |a| ≥ |b|.
it to the zero-to-one scale by adding one and dividing by two. For series
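$a_1, \ldots, a_n$ and $b_1, \ldots, b_n$, with mean values
$m_a = (a_1 + \cdots + a_n)/n$ and $m_b = (b_1 + \cdots + b_n)/n$,
the correlation coefficient is

$$\frac{\sum_{i=1}^{n} (a_i - m_a)(b_i - m_b)}{\sqrt{\sum_{i=1}^{n} (a_i - m_a)^2 \cdot \sum_{i=1}^{n} (b_i - m_b)^2}}$$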
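The correlation-based similarity can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `correlation` and `correlation_similarity` are not from the text, and raising an error for constant series (where the denominator is zero) is an assumption, since the text does not say how that case is handled.

```python
import math


def correlation(a, b):
    """Correlation coefficient of two equal-length numeric series."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(a)
    m_a = sum(a) / n  # mean of series a
    m_b = sum(b) / n  # mean of series b
    numerator = sum((x - m_a) * (y - m_b) for x, y in zip(a, b))
    denominator = math.sqrt(
        sum((x - m_a) ** 2 for x in a) * sum((y - m_b) ** 2 for y in b)
    )
    if denominator == 0:
        # Assumption: a constant series has no defined correlation.
        raise ValueError("correlation is undefined for a constant series")
    return numerator / denominator


def correlation_similarity(a, b):
    """Map the correlation from [-1, 1] to [0, 1]: add one, divide by two."""
    return (correlation(a, b) + 1) / 2
```

For example, `correlation_similarity([1, 2, 3], [2, 4, 6])` gives a similarity of one, and `correlation_similarity([1, 2, 3], [3, 2, 1])` gives zero.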
We next define a new similarity measure, called the peak similarity. For
numbers $a$ and $b$, their peak similarity is

$$\mathrm{psim}(a, b) = 1 - \frac{|a - b|}{2 \cdot \max(|a|, |b|)}.$$
In Figure 5, we show the meaning of this definition for $|a| \ge |b|$, based on
the illustrated triangle. We draw the vertical line through $b$ to the
intersection with the triangle's side; the ordinate of the intersection is the
similarity of $a$ and $b$. The peak similarity of two series is the mean
similarity of their points,
$(\mathrm{psim}(a_1, b_1) + \cdots + \mathrm{psim}(a_n, b_n))/n$.
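A minimal Python sketch of the peak similarity follows, assuming the definition reads psim(a, b) = 1 - |a - b| / (2 · max(|a|, |b|)). The function names are illustrative, and returning 1 when both numbers are zero is an assumption, because the formula is undefined (0/0) in that case.

```python
def psim(a, b):
    """Peak similarity of two numbers: 1 - |a - b| / (2 * max(|a|, |b|))."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        # Assumption: two zeros count as identical; the formula gives 0/0 here.
        return 1.0
    return 1.0 - abs(a - b) / (2.0 * largest)


def peak_similarity(series_a, series_b):
    """Peak similarity of two series: the mean of the pointwise psim values."""
    if len(series_a) != len(series_b) or len(series_a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(psim(x, y) for x, y in zip(series_a, series_b))
    return total / len(series_a)
```

Under this definition, equal numbers have similarity one (`psim(2, 2)`), opposite numbers have similarity zero (`psim(2, -2)`), and `psim(4, 2)` is 0.75, which matches the triangle reading in Figure 5: the similarity falls linearly from one at b = a to zero at b = -a.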
We next give an empirical comparison of the four similarity measures.
For each series, we have found the five most similar series, and then
determined the mean difference between the given series and the other five.
In Table 3, we summarize the results and compare them with the perfect
exhaustive-search selection and with random selection. The results show
that the peak similarity performs better than the other measures, and that
the correlation coefficient is the least effective.
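The evaluation procedure described above can be outlined roughly as follows. This is a hypothetical sketch rather than the authors' code: the helper names are invented for illustration, and the difference between two series is assumed here to be the mean absolute pointwise difference, which this section does not define.

```python
def mean_abs_difference(a, b):
    """Assumed difference measure: mean absolute pointwise difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def most_similar(query, candidates, similarity, k=5):
    """Return the k candidate series most similar to query, best first."""
    ranked = sorted(candidates, key=lambda s: similarity(query, s), reverse=True)
    return ranked[:k]


def mean_difference_to_neighbors(query, candidates, similarity,
                                 difference=mean_abs_difference, k=5):
    """Mean difference between query and its k most similar candidates."""
    neighbors = most_similar(query, candidates, similarity, k)
    if not neighbors:
        raise ValueError("no candidate series to compare against")
    return sum(difference(query, s) for s in neighbors) / len(neighbors)


if __name__ == "__main__":
    # Toy similarity for the demo: a smaller difference means more similar.
    def toy_similarity(a, b):
        return -mean_abs_difference(a, b)

    data = [[1, 2, 3], [2, 3, 4], [10, 10, 10]]
    print(mean_difference_to_neighbors([1, 2, 3], data, toy_similarity, k=2))
```

Any of the four similarity measures can be passed as `similarity`; a lower mean difference indicates that the measure selects closer series. In a real run, `candidates` would exclude the query series itself.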
We have also used the four similarity measures to identify close matches
for each series, and compared the results with ground-truth neighborhoods.
For stocks, we have considered small neighborhoods formed by industry
subgroups, as well as large neighborhoods formed by industry groups,
according to Standard and Poor's classification. For air and sea
temperatures, we have used geographic proximity to define two ground-truth
neighborhoods.
The first neighborhood is the 1 × 5 rectangle in the grid of buoys, and
the second is the 3 × 5 rectangle. For wind speeds, we have also used
geographic proximity; the first neighborhood includes all sites within 70 miles,