We have applied the compression procedure to the data sets in Table 1,
and compared it with two simple techniques: equally spaced points and
randomly selected points. We have experimented with different compression
rates, which are defined as the percentage of points removed from a series.
For example, “eighty-percent compression” means that we select 20% of
points and discard the other 80%.
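For illustration, a minimal Python sketch of the two baseline techniques follows; the function names, the endpoint handling, and the use of linear interpolation to reconstruct the series are our assumptions, not details from the text.

```python
# A minimal sketch, assuming the series is a one-dimensional NumPy array;
# function names, endpoint handling, and linear interpolation are
# illustrative assumptions, not details taken from the text.
import numpy as np

def equally_spaced_indices(series, rate):
    """Indices of equally spaced points; rate is the fraction of points removed."""
    n = len(series)
    keep = max(2, round(n * (1.0 - rate)))        # e.g. rate = 0.8 keeps 20% of points
    return np.unique(np.linspace(0, n - 1, keep).round().astype(int))

def random_indices(series, rate, seed=0):
    """Indices of randomly selected points, always keeping both endpoints."""
    n = len(series)
    keep = max(2, round(n * (1.0 - rate)))
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(n, size=keep, replace=False))
    idx[0], idx[-1] = 0, n - 1                    # cover the whole series for interpolation
    return np.unique(idx)

def interpolate_back(series, idx):
    """Linearly interpolate the kept points back to the original length."""
    return np.interp(np.arange(len(series)), idx, series[idx])
```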
For each compression technique, we have measured the difference between the original series and the compressed series. We have considered three measures of difference between the original series, $a_1, \ldots, a_n$, and the series interpolated from the compressed version, $b_1, \ldots, b_n$.

Mean difference: $\frac{1}{n} \cdot \sum_{i=1}^{n} |a_i - b_i|$.

Maximum difference: $\max_{i \in [1, \ldots, n]} |a_i - b_i|$.

Root mean square difference: $\sqrt{\frac{1}{n} \cdot \sum_{i=1}^{n} (a_i - b_i)^2}$.
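These three measures translate directly into code. The sketch below assumes the original series a and the interpolated series b are NumPy arrays of equal length; the function names are ours.

```python
# Direct translations of the three difference measures, assuming a and b
# are NumPy arrays of equal length (function names are ours).
import numpy as np

def mean_difference(a, b):
    return np.mean(np.abs(a - b))

def max_difference(a, b):
    return np.max(np.abs(a - b))

def rms_difference(a, b):
    return np.sqrt(np.mean((a - b) ** 2))
```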
We summarize the results in Table 2, which shows that compression by important points is significantly more accurate than the two simple methods.
4. Similarity Measures
We define similarity between time series, which underlies the retrieval procedure. We measure similarity on a zero-to-one scale; zero means no likeness and one means perfect likeness. We review three basic measures of similarity and then propose a new measure. First, we define similarity between two numbers, $a$ and $b$:

$$\mathrm{sim}(a, b) = 1 - \frac{|a - b|}{|a| + |b|}.$$
The mean similarity between two series, $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$, is the mean of their point-by-point similarity:

$$\frac{1}{n} \cdot \sum_{i=1}^{n} \mathrm{sim}(a_i, b_i).$$
We also define the root mean square similarity:

$$\sqrt{\frac{1}{n} \cdot \sum_{i=1}^{n} \mathrm{sim}(a_i, b_i)^2}.$$
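For concreteness, a Python sketch of the point similarity, the mean similarity, and the root mean square similarity follows; the function names are ours, and the handling of the case $a = b = 0$ is an assumption, since the text does not cover that corner case.

```python
# Sketch of the similarity measures defined above; the handling of
# a = b = 0 (treated as perfect likeness) is our assumption, since the
# text does not cover that corner case.
import numpy as np

def sim(a, b):
    """Similarity between two numbers on the zero-to-one scale."""
    denom = abs(a) + abs(b)
    if denom == 0.0:
        return 1.0                     # both values are zero: assumed perfect likeness
    return 1.0 - abs(a - b) / denom

def mean_similarity(a_series, b_series):
    """Mean of the point-by-point similarities."""
    return np.mean([sim(a, b) for a, b in zip(a_series, b_series)])

def rms_similarity(a_series, b_series):
    """Root mean square of the point-by-point similarities."""
    return np.sqrt(np.mean([sim(a, b) ** 2 for a, b in zip(a_series, b_series)]))
```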
In addition, we consider the correlation coefficient, which is a standard statistical measure of similarity. It ranges from −1 to 1, but we can convert it to the zero-to-one scale.
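One possible conversion, assumed here for illustration since this excerpt does not give the exact formula, is the linear rescaling $(r + 1)/2$:

```python
# Illustration only: the linear rescaling (r + 1) / 2 is an assumed way
# to map the correlation coefficient from [-1, 1] to [0, 1]; the text
# does not specify the exact conversion.
import numpy as np

def correlation_similarity(a_series, b_series):
    r = np.corrcoef(a_series, b_series)[0, 1]    # Pearson correlation in [-1, 1]
    return (r + 1.0) / 2.0                       # rescaled to the zero-to-one scale
```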