Figure 7.6. The probabilistic distance model [27].
The interesting result here is that, regardless of the data distribution
of the random variables composing the uncertain data series, the
cumulative distribution of their distances (1) is defined similarly to
their Euclidean distance and (2) approaches a normal distribution. Recall
that we want to answer PRQ similarity queries. First, given a probability
threshold $\tau$ and the Cumulative Distribution Function (CDF) of the
normal distribution, we compute $\epsilon_{limit}$ such that:
$$\Pr\big(distance(X,Y)_{norm} \le \epsilon_{limit}\big) \ge \tau \qquad (7.7)$$
The CDF of the normal distribution can be formulated in terms of the
well-known error function, and $\epsilon_{limit}$ can be determined by
looking it up in statistical tables. Once we have $\epsilon_{limit}$, we
also compute the normalized threshold $\epsilon_{norm}$. Then, if a
candidate uncertain series $Y$ satisfies the inequality
$\epsilon_{norm}(X,Y) \ge \epsilon_{limit}$, the following holds, because
the CDF is non-decreasing:
$$\Pr\big(distance(X,Y)_{norm} \le \epsilon_{norm}(X,Y)\big) \ge \tau \qquad (7.8)$$
Therefore, $Y$ can be added to the result set. Otherwise, it is pruned
away. This distance formulation is statistically sound and only requires
knowledge of the general characteristics of the data distribution, namely,
its mean and variance.
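The pruning test around Equations (7.7) and (7.8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the mean and variance of the distance are assumed to be already estimated, and the normalization used for the threshold (standardizing the squared query range by that mean and variance) is an assumption, since the text only says that a normalized threshold is computed. The table lookup is replaced by the standard library's inverse normal CDF.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution, used in place of printed statistics tables.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def epsilon_limit(tau: float) -> float:
    """Smallest standardized value whose CDF reaches tau (Eq. 7.7).

    tau must lie strictly between 0 and 1.
    """
    return STD_NORMAL.inv_cdf(tau)


def epsilon_norm(eps: float, mean_dist: float, var_dist: float) -> float:
    """Standardize the squared query range eps**2.

    ASSUMPTION: mean_dist and var_dist are the mean and variance of the
    squared distance between the two uncertain series; this particular
    normalization is not spelled out in the text.
    """
    return (eps ** 2 - mean_dist) / sqrt(var_dist)


def is_result(eps: float, tau: float, mean_dist: float, var_dist: float) -> bool:
    """Keep candidate Y if eps_norm >= eps_limit; otherwise prune it."""
    return epsilon_norm(eps, mean_dist, var_dist) >= epsilon_limit(tau)


if __name__ == "__main__":
    # Hypothetical numbers: query range 3.0, threshold 0.95,
    # distance with mean 4.0 and variance 4.0.
    print(is_result(3.0, 0.95, 4.0, 4.0))
    print(is_result(2.0, 0.95, 4.0, 4.0))
```

With these hypothetical inputs, the first candidate passes because its normalized threshold (2.5) exceeds the 0.95 quantile of the standard normal distribution, while the second candidate (normalized threshold 0) is pruned.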
DUST: In [83], the authors propose a new distance measure, DUST, that,
compared to MUNICH, does not depend on the existence of multiple
observations and is computationally more efficient. Similarly to [105],
DUST is inspired by the Euclidean distance, but works under the assumption
that all the data series values follow some specific distribution. Given
two uncertain data series $X, Y$, the distance between two uncertain
values $x_i, y_i$ is defined as the distance between their true (unknown)
values $r(x_i), r(y_i)$: $dist(x_i, y_i) = L_1(r(x_i), r(y_i))$. This distance
 