Figure 7.6. The probabilistic distance model [27].
(Figure labels: series points x_1, x_2, …, x_n and y_1, y_2, …, y_n; values v_1, v_2; pointwise distance D_i; horizontal axis: time.)
The interesting result here is that, regardless of the data distribution
of the random variables composing the uncertain data series, the
cumulative distribution of their distances (1) is defined similarly to
their Euclidean distance and (2) approaches a normal distribution. Recall
that we want to answer PRQ similarity queries. First, given a probability
threshold $\tau$ and the Cumulative Distribution Function (CDF) of the
normal distribution, we compute $\mathit{limit}$ such that:

$$\Pr\big(\mathit{distance}(X, Y)_{\mathit{norm}} \le \mathit{limit}\big) \ge \tau \qquad (7.7)$$
The CDF of the normal distribution can be formulated in terms of the
well-known error function, and $\mathit{limit}$ can be determined by
looking it up in statistical tables. Once we have $\mathit{limit}$, we
also compute the normalized value $\mathit{norm}(X, Y)$. Then, if a
candidate uncertain series $Y$ satisfies the inequality
$\mathit{norm}(X, Y) \ge \mathit{limit}$, the following inequality holds,
because the CDF is nondecreasing:

$$\Pr\big(\mathit{distance}(X, Y)_{\mathit{norm}} \le \mathit{norm}(X, Y)\big) \ge \tau \qquad (7.8)$$
Therefore, $Y$ can be added to the result set. Otherwise, it is pruned
away. This distance formulation is statistically sound and only requires
knowledge of the general characteristics of the data distribution, namely,
its mean and variance.
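The pruning test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the normalization step (subtracting the mean of the distance distribution and dividing by its standard deviation) is an assumption, since this passage does not spell out how the normalized value is obtained. The inverse normal CDF from the standard library stands in for the statistical-table lookup.

```python
from math import erf, sqrt
from statistics import NormalDist


def normal_cdf(z: float) -> float:
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def limit_for_threshold(tau: float) -> float:
    """Smallest z with normal_cdf(z) >= tau (replaces the table lookup).

    Requires 0 < tau < 1.
    """
    return NormalDist().inv_cdf(tau)


def normalized_value(eps_sq: float, mean_sq_dist: float, var_sq_dist: float) -> float:
    """ASSUMPTION: standardize the query threshold with the mean and
    variance of the distance distribution (hypothetical parameter names)."""
    return (eps_sq - mean_sq_dist) / sqrt(var_sq_dist)


def keep_candidate(eps_sq: float, mean_sq_dist: float,
                   var_sq_dist: float, tau: float) -> bool:
    """Return True if the candidate passes the test norm(X, Y) >= limit."""
    norm_xy = normalized_value(eps_sq, mean_sq_dist, var_sq_dist)
    return norm_xy >= limit_for_threshold(tau)
```

Because the normal CDF is nondecreasing, a candidate with a normalized value of at least `limit_for_threshold(tau)` also satisfies `normal_cdf(norm_xy) >= tau`, which is the condition in (7.8). Only the mean and variance of the distance distribution are needed as inputs.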
DUST:
In [83], the authors propose a new distance measure, DUST, that,
compared to MUNICH, does not depend on the existence of multiple
observations and is computationally more efficient. Similarly to
[105], DUST is inspired by the Euclidean distance, but works under the
assumption that all the data series values follow some specific
distribution. Given two uncertain data series $X, Y$, the distance
between two uncertain values $x_i, y_i$ is defined as the distance
between their true (unknown) values $r(x_i), r(y_i)$:
$\mathit{dist}(x_i, y_i) = L_1(r(x_i), r(y_i))$. This distance