$$\Pr(distance(X,Y) \le \epsilon) = \frac{|\{d \in dists(X,Y) \mid d \le \epsilon\}|}{|dists(X,Y)|} \qquad (7.4)$$
[Figure 7.5. Example of an uncertain data series X = {x_1, ..., x_n} [27], modeled by means of repeated observations (a), and pdf estimation (b).]
Once we compute this probability, we can determine the result set of PRQ similarity queries by filtering all uncertain sequences using Equation 7.4. Note that the naive computation of the result set is infeasible, because of the exponential computational cost, $|dists(X,Y)| = s_X^n \cdot s_Y^n$, where $s_X, s_Y$ are the number of samples at each timestamp of $X, Y$, respectively, and $n$ is the length of the sequences. Efficiency can be ensured by upper and lower bounding the distances, and by summarizing the repeated samples using minimal bounding intervals [8]. This framework has been applied to the Euclidean and DTW distances, and guarantees no false dismissals in the original space.
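The naive computation that Equation 7.4 implies can be sketched as follows. This is an illustrative sketch, not the implementation of [8]: the function name and the representation of uncertain series as per-timestamp sample lists are assumptions, and the code deliberately enumerates all $s_X^n \cdot s_Y^n$ sample combinations to make the exponential cost visible.

```python
import itertools
import math

def naive_prq_probability(X, Y, eps):
    """Naive Pr(distance(X, Y) <= eps) per Equation 7.4.

    X and Y are uncertain series given as lists of per-timestamp
    sample lists. dists(X, Y) enumerates the Euclidean distance of
    every combination of one sample per timestamp, so the cost is
    s_X^n * s_Y^n distance computations.
    """
    dists = []
    # Every possible materialization of X paired with every one of Y.
    for xw in itertools.product(*X):
        for yw in itertools.product(*Y):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(xw, yw)))
            dists.append(d)
    # Equation 7.4: fraction of distances within the threshold.
    return sum(d <= eps for d in dists) / len(dists)

# Tiny example: 2 timestamps, 2 samples each -> 2^2 * 2^2 = 16 distances.
X = [[1.0, 1.2], [2.0, 2.1]]
Y = [[1.1, 0.9], [2.0, 2.3]]
p = naive_prq_probability(X, Y, eps=0.3)
```

Even for this toy input the enumeration already produces 16 candidate distances, which is why the bounding and summarization techniques of [8] are needed in practice.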
PROUD: In [105], an approach for processing queries over PRObabilistic Uncertain Data streams (PROUD) is presented. Inspired by the Euclidean distance, the PROUD distance is modeled as the sum of the differences of the streaming data series random variables, where each random variable represents the uncertainty of the value in the corresponding timestamp (see Figure 7.5(b)). Given two uncertain data series $X, Y$, their distance is defined as:
$$distance(X,Y) = \sum_{i=1}^{|dists(X,Y)|} D_i^2 \qquad (7.5)$$

where $D_i = (x_i - y_i)$ are random variables, as shown in Figure 7.6.

According to the central limit theorem, the cumulative distribution of the distances approaches a normal distribution, and the normalized distance follows a standard normal distribution. Therefore, we can obtain the normal distribution of the original distance as follows:

$$distance(X,Y) \sim N\Big(\sum_i E[D_i^2],\; \sum_i Var[D_i^2]\Big) \qquad (7.6)$$
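The normal approximation of Equations 7.5 and 7.6 can be sketched as follows. This is an illustrative sketch rather than the PROUD implementation of [105]: the function names are hypothetical, per-timestamp uncertainty is assumed to be given as independent value samples, and the moments of each $D_i^2$ are estimated empirically from all sample pairs.

```python
import math

def distance_distribution(X, Y):
    """Approximate distance(X, Y) = sum_i D_i^2 by a normal distribution
    (Equation 7.6), with D_i = x_i - y_i.

    X and Y hold independent value samples per timestamp; E[D_i^2] and
    Var[D_i^2] are estimated empirically from all sample pairs.
    """
    mean, var = 0.0, 0.0
    for xs, ys in zip(X, Y):
        sq = [(a - b) ** 2 for a in xs for b in ys]  # samples of D_i^2
        m = sum(sq) / len(sq)
        v = sum((s - m) ** 2 for s in sq) / len(sq)
        mean += m  # accumulates sum_i E[D_i^2]
        var += v   # accumulates sum_i Var[D_i^2] (independence across i)
    return mean, var

def prob_distance_leq(X, Y, eps):
    """Pr(distance(X, Y) <= eps) under the normal approximation."""
    mean, var = distance_distribution(X, Y)
    if var == 0.0:
        return 1.0 if mean <= eps else 0.0
    # Standard normal CDF evaluated at the normalized threshold.
    z = (eps - mean) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

X = [[1.0, 1.2], [2.0, 2.1]]
Y = [[1.1, 0.9], [2.0, 2.3]]
p = prob_distance_leq(X, Y, eps=0.1)
```

Unlike the exhaustive enumeration of Equation 7.4, this approximation needs only the first two moments of each $D_i^2$, which is what makes it suitable for the streaming setting targeted by PROUD.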