Database Reference
In-Depth Information
Window ID PTK query ( k = 5, p = 0 . 4) Top-5 query on average data
W 1
O 2 ,
O 6 ,
O 16
O 2 ,
O 4 ,
O 6 ,
O 14 ,
O 16
O 2 , O 6 , O 16
O 2 , O 4 , O 6 , O 14 , O 16
W 2
W 3
O 2
,
O 6 ,
O 16
O 2
,
O 4
,
O 6 ,
O 14
,
O 16
O 2 , O 4 , O 6 , O 16
O 2 , O 4 , O 6 , O 14 , O 16
W 4
W 5
O 2
,
O 6
,
O 16
O 2
,
O 4
,
O 6
,
O 14
,
O 16
W 6
O 2 , O 16
O 2 , O 4 , O 6 , O 14 , O 16
W 7
O 2
,
O 16
O 2
,
O 4
,
O 6
,
O 14
,
O 16
W 8
O 2 ,
O 16
O 2 ,
O 4 ,
O 6 ,
O 14 ,
O 16
W 9
O 2
,
O 16
O 2
,
O 4
,
O 6
,
O 14
,
O 16
O 16
Table 6.1 The answers to a probabilistic threshold top- k query in 10 consecutive sliding windows
( ω = 60) and the answers to a traditional top- k query on average values.
W 10
O 2 ,
O 16
O 2 ,
O 4 ,
O 6 ,
O 14 ,
first 5 sliding windows. The rankings of seismic values reported by sensor O 6 drop
after sliding window W 5 . Third, the seismic values reported by sensor O 4 is ranked
among top-5 with high confidence only in sliding window W 4 .
A traditional top-5 query on the average seismic values in each sliding window
reports
consistently in the 10 sliding windows. They do not
reflect the above interesting patterns.
Simple example shows that continuous probabilistic threshold top- k queries on
uncertain streams provide meaningful information which cannot be captured by the
traditional top- k queries on aggregate data.
{
O 2 ,
O 4 ,
O 6 ,
O 14 ,
O 16 }
6.4.2 Synthetic Data Sets
In this performance study, we use various synthetic data sets to evaluate the effi-
ciency of the algorithms and the query evaluation quality. By default, a data set con-
tains 100 uncertain streams, and the sliding window width is set to
ω =
200. Thus,
there are 20
000 instances in each sliding window. The data in a sliding window
is already held in main memory. The scores of instances from one stream follow a
normal distribution. The mean
,
μ
is randomly picked from a domain
[
0
,
1000
]
, and
the variance
σ
is randomly picked from
[
0
,
10
]
. We add 10% noise by using 10
σ
as
the variance. Moreover, the query parameter k
=
20, and the probability threshold
p
=
0
.
4. The number of samples drawn in the sampling algorithm is 1
,
000. In quan-
tile summaries,
φ =
0
.
1, and
ε =
0
.
02. The reported results are the average values
in 5 sliding windows.
We test the following four algorithms: the deterministic exact method ( Det )in
Section 6.1.2; the sampling method ( Sam ) in Section 6.2; the extended deterministic
method using quantile summaries ( Det-Q ) and the sampling method using quantile
summaries ( Sam-Q ) in Section 6.3.3.
 
Search WWH ::




Custom Search