Continuous Ranking Queries on Uncertain Streams - Ranking Queries on Uncertain Data - page 146

Database Reference

In-Depth Information

Window ID PTK query ( k = 5, p = 0 . 4) Top-5 query on average data

W 1

O 2 ,

O 6 ,

O 16

O 2 ,

O 4 ,

O 6 ,

O 14 ,

O 16

O 2 , O 6 , O 16

O 2 , O 4 , O 6 , O 14 , O 16

W 2

W 3

O 2

,

O 6 ,

O 16

O 2

,

O 4

,

O 6 ,

O 14

,

O 16

O 2 , O 4 , O 6 , O 16

O 2 , O 4 , O 6 , O 14 , O 16

W 4

W 5

O 2

,

O 6

,

O 16

O 2

,

O 4

,

O 6

,

O 14

,

O 16

W 6

O 2 , O 16

O 2 , O 4 , O 6 , O 14 , O 16

W 7

O 2

,

O 16

O 2

,

O 4

,

O 6

,

O 14

,

O 16

W 8

O 2 ,

O 16

O 2 ,

O 4 ,

O 6 ,

O 14 ,

O 16

W 9

O 2

,

O 16

O 2

,

O 4

,

O 6

,

O 14

,

O 16

O 16

Table 6.1 The answers to a probabilistic threshold top- k query in 10 consecutive sliding windows

( ω = 60) and the answers to a traditional top- k query on average values.

W 10

O 2 ,

O 16

O 2 ,

O 4 ,

O 6 ,

O 14 ,

first 5 sliding windows. The rankings of seismic values reported by sensor O 6 drop

after sliding window W 5 . Third, the seismic values reported by sensor O 4 is ranked

among top-5 with high confidence only in sliding window W 4 .

A traditional top-5 query on the average seismic values in each sliding window

reports

consistently in the 10 sliding windows. They do not

reflect the above interesting patterns.

Simple example shows that continuous probabilistic threshold top- k queries on

uncertain streams provide meaningful information which cannot be captured by the

traditional top- k queries on aggregate data.

{

O 2 ,

O 4 ,

O 6 ,

O 14 ,

O 16 }

6.4.2 Synthetic Data Sets

In this performance study, we use various synthetic data sets to evaluate the effi-

ciency of the algorithms and the query evaluation quality. By default, a data set con-

tains 100 uncertain streams, and the sliding window width is set to

ω =

200. Thus,

there are 20

000 instances in each sliding window. The data in a sliding window

is already held in main memory. The scores of instances from one stream follow a

normal distribution. The mean

,

μ

is randomly picked from a domain

[

0

,

1000

]

, and

the variance

σ

is randomly picked from

[

0

,

10

]

. We add 10% noise by using 10

σ

as

the variance. Moreover, the query parameter k

=

20, and the probability threshold

p

=

0

.

4. The number of samples drawn in the sampling algorithm is 1

,

000. In quan-

tile summaries,

φ =

0

.

1, and

ε =

0

.

02. The reported results are the average values

in 5 sliding windows.

We test the following four algorithms: the deterministic exact method ( Det )in

Section 6.1.2; the sampling method ( Sam ) in Section 6.2; the extended deterministic

method using quantile summaries ( Det-Q ) and the sampling method using quantile

summaries ( Sam-Q ) in Section 6.3.3.

Next Page

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home