Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data - page 121

Database Reference

In-Depth Information

180

70

RC

RC+AR

Sampling

RC+LR

RC

RC+AR

Sampling

RC+LR

160

60

140

120

50

100

40

80

30

60

20

40

10

20

0

0

0.1

0.3

0.5

0.7

0.9

5

10

15

20

25

Expectation of membership probability

Average number of tuples in a rule

(a) Runtime vs. probability.

(b) Runtime vs. rule complexity.

45

1200

RC

RC+AR

RC+LR

Sampling

RC

Sampling

RC+AR

RC+LR

40

1000

35

30

800

25

600

20

15

400

10

200

5

0

0

200

400

600

800

1000

0.2

0.3

0.4

0.5

0.6

0.7

Parameter k

Probability threshold p

(c) Runtime vs. k .

(d) Runtime vs. probability threshold.

Fig. 5.5 Efficiency (same settings as in Figure 5.4).

Since ranking queries are extensively supported by modern database manage-

ment systems, we treat the generation of a ranked list of tuples as a black box, and

test our algorithms on top of the ranked list.

First, to evaluate the efficient top- k probability computation techniques, we com-

pare the exact algorithm, the sampling method, and the Poisson approximation

based method for evaluating PT- k queries. The experimental results for top-

(

k

,

l

)

(

,

)

queries and top-

queries are similar to the results for PT- k queries. Therefore,

we omit the details. For the exact algorithm, we compare three versions: Exact (us-

ing rule-tuple compression and pruning techniques only), Exact+AR (using aggres-

sive reordering), and Exact+LR (using lazy reordering). The sampling method uses

the two improvements described in Section 5.3. Then, we evaluate the online query

answering techniques for PT- k queries, top-

p

l

(

k

,

l

)

queries and top-

(

p

,

l

)

queries.

5.6.2.1 Scan Depth

We test the number of tuples scanned by the methods (Figure 5.4). We count the

number of distinct tuples read by the exact algorithm and the sample length as the

average number of tuples read by the sampling algorithm to generate a sample unit.

For reference, we also plot the number of tuples in the answer set, i.e., the tuples sat-

isfying the probability threshold top- k queries, and the number of tuples computed

by the general stopping condition discussed in Section 5.4.

In Figure 5.4(a), when the expected membership probability is high, the tuples

at the beginning of the ranked list likely appear, which reduce the probabilities of

Next Page

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home