Fig. 5.8 Scalability: (a) runtime vs. cardinality (number of tuples in the data set); (b) runtime vs. number of rules. Curves shown: RC, RC+AR, RC+LR, and Sampling.
As discussed in Section 5.2.3, a tuple t failing the probability threshold still has to be retrieved if some tuples ranked lower than t may satisfy the threshold.
Figure 5.4 verifies the effectiveness of the pruning techniques discussed in Section 5.2.3. With the pruning techniques, the exact algorithm accesses only a small portion of the tuples in the data set. Interestingly, the average sample length is close to the number of tuples scanned by the exact algorithm, which verifies the effectiveness of our sampling techniques. Moreover, both the exact algorithm and the sampling algorithm access fewer tuples than the number given by the general stopping condition, and that number is close to the real stopping point, which shows the effectiveness of the stopping condition.
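To make the role of the stopping condition concrete, the following is a minimal sketch of one such check, assuming independent tuples scanned in ranking order and ignoring generation rules; the function name and details are illustrative rather than the exact condition used in the experiments. The scan stops once the probability that fewer than k of the scanned tuples appear drops below the threshold p, since the top-k probability of any unseen tuple cannot exceed that quantity.

```python
def scan_with_stopping_condition(tuples, k, p_threshold):
    """Scan tuples in ranking order until no unseen tuple can reach
    top-k probability >= p_threshold (illustrative sketch only).

    tuples: list of (tuple_id, membership_prob), sorted by ranking order.
    """
    # dp[j] = Pr(exactly j of the scanned tuples appear), for j = 0 .. k-1
    dp = [1.0] + [0.0] * (k - 1)
    scanned = []
    for t_id, prob in tuples:
        scanned.append(t_id)
        new = [0.0] * k
        for j in range(k):
            new[j] += dp[j] * (1.0 - prob)      # this tuple is absent
            if j + 1 < k:
                new[j + 1] += dp[j] * prob      # this tuple is present
        dp = new
        # Pr(fewer than k scanned tuples appear) bounds the top-k
        # probability of every unseen (lower-ranked) tuple from above.
        if sum(dp) < p_threshold:
            break
    return scanned
```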
5.6.2.2 Efficiency and Approximation Quality
Figure 5.5 compares the runtime of the three versions of the exact algorithm and the sampling algorithm with respect to the four aspects tested in Figure 5.4. The runtime of the Poisson approximation based method is always less than one second, so we omit it from Figure 5.5 for readability. We also count the number of times that subset probability values are computed in the three versions of the exact algorithm. The trends are exactly the same as those of the runtime; therefore, we omit those figures here. The results confirm that the rule-tuple compression technique and the reordering techniques speed up the exact algorithm substantially, and that lazy reordering always outperforms aggressive reordering.
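The idea behind rule-tuple compression can be sketched briefly: tuples in the same multi-tuple generation rule are mutually exclusive, so the already-scanned tuples of a rule can be replaced, for subset probability computation, by a single "rule-tuple" whose membership probability is the sum of theirs. The sketch below is a minimal illustration with hypothetical names, assuming exclusion rules only; the aggressive and lazy reordering heuristics are not shown. The shorter the compressed list, the fewer steps each subset probability computation takes, which is why compression speeds up the exact algorithm.

```python
def compress_rules(scanned, rule_of):
    """Replace the scanned tuples of each exclusion rule by one rule-tuple.

    scanned: list of (tuple_id, membership_prob) in ranking order
    rule_of: dict mapping tuple_id -> rule_id (tuples without a rule are
             treated as independent)
    Returns the membership probabilities after compression.
    """
    compressed = []      # probabilities of independent tuples
    rule_mass = {}       # rule_id -> summed probability of its scanned tuples
    for t_id, prob in scanned:
        rule = rule_of.get(t_id)
        if rule is None:
            compressed.append(prob)
        else:
            # mutually exclusive tuples: probabilities simply add up
            rule_mass[rule] = rule_mass.get(rule, 0.0) + prob
    compressed.extend(rule_mass.values())
    return compressed

# Example: t2 and t4 belong to the same rule r1, so they collapse into a
# single rule-tuple with probability 0.3 + 0.5 = 0.8:
# compress_rules([("t1", 0.9), ("t2", 0.3), ("t3", 0.6), ("t4", 0.5)],
#                {"t2": "r1", "t4": "r1"})  ->  [0.9, 0.6, 0.8]
```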
Compared to the exact algorithm, the sampling method is generally more stable in runtime. Interestingly, the exact algorithm (Exact+LR) and the sampling algorithm each have their own advantages. For example, when k is small, the exact algorithm is faster, whereas the sampling method wins when k is large. As k increases, more tuples need to be scanned by the exact algorithm, and those tuples may be revisited during subset probability computation. In contrast, the only overhead in the sampling method is scanning more tuples when generating a sample unit, which is linear in k. This justifies the need for both the exact algorithm and the sampling algorithm.
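A small sketch also makes the linear-in-k overhead of the sampling method visible; the following is a minimal illustration under simplifying assumptions (independent tuples sorted in ranking order, generation rules ignored, illustrative names). Each sample unit is drawn by scanning from the top and including every tuple with its membership probability, and the scan stops as soon as k included tuples have been collected, so the work per sample unit grows roughly linearly with k.

```python
import random

def draw_sample_unit(tuples, k, rng=random):
    """Draw one sample unit: the top-k tuples of one sampled possible world.

    tuples: list of (tuple_id, membership_prob) sorted by ranking order.
    """
    unit = []
    for t_id, prob in tuples:
        if rng.random() < prob:        # tuple appears in this possible world
            unit.append(t_id)
            if len(unit) == k:         # no later tuple can enter the top-k
                break
    return unit

def estimate_topk_probabilities(tuples, k, num_units=1000, rng=random):
    """Monte Carlo estimate of each tuple's top-k probability."""
    counts = {t_id: 0 for t_id, _ in tuples}
    for _ in range(num_units):
        for t_id in draw_sample_unit(tuples, k, rng):
            counts[t_id] += 1
    return {t_id: c / num_units for t_id, c in counts.items()}
```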
 