Ranking Queries on Probabilistic Linkages - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

described in Algorithm 7.1. Reuse denotes the algorithm with the reusing technique

and Reuse+pruning denotes the algorithm with both the reusing technique and the

pruning technique.

Figures 7.12(a) to (d) show the runtime of the query answering methods for

count queries in various parameter settings. Exact-count is the exact algorithm.

Equi-width and Equi-depth denotes the algorithms using the approximation tech-

niques discussed in Sections 7.6.2.1 and 7.6.2.2, respectively. By default,

10 − 4

ε =

ρ =

and

30.

In Figure 7.12(a), we fix the number of tuples to 5

000 and the expected degree

of each tuple is 4. The runtime of all three algorithms increase mildly as the num-

ber of linkage increases from 20

000 to 100

000, thanks to the vertex compression

technique discussed in Section 7.3.3.

In Figure 7.12(b), we fix the expected degree of each tuple to 4 and increase

the number of tuples in the data set. The runtime of the exact-count algorithm in-

creases quadratically. The runtime of the two approximation algorithms increases

very slightly.

Figure 7.12(c) shows the increase of runtime with respect to the degrees of tuples.

The larger the degree, the more times of convolution are performed in the query

evaluation. Therefore, the runtime of all three algorithms increases linearly with

respect to the expected degree.

Last, Figure 7.12(d) illustrates the change of runtime when the number of con-

nected components increases.

Figures 7.13(a) to (d) show the runtime of the sum query evaluation in the same

setting as in Figure 7.12. The results on the sum query evaluation demonstrate the

similar patterns as in the count query evaluation. But the overall cost for the sum

query evaluation is higher than that of the count query evaluation, since it depends

on the number of values appearing in the result distribution.

Figures 7.14(a) to (d) show the efficiency of the min query evaluation. Exact-

min denotes the algorithm that transforms the min query to a set of count queries.

Reuse denotes the algorithm that explores the sharing of computation among differ-

ent linkages. Reuse+Pruning denotes the algorithm that applies the pruning tech-

nique discussed in Section7.5.3 in addition to the reuse method. It is shown that the

two techniques improve the efficiency significantly.

Last, we evaluate the quality of the two approximation algorithms discussed in

Sections 7.6.2.1 and 7.6.2.2, respectively.

The quality of

-approximate euqi-width histogram answers are computed as

− | f (φ ) − f (φ ) |

f (φ )

|{ φ | f (φ ) > τ }| ∑ f (φ ) > τ

, where f

(φ )

is the probability of interval

and

(φ )

∈ φ

estimated by the approximation method. The results

are shown in Figure 7.15(a), with respect to different

is the probability of x

values 10 − x ( x

7).

The precision and recall are also plotted.

Figure 7.15(b) illustrates the approximation quality of the answers to both

count and sum queries using the

quantiles. The quality is measured using

− | f ( v i ) − f ( v i ) |

1 1

, where v i is the value output as the approximation of the i -

∑

(

v i )

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home