Database Reference
In-Depth Information
described in Algorithm 7.1.
Reuse
denotes the algorithm with the reusing technique
and
Reuse+pruning
denotes the algorithm with both the reusing technique and the
pruning technique.
Figures 7.12(a) to (d) show the runtime of the query answering methods for
count
queries in various parameter settings.
Exact-count
is the exact algorithm.
Equi-width
and
Equi-depth
denotes the algorithms using the approximation tech-
niques discussed in Sections 7.6.2.1 and 7.6.2.2, respectively. By default,
10
−
4
ε =
ρ =
and
30.
In Figure 7.12(a), we fix the number of tuples to 5
000 and the expected degree
of each tuple is 4. The runtime of all three algorithms increase mildly as the num-
ber of linkage increases from 20
,
,
000 to 100
,
000, thanks to the vertex compression
technique discussed in Section 7.3.3.
In Figure 7.12(b), we fix the expected degree of each tuple to 4 and increase
the number of tuples in the data set. The runtime of the
exact-count
algorithm in-
creases quadratically. The runtime of the two approximation algorithms increases
very slightly.
Figure 7.12(c) shows the increase of runtime with respect to the degrees of tuples.
The larger the degree, the more times of convolution are performed in the query
evaluation. Therefore, the runtime of all three algorithms increases linearly with
respect to the expected degree.
Last, Figure 7.12(d) illustrates the change of runtime when the number of con-
nected components increases.
Figures 7.13(a) to (d) show the runtime of the
sum
query evaluation in the same
setting as in Figure 7.12. The results on the
sum
query evaluation demonstrate the
similar patterns as in the
count
query evaluation. But the overall cost for the
sum
query evaluation is higher than that of the
count
query evaluation, since it depends
on the number of values appearing in the result distribution.
Figures 7.14(a) to (d) show the efficiency of the
min
query evaluation.
Exact-
min
denotes the algorithm that transforms the
min
query to a set of
count
queries.
Reuse
denotes the algorithm that explores the sharing of computation among differ-
ent linkages.
Reuse+Pruning
denotes the algorithm that applies the pruning tech-
nique discussed in Section7.5.3 in addition to the reuse method. It is shown that the
two techniques improve the efficiency significantly.
Last, we evaluate the quality of the two approximation algorithms discussed in
Sections 7.6.2.1 and 7.6.2.2, respectively.
The quality of
ε
-approximate euqi-width histogram answers are computed as
−
|
f
(φ )
−
f
(φ )
|
f
(φ )
1
|{
φ
|
f
(φ )
>
τ
}|
∑
f
(φ )
>
τ
1
, where
f
(φ )
is the probability of interval
φ
and
f
(φ )
∈
φ
estimated by the approximation method. The results
are shown in Figure 7.15(a), with respect to different
is the probability of
x
values 10
−
x
(
x
ε
=
,
,
,
,
3
4
5
6
7).
The precision and recall are also plotted.
Figure 7.15(b) illustrates the approximation quality of the answers to both
count
and
sum
queries using the
ρ
quantiles. The quality is measured using
−
|
f
(
v
i
)
−
f
(
v
i
)
|
f
1
k
i
1
1
, where
v
i
is the value output as the approximation of the
i
-
∑
=
(
v
i
)