Database Reference
In-Depth Information
800
1400
Stopping condition
Avg sample length
Exact algo
Ans
w
er set
700
1200
600
1000
500
800
400
S
to
pping
co
ndition
Exact algo
Avg sample length
Answer set
600
300
400
200
200
100
0
0
0.1
0.3
0.5
0.7
0.9
5
10
15
20
25
Expectation of membership probability
Average number of tuples in a rule
(a) Scan depth vs. probability.
(b) Scan depth vs. rule complexity.
4800
800
Stopping condition
Exact algo
Avg sample length
Ans
w
er set
700
4000
600
3200
500
2400
400
S
topping condition
E
xact
a
lgo
Avg sample length
Answer set
300
1600
200
800
100
0
0
200
400
600
800
1000
0.2
0.3
0.4
0.5
0.6
0.7
Parameter k
Probability threshold p
(c) Scan depth vs.
k
.
(d) Scan depth vs. probability threshold.
Fig. 5.4
Scan depth (each test data set contains 20
,
000 tuples and 2
,
000 generation rules).
By varying
p
in a top-
query, we can see the tradeoff between the confidence
and the highest ranks a tuple can get with the confidence.
If
k
(
p
,
l
)
=
10 and
p
=
0
.
4, then a PT-
k
query returns
{
R
1
,
R
2
,
R
3
,
R
4
,
R
5
,
R
6
,
R
9
,
R
10
,
R
11
,
R
14
,
R
15
}
.If
p
is increased to 0
.
7, then the answer set is
{
R
1
,
R
2
,
R
3
,
R
5
,
R
6
,
R
9
,
R
10
,
R
11
}
.
5.6.2 Results on Synthetic Data Sets
To evaluate the query answering quality and the scalability of our algorithms, we
generate various synthetic data sets. The membership probability values of in-
dependent tuples and multi-tuple generation rules follow the normal distribution
N
, respectively. The rule complexity, i.e., the number of
tuples involved in a rule, follows the normal distribution
N
(μ
P
t
,
σ
P
t
)
and
N
(μ
P
R
,
σ
P
R
)
2
.
(μ
|
R
|
,
σ
|
R
|
)
By default, a synthetic data set contains 20
,
000 tuples and 2
,
000 multi-tuple
generation rules: 2
000 generation rules. The number of tuples involved in each
multi-tuple generation rule follows the normal distribution
N
,
. The probability
values of independent tuples and multi-tuple generation rules follow the normal dis-
tribution
N
(
5
,
2
)
(
0
.
5
,
0
.
2
)
and
N
(
0
.
7
,
0
.
2
)
, respectively. We test the probability threshold
top-
k
queries with
k
=
200 and
p
=
0
.
3.
2
The
data
generator
is
available
at
http://www.cs.sfu.ca/
˜
jpei/Software/
PTKLib.rar