Database Reference
In-Depth Information
800
1400
Stopping condition
Avg sample length
Exact algo
Ans w er set
700
1200
600
1000
500
800
400
S to pping co ndition
Exact algo
Avg sample length
Answer set
600
300
400
200
200
100
0
0
0.1
0.3
0.5
0.7
0.9
5
10
15
20
25
Expectation of membership probability
Average number of tuples in a rule
(a) Scan depth vs. probability.
(b) Scan depth vs. rule complexity.
4800
800
Stopping condition
Exact algo
Avg sample length
Ans w er set
700
4000
600
3200
500
2400
400
S topping condition
E xact a lgo
Avg sample length
Answer set
300
1600
200
800
100
0
0
200
400
600
800
1000
0.2
0.3
0.4
0.5
0.6
0.7
Parameter k
Probability threshold p
(c) Scan depth vs. k .
(d) Scan depth vs. probability threshold.
Fig. 5.4 Scan depth (each test data set contains 20
,
000 tuples and 2
,
000 generation rules).
By varying p in a top-
query, we can see the tradeoff between the confidence
and the highest ranks a tuple can get with the confidence.
If k
(
p
,
l
)
=
10 and p
=
0
.
4, then a PT- k query returns
{
R 1
,
R 2
,
R 3
,
R 4
,
R 5
,
R 6
,
R 9
,
R 10
,
R 11
,
R 14
,
R 15
}
.If p is increased to 0
.
7, then the answer set is
{
R 1
,
R 2
,
R 3
,
R 5
,
R 6
,
R 9
,
R 10
,
R 11
}
.
5.6.2 Results on Synthetic Data Sets
To evaluate the query answering quality and the scalability of our algorithms, we
generate various synthetic data sets. The membership probability values of in-
dependent tuples and multi-tuple generation rules follow the normal distribution
N
, respectively. The rule complexity, i.e., the number of
tuples involved in a rule, follows the normal distribution N
P t , σ P t )
and N
P R , σ P R )
2 .
| R | , σ | R | )
By default, a synthetic data set contains 20
,
000 tuples and 2
,
000 multi-tuple
generation rules: 2
000 generation rules. The number of tuples involved in each
multi-tuple generation rule follows the normal distribution N
,
. The probability
values of independent tuples and multi-tuple generation rules follow the normal dis-
tribution N
(
5
,
2
)
(
0
.
5
,
0
.
2
)
and N
(
0
.
7
,
0
.
2
)
, respectively. We test the probability threshold
top- k queries with k
=
200 and p
=
0
.
3.
2
The
data
generator
is
available
at http://www.cs.sfu.ca/ ˜ jpei/Software/
PTKLib.rar
 
Search WWH ::




Custom Search