Database Reference
In-Depth Information
100
100
95
95
90
90
85
85
Recall (sampling)
Precision (sampling)
Recall (Poisson)
Precision (Poisson)
Recall (sampling)
Precision (sampling)
Recall (Poisson)
Precision (Poisson)
80
80
75
75
70
70
0.1
0.3
0.5
0.7
0.9
5
10
15
20
25
Expectation of membership probability
Average number of tuples in a rule
(a) Quality vs. probability.
(b) Quality vs. rule complexity.
100
100
95
95
90
90
85
Recall (sampling)
Precision (sampling)
Recall (Poisson)
Precision (Poisson)
Recall (sampling)
Precision (sampling)
Recall (Poisson)
Precision (Poisson)
80
85
75
80
70
200
400
600
800
1000
0.2
0.3
0.4
0.5
0.6
0.7
Parameter k
Probability threshold p
(c) Quality vs. k .
(d) Quality vs. probability threshold.
Fig. 5.6 The approximation quality of the sampling method and the Poisson approximation-based
method.
14
100
Chernoff bound
k=200
k=1000
12
99
10
98
8
97
6
96
4
Precision (k=200)
Recall (k=200)
95
2
0
94
500
1000
1500
2000
2500
500
1000
1500
2000
2500
Number of samples drawn
Number of samples drawn
(a) Error rate vs. sample size.
(b) Precision/recall vs. sample size.
Fig. 5.7 The approximation quality of the sampling-based method.
the lower ranked tuples to be ranked in the top- k lists in possible worlds. If the
membership probability of each tuple is very close to 1, then very likely we can
prune all the tuples after the first k tuples are scanned. In contrary, if the expectation
of the membership probability is low, then more tuples have a chance to be in the
top- k lists of some possible worlds. Consequently, the methods have to check more
tuples.
In Figure 5.4(b), when the rule complexity increases, more tuples are involved in
a rule. The average membership probability of those tuples decreases, and thus more
tuples need to be scanned to answer the query. In Figure 5.4(c), both the scan depth
and the answer set size increase linearly when k increases, which is intuitive. In
Figure 5.4(d), the size of the answer set decreases linearly as the probability thresh-
old p increases. However, the number of tuples scanned decreases much slower. As
 
Search WWH ::




Custom Search