Database Reference
In-Depth Information
)
1 b x
Pr
(
G
(
v i ) ,
a
Pr
(
G
(
v j ) ,
b
)
(7.8)
1
a x
Theorem 7.6 (Pruning rule). Given a PME-graph G and a probabilistic threshold
top- k query Q P , f
with
probability
threshold p
(
0
,
1
]
,
if
for
vertex v ,
p , then for any vertex v ( v
f v ), Pr k
v ) <
1 a k 1 Pr
(
G
(
v i ) ,
a
) <
(
p .
Theorem 7.6 states that, once the subgraph probability for the dominant subgraph
of a vertex v fails the probability threshold, all vertices ranked lower than v can be
pruned.
7.6 Extensions to Aggregate Queries
Interestingly, the above techniques for ranking queries on uncertain data can be
used to answer aggregate queries. In this section, we discuss the aggregate query
answering on probabilistic linkages.
7.6.1 Aggregate Queries on Probabilistic Linkages
Example 7.7 (Aggregate queries). Consider Example 2.12 again. If a medical expert
is interested in counting the number of linkages between the hospitalization registers
and the causes-of-death registers. The answer to Q directly corresponds to the death
population after hospitalization.
Suppose
45 . No records in Table 2.4 are considered
matched, since the linkage probabilities are all lower than
δ M =
0
.
9 and
δ U =
0
.
δ U . Is the answer to
Q simply 0 ?
Record a 1 in the hospitalization register data set is linked to three records in the
causes-of-death register data set, namely b 1 ,b 2 and b 3 , with linkage probability 0
.
3 ,
0
4 , respectively. Therefore, the probability that “John H. Smith” is linked
to some records in the causes-of-death register data set and thus reported dead is
0
.
3 and 0
.
.
3
+
0
.
3
+
0
.
4
=
1 . Similarly, the probability that “Johnson R. Smith” is reported
dead is 1 .
In a possible world, the answer to an aggregate query is certain. Therefore, the
answer to an aggregate query is a multiset of the answers in the possible worlds.
Incorporating the probabilities, the answer to an aggregate query is a probability
distribution on possible answers.
Definition 7.8 (Aggregate query on linkages). Given a set
L A , B of linkages be-
tween tables A and B , let Q F be an aggregate query , where P is a predicate
which may involve attributes in A , B , or both and F is an aggregate function
on attribute
. The answer to Q F on linkages is the probability distribution
A
 
Search WWH ::




Custom Search