Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

4.1.2.4 Rule 4: Negation

If the query is

Q then:

Negation

P( ¬ Q) = 1 − P (Q)

(4.6)

4.1.2.5 Rule 5: Inclusion-Exclusion Formula

Suppose a query can be written as Q 1 ∧ Q 2 ∧ ... ∧ Q k . Then its probability is:

1 ) | s | P(

i ∈[ s ]

P(Q 1 ∧ Q 2 ∧ ... ∧ Q k ) =−

( −

inclusion-exclusion

Q i )

(4.7)

s ⊆[ k ] ,s =∅

This is the dual of the popular inclusion-exclusion formula since it expresses the probability

of an intersection of events in terms of the probability of unions of events. For example, when k = 3,

then the two inclusion-exclusion formulas are:

P(Q 1 ∧

Q 2 ∧

Q 3 )

P(Q 3 )

− P(Q 1 ∨ Q 2 ) − P(Q 1 ∨ Q 3 ) − P(Q 2 ∨ Q 3 )

P(Q 1 )

P(Q 2 )

Q 3 )

P(Q 1 ∨ Q 2 ∨ Q 3 ) = P(Q 1 ) + P(Q 2 ) + P(Q 3 )

− P(Q 1 ∧ Q 2 ) − P(Q 1 ∧ Q 3 ) − P(Q 2 ∧ Q 3 )

+ P(Q 1 ∧ Q 2 ∧ Q 3 )

P(Q 1 ∨

Q 2 ∨

We use only the former because it brings the query expression in a form in which we can

apply Eq. (4.3) , then find a separator variable.

4.1.2.6 Rule 6: Attribute Ranking

This rule allows us to partition the tuples of a relation instance R , according to a simple predicate.

We consider two such predicates, leading to two flavors of the attribute ranking rule, which we

call Attribute-Constant Ranking and Attribute-Attribute Ranking . The term “ranking” refers to the

fact that we impose a partial order on the attributes; it should not be confused with tuple-ranking

discussed in Section 6.1 .

Attribute-attribute ranking is the following. Let A, B be two attributes of R . Partition R

into three relations, R 1 = σ A<B (R), R 2 = σ A = B (R), R 3 = σ A>B (R) , and substitute in the query

Q all atoms referring to R with R 1 ∨

R 3 . We denote Q r

the resulting query and say that Q r

R 2 ∨

is obtained from Q by “ranking” R on the attributes A, B .

Attribute-constant ranking is the following. Let A be an attribute of R and a be a constant 1

such that there exists two unifiable atoms using the relation symbol R , such that the first has the

constant a in position A and the other has a variable in position A . Then we partition R into two

1 If Q is a non-Boolean query, then we also do ranking w.r.t. to head variables a . In other words, for the purpose of ranking, head

variables are treated as constants.

Probabilistic Databases

Search WWH ::

Custom Search

Home