Advanced Techniques - Probabilistic Databases - page 125

Database Reference

In-Depth Information

U c

L c

t 2

t 1

t 3

t 4

t 5

p

0

1

Example

Empty critical region

L c

U c

L c

U c

L c

U c

Case 1

Case 2

Case 3

Figure 6.1: Illustration of the multi-simulation technique. The goal is to retrieve the top k

=

2 highest

probabilities. Intervals represent uncertainty about the value of a tuple's probability score.

We assume that during any simulation step, no lower bound L i decreases, and no upper bound

U i increases. It is easy to check that this holds for Algorithm 3 ; for the Monte Carlo approximation

algorithms in Subsection 5.3.2 , while it fails if we define a simulation step to consis ts of one single

Monte Carlo trial, the assumption still holds with high probability is we perform √ N trials during

any simulation step [ Ré et al. , 2007 ]. This assumption implies immediately that L c never decreases,

and U c never increases. Thus, the critical region (L c ,U c ) will only shrink after a simulation step, or

stay the same.

Next, we define four sets of tuples as follows:

B k ={ i | U i ≤ L c

T k ={ i | U c

}

≤ L i }

B k ={ i | U i ≤ U c

T k ={ i | L c

}

≤ L i }

If the critical region is non-empty ( L c <U c ), then T k ⊆ T k and B k ⊆ B k ; if the critical region

is empty, then the opposite inclusions hold.

It is always the case that

| T k |≥ k because, by the definition of L c

=

max k (L 1 ,...,L n ) ,

there exists k lower bounds L i such that L i ≥ L c . By duality, it follows that

| B k |≥ n − k since

U c

=

max k + 1 (U 1 ,...,U n ) =

min n − k (U 1 ,...,U n ) .

Next Page

Probabilistic Databases

Search WWH ::

Custom Search

Home