Database Reference
In-Depth Information
Q F ( L A , B )=
f
(
v
)=
Pr
(
v
)=∑ W ∈W L , A , B , Q F ( W )= v Pr
(
W
)
, where W is a possible
world, Q F (
is the answer to Q F on the linkages in W , and Pr
W
)
(
W
)
is the prob-
ability of W .
On a large set of linkages, there may be a huge number of possible worlds. Com-
puting a probability distribution completely may be very costly. Moreover, if there
are many possible answers in the possible worlds, enumerating all those values is
hard to be manipulated by users. Since histograms are popularly adopted in data
analytics and aggregate query answering, here we advocate answering aggregate
queries on linkages using histograms. We consider both equi-width histograms and
equi-depth histograms.
Definition 7.9 (Histogram answer). Consider an aggregate query Q on linkages
L
, let v min and v max are the minimum and the maximum values of Q on all possible
worlds.
Given a bucket width parameter
η
, and a minimum probability threshold
τ
,
the equi-width histogram answer to Q is a set of interval tuples
i ,
p i )(
1
v max v min
v max v min
i
η )
where
φ j =[
v min +(
j
1
,
v min +
j
η)(
1
j
<
η )
and
v max
v min
v max
v min
φ
=[
v min
+(
η
1
,
v max
]
are
η
equi-width intervals
v max
v min
η
between v min and v max where p i =
Pr
(
Q
( L ) φ )
. An interval pair
i ,
p i )
is output
only if p i τ
.
Given an integer k
>
0, the equi-depth histogram answer to Q is a set of interval
tuples
i ,
p i )(
1
i
k
)
where
φ j =[
v j 1 ,
v j )(
1
j
<
k , v 0 =
v min , and v j =
j
k } )
arg min x {
x
|
Pr
(
Q
( L )
x
)
and
φ k =[
v k 1 ,
v max ]
.
7.6.2 Count, Sum and Average Queries
To answer a count query on a set of linkages, we can apply the same technique
discussed in Sections 7.3 and 7.4. Once the count probability distribution is cal-
culated, we can compute the answer to the aggregate histogram query by partition
the count values into
equi-width intervals or k equi-depth intervals.
The bottleneck of answering count queries are integrating multiple chains in a
connected component and integration multiple components. We introduce two ap-
proximation techniques that accelerate the computation for equi-width histogram
answer and equi-depth histogram answer, respectively.
η
7.6.2.1 Equi-width Histogram Answer Approximation
When computing the count probability for G using Theorem 7.5, intuitively, we
can ignore the values whose probabilities are very small.
Let x 1 ,···,
x m be the list of values in f i 1 (
x
)
in the probability ascending order.
Let x μ =
max 1 j m {
x j | 1 h j Pr
(
x h ) < ε }
, then we adjust the probabilities as
 
Search WWH ::




Custom Search