Databases Reference
In-Depth Information
3.3.4
Answering Aggregate Queries
Finally, we discuss queries with aggregate operators: COUNT, SUM, AVG,
MAX, and MIN basedonresultsfrom Galetal. ( 2009 ). We consider three common
extensions to semantics with aggregates and probabilistic information: the range
semantics returns the range of the aggregate ( i.e. , the minimum and the maximum
value); the expected-value semantics returns the expected value of the aggregate,
and the distribution semantics returns all possible values with their probabilities.
Note that the answer under the former two semantics can be derived from that under
the last semantics; in other words, the distribution semantics is the richest one. We
next formally define the three semantics.
Definition 9 (Semantics of Aggregate Query). Let pM D .S;T; m / be a p-
mapping, Q be an aggregate query over T ,andD S be an instance of S.Let V
be the set of result values of evaluating Q on D S w.r.t. pM under by-table (resp.
by-tuple) semantics and Pr.v/be the probability of value v 2 V .
1. Range semantics : The result is the interval Œmin. V/;max. V/.
2. Expected-value semantics : The result is P v 2 V Pr.v/ v.
3. Distribution semantics : The result is a random variable X, s.t. for each distinct
value v 2
V , Pr.X D v/ D Pr.v/.
t
According to the definition, there are six combinations of semantics for aggre-
gate queries w.r.t. a p-mapping. Since results under the range or expected-value
semantics can be derived from the results under the distribution semantics in poly-
nomial time, the complexity of query answering w.r.t. to the former two semantics is
no higher than that w.r.t. to the distribution semantics; in fact, in some cases, query
answering w.r.t. to the former two semantics can be computed more efficiently with-
out obtaining the distribution. Table 4.1 summarizes the complexity results for each
aggregate operator, and we now explain them briefly.
In by-table query answering, we can enumerate all answers and compute their
probabilities in polynomial time; thus, query answering is in PTIME for all
semantics.
In by-tuple query answering, we can enumerate all answers (without computing
their probabilities) in polynomial time; thus, query answering under the range
semantics (where we do not need to know the probabilities of each answer) is in
PTIME.
Tabl e 4. 1
Complexity of answering aggregate queries under different semantics
Semantics
Operator
Range
Expected-value
Distribution
By-table
COUNT, SUM, AVG, MIN, MAX
PTIME
PTIME
PTIME
By-tuple
COUNT
PTIME
PTIME
PTIME
SUM
PTIME
PTIME
?
AVG, MIN, MAX
PTIME
?
?
 
Search WWH ::




Custom Search