Database Reference
In-Depth Information
tions of the company can be supported by evaluating aggregate queries on its balance
sheets. For instance, in our example, it may be useful to know:
q
1
: the maximum value of
cash sales
over years 2008 and 2009;
q
2
: the minimum value of
cash sales
over years 2008 and 2009;
q
3
: the sum of
cash sales
for both years 2008 and 2009.
Clearly, since the available data are inconsistent, the mere evaluation of these queries
on them may yield a wrong picture of the real world. However, the range-consistent
answers of these queries can still support several analysis tasks. For instance, know-
ing that, in every “reasonable” repair of the data, the maximum and the minimum of
cash sales
are in the intervals
[
,
]
[
,
]
1110
1150
and
900
1110
, respectively, and that the
[
,
]
sum of
cash sales
for the considered years is in
2010
2260
, can give a sufficiently
accurate picture of the trend of cash sales.
In this context, after characterizing the complexity of the CQA problem for
ag-
gregate queries
(such as queries
q
1
,
q
2
,
q
3
of Example 5.1) in the presence of ag-
gregate constraints, we present a technique for computing range-CQAs of aggre-
gate queries on data which are not consistent w.r.t. steady aggregate constraints.
Specifically, as our technique works for
card
-minimal repairs, range-CQAs are the
narrowest intervals containing all the answers of the aggregate queries that can be
obtained from every possible
card
-minimal repair. We show that range-CQAs of
aggregate queries can be evaluated without computing every possible
card
-minimal
repair, but only solving three
Mixed Integer Linear Programming
(MILP) problems.
Thus, again, our approach enables the computation of range-CQAs by means of
well-known techniques for solving MILP.
Throughout this chapter, we assume that the absolute values of measure at-
tributes are bounded by a constant M.
This assumption is acceptable from a practical
point of view, as it is often possible to pre-determine a specific range for numeri-
cal attributes. For instance, in our running example, it can be reasonably assumed
that the items in balance sheets are bounded by $10
9
. A discussion on possible
extensions beyond this limitation (which could be interesting from a theoretical per-
spective) is provided in Chapter 6.
5.2 Range-Consistent Answers
We consider aggregate queries involving scalar functions
,
and
return-
MIN
MAX
SUM
ing a single value for each relation.
Definition 5.1 (Aggregate Query).
An aggregate query
q
on a database scheme
D
is an expression of the form
f
R
WHERE
α
, where:
SELECT
FROM
i)
R
is a relation scheme in
D
;
ii)
f
is one of
MIN
(
A
)
,
MAX
(
A
)
or
SUM
(
A
)
, where
A
in an attribute of
R
; and
iii)
α
is boolean combination of atomic comparisons of the form
X
Y
, where
X
and
Y
are constants or non-measure attributes of
R
, and
∈{
=
,
=
,≤,≥,<,>}
.
Search WWH ::
Custom Search