Database Reference
In-Depth Information
tions of the company can be supported by evaluating aggregate queries on its balance
sheets. For instance, in our example, it may be useful to know:
q 1 : the maximum value of cash sales over years 2008 and 2009;
q 2 : the minimum value of cash sales over years 2008 and 2009;
q 3 : the sum of cash sales for both years 2008 and 2009.
Clearly, since the available data are inconsistent, the mere evaluation of these queries
on them may yield a wrong picture of the real world. However, the range-consistent
answers of these queries can still support several analysis tasks. For instance, know-
ing that, in every “reasonable” repair of the data, the maximum and the minimum of
cash sales are in the intervals
[
,
]
[
,
]
1110
1150
and
900
1110
, respectively, and that the
[
,
]
sum of cash sales for the considered years is in
2010
2260
, can give a sufficiently
accurate picture of the trend of cash sales.
In this context, after characterizing the complexity of the CQA problem for ag-
gregate queries (such as queries q 1 , q 2 , q 3 of Example 5.1) in the presence of ag-
gregate constraints, we present a technique for computing range-CQAs of aggre-
gate queries on data which are not consistent w.r.t. steady aggregate constraints.
Specifically, as our technique works for card -minimal repairs, range-CQAs are the
narrowest intervals containing all the answers of the aggregate queries that can be
obtained from every possible card -minimal repair. We show that range-CQAs of
aggregate queries can be evaluated without computing every possible card -minimal
repair, but only solving three Mixed Integer Linear Programming (MILP) problems.
Thus, again, our approach enables the computation of range-CQAs by means of
well-known techniques for solving MILP.
Throughout this chapter, we assume that the absolute values of measure at-
tributes are bounded by a constant M. This assumption is acceptable from a practical
point of view, as it is often possible to pre-determine a specific range for numeri-
cal attributes. For instance, in our running example, it can be reasonably assumed
that the items in balance sheets are bounded by $10 9 . A discussion on possible
extensions beyond this limitation (which could be interesting from a theoretical per-
spective) is provided in Chapter 6.
5.2 Range-Consistent Answers
We consider aggregate queries involving scalar functions
,
and
return-
MIN
MAX
SUM
ing a single value for each relation.
Definition 5.1 (Aggregate Query). An aggregate query q on a database scheme
D
is an expression of the form
f
R
WHERE α
, where:
SELECT
FROM
i) R is a relation scheme in
D
;
ii) f is one of
MIN (
A
)
,
MAX (
A
)
or
SUM (
A
)
, where A in an attribute of R ; and
iii)
α
is boolean combination of atomic comparisons of the form X
Y , where X and
Y are constants or non-measure attributes of R , and
∈{ = , = ,≤,≥,<,>}
.
Search WWH ::




Custom Search