Database Reference
In-Depth Information
data reasonably, in order to make it consistent by means of minimal changes. In-
deed, these kinds of constraint often do not suffice to manage data consistency, as
they cannot be used to define algebraic relations between stored values. In fact, this
issue frequently occurs in several scenarios, such as scientific databases, statistical
and biological databases, and data warehouses, where numerical values of tuples are
derivable by aggregating values stored in other tuples.
In this topic we focus our attention on databases where stored data violate a set of
aggregate constraints
, i.e. integrity constraints defined on aggregate values extracted
from the database. These constraints are defined on numerical attributes (such as sale
prices, costs, etc.) which represent measure values and are not intrinsically involved
in other forms of constraints. The following example describes a real-life scenario
where inconsistency of numerical attributes can arise, and aggregate constraints can
be exploited to manage the integrity of data.
Example 1.3.
The balance sheet of a company is a financial statement providing in-
formation on what the company owns (its assets), what it owes (its liabilities), and
the value of the business to its stockholders. A thorough analysis of balance sheets
is extremely important for both stock and bond investors, since it allows potential
liquidity problems of a company to be detected, thus determining the company fi-
nancial reliability as well as its ability to satisfy financial obligations.
Generally balance sheets are available as paper documents, thus they cannot be
automatically processed by balance analysis tools, since these work on electronic
data only. Hence, the automatic acquisition of balance-sheet data from paper docu-
ments is often performed as the preliminary phase of the decision making process,
as it yields data prone to be analyzed by suitable tools for discovering information
of interest.
Table 1.1
represents a relation
BalanceSheets
obtained from the balance sheets
of two consecutive years of a company. These data were acquired by means of an
OCR (
Optical Character Recognition
) tool from paper documents. Values '
det
',
'
aggr
' and '
drv
' in column
Type
stand for
detail
,
aggregate
and
derived
, respec-
tively. Specifically, an item is
aggregate
if it is obtained by aggregating items of
type
detail
of the same section, whereas a
derived
item is an item whose value can
be computed using the values of other items of any type and belonging to any sec-
tion. This example will be referred to as “Balance Sheets” and used as a running
example. We will denote the
i
-th tuple in
Table 1.1
as
t
i
.
Relation
BalanceSheets
must satisfy the following integrity constraints:
κ
1
:for each section and year, the sum of the values of all
detail
items must be equal
to the value of the
aggregate
item of the same section and year;
κ
2
:for each year, the
net cash inflow
must be equal to the difference between
total
cash receipts
and
total disbursements
;
κ
3
:for each year, the
ending cash balance
must be equal to the sum of the
beginning
cash
and the
net cash inflow
.
Although the original balance sheet (in paper format) was consistent, its digital
version is not, as some symbol recognition errors occurred during the digitizing
Search WWH ::
Custom Search