Database Reference
In-Depth Information
data reasonably, in order to make it consistent by means of minimal changes. In-
deed, these kinds of constraint often do not suffice to manage data consistency, as
they cannot be used to define algebraic relations between stored values. In fact, this
issue frequently occurs in several scenarios, such as scientific databases, statistical
and biological databases, and data warehouses, where numerical values of tuples are
derivable by aggregating values stored in other tuples.
In this topic we focus our attention on databases where stored data violate a set of
aggregate constraints , i.e. integrity constraints defined on aggregate values extracted
from the database. These constraints are defined on numerical attributes (such as sale
prices, costs, etc.) which represent measure values and are not intrinsically involved
in other forms of constraints. The following example describes a real-life scenario
where inconsistency of numerical attributes can arise, and aggregate constraints can
be exploited to manage the integrity of data.
Example 1.3. The balance sheet of a company is a financial statement providing in-
formation on what the company owns (its assets), what it owes (its liabilities), and
the value of the business to its stockholders. A thorough analysis of balance sheets
is extremely important for both stock and bond investors, since it allows potential
liquidity problems of a company to be detected, thus determining the company fi-
nancial reliability as well as its ability to satisfy financial obligations.
Generally balance sheets are available as paper documents, thus they cannot be
automatically processed by balance analysis tools, since these work on electronic
data only. Hence, the automatic acquisition of balance-sheet data from paper docu-
ments is often performed as the preliminary phase of the decision making process,
as it yields data prone to be analyzed by suitable tools for discovering information
of interest.
Table 1.1 represents a relation BalanceSheets obtained from the balance sheets
of two consecutive years of a company. These data were acquired by means of an
OCR ( Optical Character Recognition ) tool from paper documents. Values ' det ',
' aggr ' and ' drv ' in column Type stand for detail , aggregate and derived , respec-
tively. Specifically, an item is aggregate if it is obtained by aggregating items of
type detail of the same section, whereas a derived item is an item whose value can
be computed using the values of other items of any type and belonging to any sec-
tion. This example will be referred to as “Balance Sheets” and used as a running
example. We will denote the i -th tuple in Table 1.1 as t i .
Relation BalanceSheets must satisfy the following integrity constraints:
κ 1 :for each section and year, the sum of the values of all detail items must be equal
to the value of the aggregate item of the same section and year;
κ 2 :for each year, the net cash inflow must be equal to the difference between total
cash receipts and total disbursements ;
κ 3 :for each year, the ending cash balance must be equal to the sum of the beginning
cash and the net cash inflow .
Although the original balance sheet (in paper format) was consistent, its digital
version is not, as some symbol recognition errors occurred during the digitizing
Search WWH ::




Custom Search