use of the set-minimal semantics, for different classes of integrity constraints. However, increasing interest has been shown in the card-minimal semantics in recent works: repairs with the minimum number of performed updates were first used in [28] (where a strategy for fixing categorical data was introduced), then discussed in [7] (in the context of relational data violating specific forms of universal constraints), and finally studied in more detail in [43], in the presence of denial constraints.
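The difference between the two semantics can be illustrated on a small hypothetical instance (not taken from the cited works): suppose tuple 0 violates a denial constraint together with each of tuples 1, 2 and 3, and repairs consist of tuple deletions. The following sketch enumerates all deletion-based repairs and filters them under the two minimality criteria:

```python
from itertools import combinations

# Hypothetical conflict structure: tuple 0 violates a denial
# constraint together with each of tuples 1, 2 and 3.
tuples = {0, 1, 2, 3}
conflicts = [(0, 1), (0, 2), (0, 3)]

def consistent(remaining):
    # The instance is consistent if no conflicting pair survives.
    return all(not (a in remaining and b in remaining) for a, b in conflicts)

# Candidate repairs: sets of deleted tuples that restore consistency.
candidates = [set(c) for r in range(len(tuples) + 1)
              for c in combinations(tuples, r)
              if consistent(tuples - set(c))]

# Set-minimal: no proper subset of the deletion set is itself a repair.
set_minimal = [c for c in candidates
               if not any(d < c for d in candidates)]

# Card-minimal: deletion sets of minimum cardinality.
k = min(len(c) for c in candidates)
card_minimal = [c for c in candidates if len(c) == k]

print(sorted(set_minimal, key=len))  # [{0}, {1, 2, 3}]
print(card_minimal)                  # [{0}]
```

Here both {0} and {1, 2, 3} are set-minimal repairs (neither can be shrunk), but only {0} is card-minimal, showing that the card-minimal repairs are in general a proper subset of the set-minimal ones.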
The interest in both the set-minimal and card-minimal semantics, in the presence of different forms of integrity constraints, is due to the fact that, depending on the particular scenario addressed, each of them can be more suitable than the other. In the presence of aggregate constraints (like those of Example 1.3) as well, these semantics suit different application contexts. For instance, in the scenario of Example 1.3, where inconsistency is due to acquisition errors, repairing the data by means of sets of updates of minimum cardinality seems more reasonable, since the most probable event is that the acquiring system made the minimum number of symbol-recognition errors. The same reasoning applies to other scenarios dealing with automatically acquired numerical data, such as sensor networks: here, inconsistency is often due to faults occurring at sensors while generating readings, so repairing the data by modifying the minimum number of readings is justified. On the other hand, the set-minimal semantics appears more suitable in the data integration context, where assembling data from different (even individually consistent) databases can result in an inconsistent database (see [40] for a survey on inconsistency in the context of data integration).
Besides the minimality semantics adopted, the repairing strategies proposed for traditional forms of constraints differ in the update operations allowed for fixing the inconsistent data. Most of the work in the literature considers repairs consisting of tuple insertion/deletion operations on the inconsistent database. However, this repairing strategy is not suitable for contexts analogous to that of Example 1.3, that is, data acquired by OCR tools from paper documents. In fact, using tuple insertions/deletions as basic primitives amounts to hypothesizing that the OCR tool skipped or "invented" a whole row when acquiring the source paper document, which is rather unrealistic. In this scenario, a repairing strategy based on attribute-update operations only seems more reasonable, as updating single attribute values is the most natural way to fix inconsistencies resulting from symbol-recognition errors. The same holds in other scenarios dealing with numerical data acquired automatically, such as sensor networks. In a sensor network with error-free communication channels, no reading generated by a sensor can be lost, so repairing the data by adding new readings (or removing collected ones) makes no sense. Moreover, even in the general case, as observed in [48], a repairing strategy based on value updates can be more reasonable than strategies performing insertions and/or deletions of tuples. In fact, aggregate constraints are defined on measure attributes only, i.e., numerical attributes which are often a small subset of the whole set of attributes. Hence, deleting tuples to make the data consistent has the side effect of removing the (possibly consistent) information encoded in the other attributes of the deleted tuples, thus resulting in a loss of information.
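To make the attribute-update strategy concrete, consider a hypothetical instance in the spirit of Example 1.3 (whose exact data is not reproduced here): an aggregate constraint requires the acquired detail values to sum to the stated total. The sketch below, under these assumed names and values, enumerates the repairs that change exactly one numeric value; each of them is card-minimal, whereas a deletion-based repair would discard entire rows together with their consistent attributes:

```python
# Values as acquired (e.g., by an OCR tool); the aggregate
# constraint requires sum(details) == total. Names and numbers
# are illustrative, not taken from Example 1.3 itself.
details = [100, 120, 50]
total = 300

def repairs_by_single_update(details, total):
    """Enumerate repairs that update exactly one attribute value."""
    fixes = []
    s = sum(details)
    if s != total:
        # Option 1: update the stated total to the actual sum.
        fixes.append(("total", total, s))
        # Option 2: shift any single detail value by the discrepancy.
        for i, v in enumerate(details):
            fixes.append((f"detail[{i}]", v, v + (total - s)))
    return fixes

for attr, old, new in repairs_by_single_update(details, total):
    print(f"update {attr}: {old} -> {new}")
```

Each printed repair touches one measure attribute only, leaving all other (possibly consistent) attribute values of the involved tuples untouched.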