use of the set-minimal semantics, for different classes of integrity constraints. However, increasing interest has been shown in the card-minimal semantics in recent works: repairs with the minimum number of performed updates were first used in [28] (where a strategy for fixing categorical data was introduced), then discussed in [7] (in the context of relational data violating specific forms of universal constraints), and finally studied in more detail in [43], in the presence of denial constraints.
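The difference between the two semantics can be illustrated on a small hypothetical instance (not taken from the cited works): suppose tuple 0 violates a denial constraint together with each of tuples 1, 2 and 3, and repairs consist of tuple deletions. The following sketch enumerates all deletion-based repairs and filters them under the two minimality criteria:

```python
from itertools import combinations

# Hypothetical conflict structure: tuple 0 violates a denial
# constraint together with each of tuples 1, 2 and 3.
tuples = {0, 1, 2, 3}
conflicts = [(0, 1), (0, 2), (0, 3)]

def consistent(remaining):
    # The instance is consistent if no conflicting pair survives.
    return all(not (a in remaining and b in remaining) for a, b in conflicts)

# Candidate repairs: sets of deleted tuples that restore consistency.
candidates = [set(c) for r in range(len(tuples) + 1)
              for c in combinations(tuples, r)
              if consistent(tuples - set(c))]

# Set-minimal: no proper subset of the deletion set is itself a repair.
set_minimal = [c for c in candidates
               if not any(d < c for d in candidates)]

# Card-minimal: deletion sets of minimum cardinality.
k = min(len(c) for c in candidates)
card_minimal = [c for c in candidates if len(c) == k]

print(sorted(set_minimal, key=len))  # [{0}, {1, 2, 3}]
print(card_minimal)                  # [{0}]
```

Here both {0} and {1, 2, 3} are set-minimal repairs (neither can be shrunk), but only {0} is card-minimal, showing that the card-minimal repairs are in general a proper subset of the set-minimal ones.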
The interest in both the set-minimal and card-minimal semantics, in the presence of different forms of integrity constraints, is due to the fact that, depending on the particular scenario addressed, each of them can be more suitable than the other. In the presence of aggregate constraints (like those of Example 1.3) as well, these semantics suit different application contexts. For instance, in the scenario of Example 1.3, where inconsistency is due to acquisition errors, repairing the data by means of sets of updates of minimum cardinality seems more reasonable, since the most probable event is that the acquiring system made the minimum number of symbol-recognition errors. The same reasoning applies to other scenarios dealing with automatically acquired numerical data, such as sensor networks: here, inconsistency is often due to faults occurring at sensors while generating readings, so repairing the data by modifying the minimum number of readings is justified. On the other hand, the set-minimal semantics appears more suitable in the data integration context, where assembling data from different (even individually consistent) databases can result in an inconsistent database (see [40] for a survey on inconsistency in the context of data integration).
Besides the minimality semantics adopted, the repairing strategies proposed for traditional forms of constraints differ in the update operations allowed for fixing the inconsistent data. Most of the work in the literature considers repairs consisting of tuple insertion/deletion operations on the inconsistent database. However, this repairing strategy is not suitable for contexts analogous to that of Example 1.3, that is, data acquired by OCR tools from paper documents. In fact, using tuple insertions/deletions as basic primitives amounts to hypothesizing that the OCR tool skipped or "invented" a whole row when acquiring the source paper document, which is rather unrealistic. In this scenario, a repairing strategy based on attribute-update operations only seems more reasonable, as updating single attribute values is the most natural way to fix inconsistencies resulting from symbol-recognition errors. The same holds in other scenarios dealing with numerical data acquired automatically, such as sensor networks. In a sensor network with error-free communication channels, no reading generated by a sensor can be lost, so repairing the data by adding new readings (or removing collected ones) makes no sense. Moreover, even in the general case, as observed in [48], a repairing strategy based on value updates can be more reasonable than strategies performing insertions and/or deletions of tuples. In fact, aggregate constraints are defined on measure attributes only, i.e., numerical attributes which are often a small subset of the whole set of attributes. Hence, deleting tuples to make the data consistent has the side effect of removing the (possibly consistent) information encoded in the other attributes of the deleted tuples, thus resulting in a loss of information.
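To make the attribute-update strategy concrete, consider a hypothetical instance in the spirit of Example 1.3 (whose exact data is not reproduced here): an aggregate constraint requires the acquired detail values to sum to the stated total. The sketch below, under these assumed names and values, enumerates the repairs that change exactly one numeric value; each of them is card-minimal, whereas a deletion-based repair would discard entire rows together with their consistent attributes:

```python
# Values as acquired (e.g., by an OCR tool); the aggregate
# constraint requires sum(details) == total. Names and numbers
# are illustrative, not taken from Example 1.3 itself.
details = [100, 120, 50]
total = 300

def repairs_by_single_update(details, total):
    """Enumerate repairs that update exactly one attribute value."""
    fixes = []
    s = sum(details)
    if s != total:
        # Option 1: update the stated total to the actual sum.
        fixes.append(("total", total, s))
        # Option 2: shift any single detail value by the discrepancy.
        for i, v in enumerate(details):
            fixes.append((f"detail[{i}]", v, v + (total - s)))
    return fixes

for attr, old, new in repairs_by_single_update(details, total):
    print(f"update {attr}: {old} -> {new}")
```

Each printed repair touches one measure attribute only, leaving all other (possibly consistent) attribute values of the involved tuples untouched.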