Another step toward the design of meaningful and high-quality schema mappings
has been taken recently [Alexe et al. 2010a] by using a MapMerge operator to
merge multiple small mappings into larger ones. The evaluation of such an operator is
done by using a novel similarity metric that is able to capture the extent to which data
associations are preserved by the transformation from a source to a target instance.
The metric depends on the natural associations that exist among data values in the
source instance, discovered by looking at the schema structures and by following
the schema referential integrity constraints. The idea behind the metric is that these
associations must be preserved by the transformation that the mapping describes.
7.3 Quality of the Generated Target Instance
In systems that do not differentiate between the matching and the mapping task,
an alternative to measuring precision, recall, or f-measure would be preferable.
One such approach is to use the final expected result of the mapping process,
which is the actual target instance generated by the transformation described by
the mappings. This kind of evaluation is also useful in cases where one needs to
avoid comparisons between mappings for reasons like those provided earlier. The
expected target instance is typically provided by an expert user. Once the expected
target instance is available, the success of a mapping task can be measured by
comparing it to the actual target instance produced by the generated mappings. The
approach constitutes an appealing verification and validation method, mainly due to
its simplicity.
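As a minimal sketch of this kind of evaluation, the snippet below compares an expected and an actual target instance with a simple set-overlap (Jaccard) score over tuples. The measure, the flat list-of-dicts representation of an instance, and the names `as_row_set` and `instance_similarity` are assumptions made for illustration; they are not taken from any of the systems discussed here.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

Row = Tuple[Tuple[str, Hashable], ...]


def as_row_set(rows: Iterable[Dict[str, Hashable]]) -> FrozenSet[Row]:
    """Turn an instance (a list of attribute->value dicts) into a set of tuples.

    Attributes are sorted by name so that attribute order does not matter.
    """
    return frozenset(tuple(sorted(row.items())) for row in rows)


def instance_similarity(expected, actual) -> float:
    """Jaccard overlap between expected and actual tuples (illustrative only)."""
    exp, act = as_row_set(expected), as_row_set(actual)
    if not exp and not act:
        return 1.0  # two empty instances are treated as identical
    return len(exp & act) / len(exp | act)


# Hypothetical data: the expert's expected instance vs. the generated one.
expected = [
    {"name": "Ann", "dept": "R&D"},
    {"name": "Bob", "dept": "Sales"},
]
actual = [
    {"name": "Ann", "dept": "R&D"},
    {"name": "Bob", "dept": None},  # the mapping failed to fill in dept
]

score = instance_similarity(expected, actual)
```

Such an exact-match score is deliberately crude: it ignores nesting and gives no credit for partially correct tuples, which is one reason more refined comparison techniques are of interest.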
The comparison between the actual and the expected target instance can be
done by considering an ad hoc similarity function, such as tree edit distance, or by
employing a general-purpose comparison technique [Bonifati et al. 2008a]. Defining
such a customized comparison technique is a promising direction for future
developments in this area. The Spicy system offers a comparison method based on
circuit theory [Bonifati et al. 2008a], called structural analysis. Figure 9.9 shows an
example of a circuit generated by the tree representation of a schema, as shown on
the left-hand side. The circuit is based on building blocks corresponding to atomic
attributes. More specifically, for each intermediate node $n$ in the schema tree, a
resistance value $r(n)$ is defined. Such a value cannot be based on instances, since
intermediate nodes of the tree represent higher structures; rather, it is based on
the topology of the tree. In particular, $r(n) = k \cdot \mathit{level}(n)$, where $k$ is a constant
multiplicative factor and $\mathit{level}(n)$ is the level of $n$ in the tree, defined as follows: (1)
leaves have level 0; (2) an intermediate node with children $n_0, n_1, \ldots, n_k$
has level $\max(\mathit{level}(n_0), \mathit{level}(n_1), \ldots, \mathit{level}(n_k)) + 1$.
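The level and resistance definitions can be sketched in a few lines of code. This is only an illustration, assuming the resistance of an intermediate node is the constant factor times its level, r(n) = k * level(n). The `Node` class, the default value of `k`, and the example schema are hypothetical and do not come from the Spicy system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a schema tree; a node without children is a leaf (atomic attribute)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def level(n: Node) -> int:
    """Leaves have level 0; an intermediate node is one above its deepest child."""
    if not n.children:
        return 0
    return max(level(c) for c in n.children) + 1


def resistance(n: Node, k: float = 1.0) -> float:
    """Resistance of an intermediate node, assuming r(n) = k * level(n).

    Leaves are rejected here because the text defines r(n) only for
    intermediate nodes; leaf circuits are built from sampled attributes.
    """
    if not n.children:
        raise ValueError("r(n) is defined only for intermediate nodes")
    return k * level(n)


# Hypothetical schema: person(name, address(street, city))
address = Node("address", [Node("street"), Node("city")])
schema = Node("person", [Node("name"), address])

address_level = level(address)            # one above its leaf children
schema_level = level(schema)              # one above 'address'
schema_resistance = resistance(schema, k=10.0)
```

Because the value depends only on the shape of the tree, two schemas with the same nesting structure yield the same resistances for their intermediate nodes, whatever data they hold.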
The complete circuit is defined by means of a circuit mapping function, $\mathit{circ}(t)$,
over a tree $t$. For a leaf node $A$, $\mathit{circ}(A)$ is defined by mapping a sampled attribute
to a circuit. Intuitively, $\mathit{circ}(A)$ is assembled by assigning a set of features to a
number of resistors and voltage generators. For a tree $t$ rooted at node $n$ with
children $n_0, n_1, \ldots, n_k$, $\mathit{circ}(t)$ is the circuit obtained by connecting in parallel