The quality of the target instance is also an important factor in the case of ETL
systems. For these systems, the quality is typically determined by the data
freshness, the resiliency to occasional failures, and the ease of maintenance
[Simitsis et al. 2009]. Data freshness means that the effect of any modification
in the source instance is also reflected in the target. Resiliency to failures
measures whether alternative transformation routes or recovery procedures can
guarantee that, if part of the transformation fails, the data that was to be
generated can still be produced, either through a different route or by
repeating the failed procedure.
Finally, maintainability is affected, among other things, by the simplicity of
the transformation. A simple ETL transformation is more maintainable, whereas in a
complex transformation it is more difficult to keep track of the primitive
transformations that take place. Occasionally, compliance with business rules is
also one of the factors considered when measuring the quality of an ETL system.
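The freshness and resiliency criteria above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the names `run_resilient` and `is_fresh`, the representation of rows as dictionaries, and the idea of modelling each transformation route as a plain function are all hypothetical and do not come from Simitsis et al.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Route = Callable[[List[Row]], List[Row]]  # hypothetical: a route is a function


def run_resilient(source_rows: List[Row], routes: List[Route],
                  attempts_per_route: int = 2) -> List[Row]:
    """Produce the target data by repeating a failed route, then by
    falling back to the next alternative route."""
    last_error = None
    for route in routes:
        for _ in range(attempts_per_route):
            try:
                return route(source_rows)
            except Exception as exc:  # a failed part of the transformation
                last_error = exc
    raise RuntimeError("all transformation routes failed") from last_error


def is_fresh(source_rows: List[Row], target_rows: List[Row],
             transform: Route) -> bool:
    """Freshness check (assumed definition): the target matches what the
    transformation would produce from the current source instance."""
    expected = transform(source_rows)
    return sorted(expected, key=repr) == sorted(target_rows, key=repr)
```

In this sketch a source modification that has not been propagated makes `is_fresh` return `False`, and a route that raises an exception is retried before the next route is tried.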
7.4 Data Examples
Generating the expected target instance for evaluating a mapping system may not
always be the most desirable method. The target instance may be prohibitively
large, and its generation at mapping design time may not be feasible. Even
if its generation is possible, due to its size, even an expert mapping designer may
find it hard to understand the full semantics of the generated transformation, since
it is practically impossible to always obtain a full view of the target data. The
generated mappings between a source and the target schema may also be numerous,
ambiguous, and complicated to such a degree that the designer is not able to
understand which target data was created from which source data, and how. To cope
with these issues and help the designer quickly and fully understand and validate
the semantics of the transformations generated by the mapping system, carefully
selected representative samples of the target instance can be used. Samples of the
expected target instance can be used to drive the mapping process, while samples
of the generated target instance can be used to communicate to the designer the
semantics of the mappings the system has generated.
The importance of data examples in mapping generation has long been
recognized [Yan et al. 2001]. In that work, each mapping is considered
a transformation query and is interpreted as an undirected, connected graph
G = (N, E), where the set of nodes N is a subset of the relations of the source schema
and the set of edges E represents conjunctions of join predicates on attributes of the
source relations. Typically, joins are inner joins, but they can also be considered as
outer joins or combinations of inner and outer joins. Given a query graph G, the full
and the possible data associations can be computed. A data association is a relation
that contains the maximum number of attributes whose data are semantically related
through structural constructs or constraints, e.g., foreign keys. A full data association
of G is computed by an inner join query over G, and it involves all nodes in G. Given
an induced, connected subgraph of G, a data association can be constructed in the
same way, but since it is based on a subgraph of G, the data association is referred
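
The construction of data associations from a query graph can be sketched as follows. This is a minimal illustration, not the algorithm of Yan et al.: the relations `Dept`, `Emp`, and `Proj`, their attributes, and the helper names are hypothetical, and every edge is assumed to carry a single equality join predicate evaluated as an inner join.

```python
from itertools import combinations

# Hypothetical source relations (the nodes N); attribute names are prefixed
# so that they stay unique after joining.
relations = {
    "Dept": [{"d.id": 1, "d.name": "R&D"}, {"d.id": 2, "d.name": "Sales"}],
    "Emp": [{"e.id": 10, "e.dept": 1}, {"e.id": 11, "e.dept": 3}],
    "Proj": [{"p.id": 100, "p.lead": 10}],
}
# Edges E: a pair of relations mapped to an equality join predicate.
edges = {
    frozenset({"Dept", "Emp"}): ("d.id", "e.dept"),
    frozenset({"Emp", "Proj"}): ("e.id", "p.lead"),
}


def is_connected(nodes):
    """True if the subgraph of G induced by `nodes` is connected."""
    nodes = set(nodes)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(m for m in nodes - seen if frozenset({n, m}) in edges)
    return seen == nodes


def connected_subgraphs():
    """All node sets that induce a connected subgraph of G."""
    return [c for k in range(1, len(relations) + 1)
            for c in combinations(relations, k) if is_connected(c)]


def data_association(nodes):
    """Inner join over the subgraph induced by `nodes` (assumed connected)."""
    remaining = list(nodes)
    first = remaining.pop(0)
    joined, rows = {first}, [dict(r) for r in relations[first]]
    while remaining:
        # Pick a relation that shares an edge with those already joined.
        nxt = next(n for n in remaining
                   if any(frozenset({n, j}) in edges for j in joined))
        remaining.remove(nxt)
        preds = [edges[frozenset({nxt, j})] for j in joined
                 if frozenset({nxt, j}) in edges]
        merged = []
        for left in rows:
            for right in relations[nxt]:
                row = {**left, **right}
                if all(row[a] == row[b] for a, b in preds):
                    merged.append(row)
        rows, joined = merged, joined | {nxt}
    return rows
```

Calling `data_association(["Dept", "Emp", "Proj"])` joins all nodes of G and so corresponds to the full data association, while applying it to each entry of `connected_subgraphs()` yields the associations built on induced, connected subgraphs.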