to mapping generation time, and evaluation techniques proposed by Spicy [ Bonifati
et al. 2008a ] or STBenchmark [ Alexe et al. 2008c ] do not elaborate extensively on
the issue. This is not an omission on their behalf. It reflects the fact that it is hard to
measure time when human participation, in our specific case for the verification and
guidance of the mapping tool, is part of the process. The time required by humans
to understand the mappings generated by the tool and provide feedback is orders of
magnitude higher than the one the tool requires for computing the mappings.
The situation is slightly different in matching tools where there is limited human
intervention. Although computation time is still a central factor, it is not as important
as the quality of the generated matches. A recent evaluation on a number of match-
ing tools [ Yatskevich 2003 ] has extended previous evaluations [ Do et al. 2003 ] by
adding time measures for matching tasks on real-world matching scenarios. Unfortunately,
these metrics have yet to be materialized in a benchmark. In a more
recent comparison [ Köpcke and Rahm 2010 ] of state-of-the-art matching tools,
generation time has been one of the main comparison criteria and is also one of
the metrics used by matching evaluation tools like XBenchMatch [ Duchateau et al.
2007 ] and the ISLab Instance Matching Benchmark [ Ferrara et al. 2008 ].
6.2 Data Translation Performance
It has already been mentioned that one of the popular uses of mappings is to translate
data from one source to another, i.e., the data exchange task. This translation is
done by materializing the target or integrated instance from the data of one or more
source instances according to the mappings. Data sources typically contain a large
number of records. This means that if the mappings are numerous and describe
complex transformations, then the time required to materialize the target instance
may be significant. Based on this observation, it is clear that one factor
characterizing the quality of a mapping tool is the execution performance of
the transformations described by the generated mappings. Metrics that can be
used to measure such performance are the overall execution time and the degree of
parallelization.
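As a minimal sketch of the first metric, the following hypothetical Python snippet times the materialization of a target instance from a source instance by applying a simple field-level transformation to every record. The record layout and the mapping function are illustrative inventions, not taken from any particular mapping tool:

```python
import time

# Illustrative source instance: a list of records (dictionaries).
source = [{"first": f"n{i}", "last": f"s{i}", "dept": i % 3}
          for i in range(10_000)]

def apply_mapping(rec):
    # A toy transformation, standing in for a mapping-generated script:
    # concatenate two source fields and rename a third.
    return {"name": rec["first"] + " " + rec["last"], "group": rec["dept"]}

# Overall execution time: wall-clock duration of the full materialization.
start = time.perf_counter()
target = [apply_mapping(r) for r in source]
elapsed = time.perf_counter() - start

print(f"materialized {len(target)} target records in {elapsed:.4f}s")
```

Since each record is transformed independently here, the work is trivially partitionable, which is what the degree-of-parallelization metric would capture for more complex mappings.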
[Time] The most general-purpose metric is the time required to perform the
overall transformation. Although this parameter is not explicitly stated in any
matching or mapping evaluation effort, certain extensive experiments found in the
literature [ Alexe et al. 2008c ] illustrate its importance. The generation of good trans-
formation scripts is actually a way to characterize good mapping tools. Note that to
avoid falling into the trap of evaluating the query execution engine instead of the
mapping tool, when measuring the performance of the generated transformation
scripts, all the comparison and evaluation experiments should be performed on the
same transformation engine.
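The point about holding the engine fixed can be sketched as follows: two candidate transformation scripts are timed on the same engine (here SQLite via Python's sqlite3 module), so any difference reflects the scripts rather than the executor. The table layout and the two scripts are illustrative assumptions:

```python
import sqlite3
import time

# One shared engine for all measurements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src(a INTEGER, b TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(i, f"v{i}") for i in range(50_000)])
conn.commit()

def run(script):
    # Time one transformation script that materializes the target table.
    conn.execute("DROP TABLE IF EXISTS tgt")
    t0 = time.perf_counter()
    conn.execute(script)
    conn.commit()
    return time.perf_counter() - t0

# Two hypothetical mapping-generated scripts producing the same target.
script1 = "CREATE TABLE tgt AS SELECT a, b FROM src WHERE a % 2 = 0"
script2 = ("CREATE TABLE tgt AS SELECT a, b FROM src "
           "WHERE a % 2 = 0 ORDER BY b")

t1, t2 = run(script1), run(script2)
n = conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
print(f"{n} target rows; script1: {t1:.4f}s, script2: {t2:.4f}s")
```

Running both scripts through the identical connection and engine is the experimental control the text calls for; swapping engines between runs would confound the comparison.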
There has been an increasing interest toward efficient methods for generating the
right target instance given a mapping scenario, and more specifically in generating