to mapping generation time, and evaluation techniques proposed by Spicy [ Bonifati
et al. 2008a ] or STBenchmark [ Alexe et al. 2008c ] do not elaborate extensively on
the issue. This is not an omission on their behalf. It reflects the fact that it is hard to
measure time when human participation, in our specific case for the verification and
guidance of the mapping tool, is part of the process. The time required by humans
to understand the mappings generated by the tool and provide feedback is orders of
magnitude higher than the one the tool requires for computing the mappings.
The situation is slightly different in matching tools where there is limited human
intervention. Although computation time is still a central factor, it is not as important
as the quality of the generated matches. A recent evaluation on a number of match-
ing tools [ Yatskevich 2003 ] has extended previous evaluations [ Do et al. 2003 ] by
adding time measures for matching tasks on real-world matching scenarios. Unfortunately,
these metrics have yet to be materialized in a benchmark. In a more
recent comparison [ Köpcke and Rahm 2010 ] of state-of-the-art matching tools,
generation time has been one of the main comparison criteria and is also one of
the metrics used by matching evaluation tools like XBenchMatch [ Duchateau et al.
2007 ] and the ISLab Instance Matching Benchmark [ Ferrara et al. 2008 ].
6.2 Data Translation Performance
It has already been mentioned that one of the popular uses of mappings is to translate
data from one source to another, i.e., the data exchange task. This translation is
done by materializing the target or integrated instance from the data of one or more
source instances according to the mappings. Data sources typically contain a large
number of records. This means that if the mappings are numerous and describe
complex transformations, then the time required to materialize the target instance
may be significant. Based on this observation, it is clear that one factor
characterizing the quality of a mapping tool is the execution performance of
the transformations described by the generated mappings. Metrics that can be
used to measure such performance are the overall execution time and the degree of
parallelization.
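As a minimal sketch of the first metric, the following hypothetical Python snippet times the materialization of a target instance from a source instance by applying a simple field-level transformation to every record. The record layout and the mapping function are illustrative inventions, not taken from any particular mapping tool:

```python
import time

# Illustrative source instance: a list of records (dictionaries).
source = [{"first": f"n{i}", "last": f"s{i}", "dept": i % 3}
          for i in range(10_000)]

def apply_mapping(rec):
    # A toy transformation, standing in for a mapping-generated script:
    # concatenate two source fields and rename a third.
    return {"name": rec["first"] + " " + rec["last"], "group": rec["dept"]}

# Overall execution time: wall-clock duration of the full materialization.
start = time.perf_counter()
target = [apply_mapping(r) for r in source]
elapsed = time.perf_counter() - start

print(f"materialized {len(target)} target records in {elapsed:.4f}s")
```

Since each record is transformed independently here, the work is trivially partitionable, which is what the degree-of-parallelization metric would capture for more complex mappings.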
[Time] The most general-purpose metric is the time required to perform the
overall transformation. Although this parameter is not explicitly stated in any
matching or mapping evaluation effort, certain extensive experiments found in the
literature [ Alexe et al. 2008c ] illustrate its importance. The generation of good trans-
formation scripts is actually a way to characterize good mapping tools. Note that to
avoid falling into the trap of evaluating the query execution engine instead of the
mapping tool, when measuring the performance of the generated transformation
scripts, all the comparison and evaluation experiments should be performed on the
same transformation engine.
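The point about holding the engine fixed can be sketched as follows: two candidate transformation scripts are timed on the same engine (here SQLite via Python's sqlite3 module), so any difference reflects the scripts rather than the executor. The table layout and the two scripts are illustrative assumptions:

```python
import sqlite3
import time

# One shared engine for all measurements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src(a INTEGER, b TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(i, f"v{i}") for i in range(50_000)])
conn.commit()

def run(script):
    # Time one transformation script that materializes the target table.
    conn.execute("DROP TABLE IF EXISTS tgt")
    t0 = time.perf_counter()
    conn.execute(script)
    conn.commit()
    return time.perf_counter() - t0

# Two hypothetical mapping-generated scripts producing the same target.
script1 = "CREATE TABLE tgt AS SELECT a, b FROM src WHERE a % 2 = 0"
script2 = ("CREATE TABLE tgt AS SELECT a, b FROM src "
           "WHERE a % 2 = 0 ORDER BY b")

t1, t2 = run(script1), run(script2)
n = conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
print(f"{n} target rows; script1: {t1:.4f}s, script2: {t2:.4f}s")
```

Running both scripts through the identical connection and engine is the experimental control the text calls for; swapping engines between runs would confound the comparison.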
There has been an increasing interest toward efficient methods for generating the
right target instance given a mapping scenario, and more specifically in generating