measuring the effort spent on a matching or mapping task with a tool can serve as an indication of the success of the tool. Unfortunately, such metrics are not broadly accepted, since they depend heavily on the user interface. An advanced user interface will lead to good evaluation results, which means that the evaluation of a mapping tool effectively becomes an evaluation of its graphical interface. Furthermore, the fact that there is no global agreement on the expressive power of the interface poses limits on the
evaluation scenarios that can be run. A mapping tool with a simple interface may require less designer effort but may also be limited in the kinds of mappings or transformations it can generate. This has led a number of researchers and practitioners to consider, as an alternative metric, the expressive power of the mappings that the tool can generate, while others have considered the quality of the mappings themselves [Bonifati et al. 2008b] or the quality of the integrated schema, for the case in which the mapping tool is used for schema integration. The quality of the integrated
schema is important for improving query execution time, successful data exchange, and accurate concept sharing. Unfortunately, there is no broad agreement on how mapping quality should be measured; thus, to provide meaningful comparisons, an evaluation method should consider a number of different metrics for that purpose.
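As an illustration of such metrics, the following is a minimal sketch of two simple set-based measures of integrated-schema quality. The metric names, definitions, and schema concepts are assumptions made for illustration, not a standardized measure from any benchmark:

```python
# Hedged sketch: two illustrative quality metrics for an integrated
# schema, computed over the sets of concepts in the source schemas.
# Names and definitions are assumptions, not a prescribed standard.

def completeness(integrated: set, sources: list) -> float:
    """Fraction of distinct source concepts that survive in the
    integrated schema (1.0 means nothing was lost)."""
    all_concepts = set().union(*sources)
    return len(integrated & all_concepts) / len(all_concepts)

def minimality(integrated: set, sources: list) -> float:
    """1 minus the fraction of integrated concepts that occur in no
    source (1.0 means no spurious or redundant concepts)."""
    all_concepts = set().union(*sources)
    extra = integrated - all_concepts
    return 1 - len(extra) / len(integrated)

# Two hypothetical source schemas and one integrated schema.
s1 = {"name", "email"}
s2 = {"name", "phone"}
integrated = {"name", "email", "phone", "fax"}  # "fax" is spurious

print(completeness(integrated, [s1, s2]))  # 1.0: all source concepts kept
print(minimality(integrated, [s1, s2]))    # 0.75: one of four is spurious
```

An evaluation method would report several such scores side by side rather than collapsing them into a single number, since, as noted above, there is no agreed-upon definition of overall mapping quality.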
Developing evaluation techniques for mapping tools is further complicated by the nondeterministic output of the scenarios. In contrast to query engines, different mapping tools may generate different results for the same input, without any of the
results being necessarily wrong. In particular, for a given high-level mapping specification, there may be different interpretation alternatives, and each tool may choose
one over another. The ability to effectively communicate to the mapping designer
the semantics of the generated output is of major importance to allow the designer to
effectively guide the tool toward the generation of the desired mappings. One way to
do so is to present the designer with the target instance that the generated mappings
can produce. This is not always convenient, practical, or even feasible, especially for
large, complicated instances. Presenting the mapping to the designer seems preferable [Velegrakis 2005], yet it is not always convenient, since the designer may not be familiar with the language in which the mappings are expressed. An attractive alternative [Alexe et al. 2008a] is to provide carefully selected representative samples of the target instance or synthetic examples that effectively illustrate the transformation
modeled by the generated mappings. This option is becoming particularly appealing now that more and more systems are moving away from exact query semantics toward supporting keyword [Bergamaschi et al. 2010] and approximate queries, or queries that embrace uncertainty at the very heart of the system [Ioannou et al. 2010].
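To make the interpretation ambiguity concrete, the following minimal sketch (with invented relation and attribute names) shows how the same high-level correspondence over one source can yield two different, equally defensible target instances, one flat and one grouped:

```python
# Hedged sketch: two valid interpretations of one mapping.
# Source and target names are invented for illustration only.
# Source: flat (dept, emp) tuples; the high-level correspondence
# merely says both attributes map into the target.

source = [("Sales", "Ann"), ("Sales", "Bob"), ("HR", "Carl")]

# Interpretation 1: a flat target, one tuple per source tuple.
flat_target = [{"dept": d, "emp": e} for d, e in source]

# Interpretation 2: a nested target, grouping employees by department.
grouped_target = {}
for d, e in source:
    grouped_target.setdefault(d, []).append(e)

print(flat_target)
print(grouped_target)  # {'Sales': ['Ann', 'Bob'], 'HR': ['Carl']}
```

Neither result is wrong: the specification does not determine whether employees should be grouped per department, so two tools can legitimately disagree, which is exactly why communicating the chosen semantics back to the designer matters.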
4 Real-World Evaluation Scenarios
A close look at popular benchmarks reveals a common design pattern. The benchmark provides a number of predefined test cases that the tool under evaluation is expected to execute successfully. The tool is then evaluated based on the