enhanced with datasets. In a recent effort [ Giunchiglia et al. 2009 ], an extension
was proposed that contains 4,500 matches between three different Web directories
and has three important features, namely, it is error-free, has a low complexity, and
has a high discriminative capability, a notion that is explained later. Unfortunately,
despite the strong need for comparing matchers using identical evaluation
scenarios,² there is still no broadly accepted agreement on what these
evaluation scenarios should be.
XBenchMatch [Duchateau et al. 2007] is a benchmark for matching tools.
It defines a set of criteria for testing and evaluating such tools. It focuses
mostly on assessing matching tools in terms of matching quality and
time performance, but it also provides a testbed of ten datasets that can be used
to quickly benchmark new matching algorithms [Duchateau 2009]. These matching
scenarios have been classified according to the tasks they reflect, either at the data
level, e.g., the structure or the degree of heterogeneity, or at the matching process
level, e.g., the scale. Although collaborative work can help provide new datasets
with their correct sets of matches, the creation of such a large and complete
collection remains a challenge.
It is important to add here that one of the challenges during the creation of test
scenarios is deciding what the correct matches will be. As mentioned in the previ-
ous section, for a given matching scenario, there may be multiple correct answers.
Opting for one of them may not be fair to the others. For this reason, in efforts
such as the OAEI, the test scenario designers carefully select scenarios that
admit no alternative answers or, when alternatives exist, designate as correct
the answer that is the most obvious or the one that the vast majority of
matching users would consider correct.
One of the first benchmarks for mapping tools is STBenchmark [Alexe et al.
2008c]. It contains a list of basic test scenarios, each consisting of a source
schema, a target schema, and a transformation query expressed in XQuery. The
mapping specification is described in XQuery to avoid any misinterpretation of
the mapping that needs to be achieved. This, of course, does not mean that the
mappings generated by the mapping tool must themselves be expressed in XQuery,
but they have to describe an equivalent mapping. Furthermore, the selection of
XQuery as a mapping specification language poses no major issues for mapping
tool evaluators, since such users are generally more experienced than regular
mapping designers. They can easily understand the full details of the expected
transformation and, using the mapping tool interface, try to materialize it.
For mapping tools that accept matches as input, converting from XQuery to
matches is a straightforward task, as the sketch below illustrates.
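To make this concrete, consider a minimal sketch of what a basic
STBenchmark-style scenario might look like; the schemas, element names, and
file name below are hypothetical illustrations chosen for exposition, not
taken from the actual benchmark. The XQuery flattens a nested source, in which
departments contain employees, into a flat list of target records:

    (: Hypothetical basic scenario: unnest departments/employees
       into flat records.
       Source schema (illustrative): src/dept/@name, src/dept/emp/name
       Target schema (illustrative): tgt/record/department,
                                     tgt/record/employee           :)
    <tgt> {
      for $d in doc("source.xml")/src/dept,   (: each department      :)
          $e in $d/emp                        (: each of its employees :)
      return
        <record>
          <department>{ data($d/@name) }</department>
          <employee>{ data($e/name) }</employee>
        </record>
    } </tgt>

From such a query, the corresponding matches can be read off directly, e.g.,
src/dept/@name corresponds to tgt/record/department and src/dept/emp/name to
tgt/record/employee, which is why the conversion to matches is straightforward
for tools that expect matches rather than transformation queries as input.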
Each STBenchmark mapping scenario is carefully designed to test the ability
of the mapping tool to create transformations of a specific kind. The evaluator
is expected first to understand the desired transformation by studying the
transformation script, and then to try to implement it through the interface
provided by the mapping tool.
² Netrics HD blog, April 2010: http://www.netrics.com/blog/a-data-matching-benchmark.