enhanced with datasets. In a recent effort [ Giunchiglia et al. 2009 ], an extension
was proposed that contains 4,500 matches between three different Web directories
and has three important features, namely, it is error-free, has a low complexity, and
has a high discriminative capability, a notion that is explained later. Unfortunately,
despite the strong need for comparing matchers using identical evaluation
scenarios,² there is still no broadly accepted agreement on what these
evaluation scenarios should be.
XBenchMatch [Duchateau et al. 2007] is a benchmark for matching tools.
It defines a set of criteria for testing and evaluating such tools. It focuses
mostly on assessing matching tools in terms of matching quality and
time performance, but it also provides a testbed of ten datasets that can be used
to quickly benchmark new matching algorithms [Duchateau 2009]. These matching
scenarios have been classified according to the tasks they reflect, either at the data
level, e.g., the structure or the degree of heterogeneity, or at the matching process
level, e.g., the scale. Although collaborative work can help provide new datasets
with their correct sets of matches, the creation of such a large and complete
collection remains a challenge.
It is important to add here that one of the challenges during the creation of test
scenarios is deciding what the correct matches will be. As mentioned in the previ-
ous section, for a given matching scenario, there may be multiple correct answers.
Opting for one of them may not be fair to the others. For this reason, in efforts
such as the OAEI, the test scenario designers carefully select scenarios that
admit no alternative answers or, when alternatives exist, designate as correct
the answer that is the most obvious or the one that the vast majority of
matching users would consider correct.
One of the first benchmarks for mapping tools is STBenchmark [Alexe et al.
2008c]. It contains a list of basic test scenarios, each consisting of a source
schema, a target schema, and a transformation query expressed in XQuery. The
mapping specification is described in XQuery to avoid any misinterpretation of
the mapping that needs to be achieved. This, of course, does not mean that the
mappings generated by the mapping tool must themselves be expressed in XQuery,
but they have to describe an equivalent mapping. Furthermore, the selection of
XQuery as a mapping specification language poses no major issues for mapping
tool evaluators, since such users are generally more experienced than regular
mapping designers. They can easily understand the full details of the expected
transformation and, using the mapping tool interface, try to materialize it.
For mapping tools that accept matches as input, converting from XQuery to
matches is a straightforward task, as the sketch below illustrates.
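To make this concrete, consider a minimal sketch of what a basic
STBenchmark-style scenario might look like; the schemas, element names, and
file name below are hypothetical illustrations chosen for exposition, not
taken from the actual benchmark. The XQuery flattens a nested source, in which
departments contain employees, into a flat list of target records:

    (: Hypothetical basic scenario: unnest departments/employees
       into flat records.
       Source schema (illustrative): src/dept/@name, src/dept/emp/name
       Target schema (illustrative): tgt/record/department,
                                     tgt/record/employee           :)
    <tgt> {
      for $d in doc("source.xml")/src/dept,   (: each department      :)
          $e in $d/emp                        (: each of its employees :)
      return
        <record>
          <department>{ data($d/@name) }</department>
          <employee>{ data($e/name) }</employee>
        </record>
    } </tgt>

From such a query, the corresponding matches can be read off directly, e.g.,
src/dept/@name corresponds to tgt/record/department and src/dept/emp/name to
tgt/record/employee, which is why the conversion to matches is straightforward
for tools that expect matches rather than transformation queries as input.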
Each STBenchmark mapping scenario is carefully designed to test the ability
of the mapping tool to create transformations of a specific kind. The evaluator
is expected first to understand the desired transformation by studying the
transformation script, and then to try to implement it through the interface
provided by the mapping tool.
² Netrics HD blog, April 2010: http://www.netrics.com/blog/a-data-matching-benchmark.