number of these cases that were indeed implemented successfully. The TPC-H
benchmark [Transaction Processing Performance Council 2001], for instance, consists
of a set of predefined queries on a given database, with each of these queries
testing a specific feature of the query language that the query engine is expected to
support. For each such query, the benchmark provides the expected correct answer
against which the results of executing the query on the engine under evaluation
can be compared. Accordingly, a mapping tool benchmark should provide a set of
evaluation scenarios, i.e., scenarios together with their expected results.
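To make this concrete, the following sketch illustrates one way such an evaluation scenario could be represented; the structure, its field names (tested_feature, input_data, expected_result), and the run_scenario helper are purely illustrative assumptions, not part of any actual benchmark specification.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical representation of a single evaluation scenario: an input
# (e.g., a query or a pair of schemas) together with the expected correct
# result against which the engine or tool under evaluation is compared.
@dataclass(frozen=True)
class EvaluationScenario:
    name: str                 # e.g., "query 1" or "mapping case 17"
    tested_feature: str       # the specific feature this scenario exercises
    input_data: Any           # the input handed to the engine or tool
    expected_result: Any      # the correct answer provided by the benchmark

def run_scenario(scenario: EvaluationScenario,
                 engine: Callable[[Any], Any]) -> bool:
    """Execute the scenario on the system under evaluation and compare
    its output with the benchmark's expected result."""
    actual = engine(scenario.input_data)
    return actual == scenario.expected_result
```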
There have been a number of efforts toward building collections of evaluation
scenarios. There is unquestionable value to these collections. The ability of a
mapping method or tool to successfully execute the evaluation scenarios is a clear
indication of its practical value. By successful execution, we mean that the tool
is able to generate the expected output as described by the evaluation scenario.
Although these collections are built based on criteria such as popularity or
community acceptance, or through contributions by interested parties and the user base, they
often lack a systematic categorization of the cases they test. For instance, they may
have multiple evaluation scenarios testing the same feature of the tool, or they may
provide no generalized test patterns. For that reason, such collections are
typically termed testbeds or standardized tests.
A complete and generic benchmark should go beyond a simple set of test cases.
It should offer a systematic organization of tests that is consistent, complete, and
minimal. Consistent means that the existence of every test case is justified
by some specific feature upon which the tool or technique is evaluated through that
test case. Complete means that for every important feature of the mapping tool under
evaluation there is a test case. Minimal means that there are no redundant test cases,
i.e., no more than one test case for the same feature.
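As a rough illustration, assuming each test case declares the single feature it exercises, these three properties could be checked mechanically; the helper below, including its name check_benchmark and its inputs, is a hypothetical sketch rather than part of any existing benchmark.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

def check_benchmark(tested_features: Iterable[Optional[str]],
                    required_features: List[str]) -> Dict[str, bool]:
    """Check the three properties discussed above, given the feature each
    test case declares it exercises (None if it is tied to no feature)."""
    declared = list(tested_features)
    counts = Counter(f for f in declared if f is not None)
    return {
        # consistency: every test case is justified by some specific feature
        "consistent": all(f is not None for f in declared),
        # completeness: every important feature has at least one test case
        "complete": all(counts[f] >= 1 for f in required_features),
        # minimality: no feature is covered by more than one test case
        "minimal": all(c <= 1 for c in counts.values()),
    }
```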
To evaluate a matching tool on a given evaluation scenario, the scenario is provided
to the tool, which produces a solution. That generated solution, which in the
case of a matching tool is a set of matches, is then compared against the expected
set of matches that the evaluation scenario contains. If the two sets are the same,
then the tool is said to be successful for this scenario. The evaluation scenarios are
typically designed to check a specific matching situation. Success or failure on a
specific scenario translates into the ability or inability of the matching tool under
evaluation to handle that matching situation. This is the kind of evaluation for
which testbeds are designed.
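A minimal sketch of this set-based comparison, assuming the matching tool is exposed as a function that returns a set of element correspondences, might look as follows; the Match type and the evaluate_matcher helper are illustrative assumptions, not the interface of any particular tool.

```python
from typing import Any, Callable, Set, Tuple

# A match is a correspondence between a source and a target schema element.
Match = Tuple[str, str]

def evaluate_matcher(matcher: Callable[[Any, Any], Set[Match]],
                     source_schema: Any,
                     target_schema: Any,
                     expected_matches: Set[Match]) -> bool:
    """Run the matching tool on the scenario's schemas and report success
    only if the produced set of matches equals the expected set."""
    produced = matcher(source_schema, target_schema)
    return produced == expected_matches
```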
The Ontology Alignment Evaluation Initiative [Euzenat et al. 2006], OAEI in short, is a coordinated international initiative
that every year organizes a matching competition for ontologies. Ontologies can be
seen as semantic schemas; thus, ontology matching is considered part of the general
matching problem. The initiative provides the participants with a set of matching test
scenarios with which they test their tools. Throughout the year, individuals
may also submit to the initiative various scenarios they encounter in practice. As a
result, the collected scenarios of the initiative constitute a good representation of
reality. In a recent evaluation of a number of matching tools [Köpcke and Rahm
2010], the number of real-world test problems that a matching tool could handle
featured as one of the main comparison criteria. The OAEI scenarios may be further