heterogeneities. It also provides twelve test queries, each requiring the resolution of
a particular type of heterogeneity.
5 Synthetic Evaluation Scenarios
An important issue for a benchmark is to have not only fixed evaluation scenarios but
also scenarios representing generic patterns. In a world where the data is becoming
increasingly complicated, it is crucial to stress-test the tools for data and schemas of
different sizes. This means that matching and mapping benchmarks should support
dynamic generation of evaluation scenarios of different sizes, with which one can
test how the tool under evaluation scales up.
Unfortunately, such diversity may be hard to find in real-world applications,
mainly for privacy reasons, or because real datasets typically originate from a single
domain, which restricts their variability and makes them unsuitable for general-purpose
evaluations. Thus, a benchmark should be able to create, in a systematic way, synthetic
test cases that stress-test the mapping tools and allow the evaluation of their
performance under different conditions.
In the case of a matching tool, generation of a synthetic test scenario involves
the creation of a source and a target schema, alongside the expected matches. The
construction of the two schemas should be done in parallel so that for every part of
the source schema, the part of the target schema with which it matches is known.
For the case of a mapping tool, the situation is similar, but instead of the expected
matches, the synthetic test scenario should have the expected transformation. The
construction of the latter should also be orchestrated with the construction of the
two schemas. For mapping tools in schema integration, a test scenario consists of a
set of source schemas, the expected integrated schema, and the specification of how
the expected integrated schema is related to the individual source schemas.
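To make the parallel construction concrete, the following sketch generates a source schema, a target schema, and the expected matches together, so that the ground truth is known by construction. All names (`generate_matching_scenario`, `attr_i`, `field_i`) and the specific heterogeneities (renamed and dropped attributes) are illustrative assumptions, not part of any particular benchmark:

```python
import random

# Illustrative sketch: build a source and a target schema in parallel so that
# the expected matches are known by construction. Two simple heterogeneities
# are simulated: an attribute may be renamed in the target, or dropped from it.
def generate_matching_scenario(n_attributes=5, rename_prob=0.5,
                               drop_prob=0.2, seed=None):
    """Return a (source_schema, target_schema, expected_matches) triple."""
    rng = random.Random(seed)
    source, target, matches = [], [], []
    for i in range(n_attributes):
        src_attr = f"attr_{i}"
        source.append(src_attr)
        if rng.random() < drop_prob:
            continue  # this source attribute has no counterpart in the target
        # Optionally rename the attribute to simulate naming heterogeneity.
        tgt_attr = f"field_{i}" if rng.random() < rename_prob else src_attr
        target.append(tgt_attr)
        matches.append((src_attr, tgt_attr))  # match recorded at creation time
    return source, target, matches

source, target, matches = generate_matching_scenario(seed=42)
# Every expected match refers to attributes that exist in both schemas.
assert all(s in source and t in target for s, t in matches)
```

Because the matches are recorded while the schemas are built, no manual annotation step is needed; the same idea extends to mapping tools by recording the expected transformation instead of the expected matches.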
Generation of synthetic scenarios has in general followed two main approaches:
the top-down and the bottom-up approach. The former starts with some large sce-
nario and by removing parts of it generates other smaller scenarios. The latter
constructs each scenario from scratch. Both approaches can be applied in the case
of synthetic scenario generation for matching and mapping tools.
The top-down approach starts with an existing large source and target schema,
and systematically removes components to generate smaller scenarios satisfying
specific properties. The properties depend on the features of the matching or map-
ping task that needs to be evaluated. An example of an ontology matching evaluation
dataset that has been built using the top-down approach is TaxME2 [ Giunchiglia
et al. 2009 ]. In TaxME2, a set of original ontologies are initially constructed out of
the Google, Yahoo, and Looksmart Web directories. Subsequently, matches across
these ontologies are defined and characterized. For every pair of ontologies,
portions are cut out, together with the matches that involve elements from those
portions. The remaining parts of the two ontologies are used as the source and the
target, and the remaining matches form the expected correct matches. The process
is repeated multiple times, each time using a different portion, which leads to the
creation of a
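The cutting step of the top-down approach can be sketched as follows. This is a minimal illustration, not TaxME2's actual procedure: schemas are modeled as flat sets of element names, a random portion of each side is kept, and only the matches whose elements survive on both sides are retained; the function name `cut_scenario` and the parameters are assumptions for the example:

```python
import random

# Illustrative sketch of the top-down approach: starting from a large source
# schema, a large target schema, and their known matches, keep a random
# portion of each schema and retain only the matches whose elements survive
# on both sides. Repeating with different seeds yields many smaller scenarios.
def cut_scenario(source, target, matches, keep_fraction=0.7, seed=None):
    rng = random.Random(seed)
    kept_source = set(rng.sample(sorted(source),
                                 int(len(source) * keep_fraction)))
    kept_target = set(rng.sample(sorted(target),
                                 int(len(target) * keep_fraction)))
    kept_matches = [(s, t) for s, t in matches
                    if s in kept_source and t in kept_target]
    return kept_source, kept_target, kept_matches

source = {f"a{i}" for i in range(10)}
target = {f"b{i}" for i in range(10)}
matches = [(f"a{i}", f"b{i}") for i in range(10)]
# Each seed produces a different smaller scenario with a consistent ground truth.
smaller_scenarios = [cut_scenario(source, target, matches, seed=s)
                     for s in range(3)]
```

A real dataset generator would cut structurally coherent portions (e.g., subtrees of an ontology) rather than random element sets, but the invariant is the same: the retained matches remain correct for the reduced schemas by construction.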