heterogeneities. It also provides twelve test queries, each requiring the resolution of
a particular type of heterogeneity.
5 Synthetic Evaluation Scenarios
An important issue for a benchmark is to have not only fixed evaluation scenarios but
also scenarios representing generic patterns. In a world where the data is becoming
increasingly complicated, it is crucial to stress-test the tools for data and schemas of
different sizes. This means that matching and mapping benchmarks should support
dynamic generation of evaluation scenarios of different sizes, with which one can
test how the tool under evaluation scales up.
Unfortunately, such diversity may be hard to find in real-world applications,
mainly for privacy reasons, or because real datasets typically originate from a single
domain, which restricts their variability and makes them unsuitable for general-purpose
evaluations. Thus, a benchmark should be able to create, in a systematic way, synthetic
test cases that stress-test the mapping tools and allow the evaluation of their
performance under different conditions.
In the case of a matching tool, generation of a synthetic test scenario involves
the creation of a source and a target schema, alongside the expected matches. The
construction of the two schemas should be done in parallel so that for every part of
the source schema, the part of the target schema with which it matches is known.
For the case of a mapping tool, the situation is similar, but instead of the expected
matches, the synthetic test scenario should have the expected transformation. The
construction of the latter should also be orchestrated with the construction of the
two schemas. For mapping tools in schema integration, a test scenario consists of a
set of source schemas, the expected integrated schema, and the specification of how
the expected integrated schema is related to the individual source schemas.
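To make the parallel construction concrete, the following sketch generates a source schema, a target schema, and the expected matches together, so that the ground truth is known by construction. All names (`generate_matching_scenario`, `attr_i`, `field_i`) and the specific heterogeneities (renamed and dropped attributes) are illustrative assumptions, not part of any particular benchmark:

```python
import random

# Illustrative sketch: build a source and a target schema in parallel so that
# the expected matches are known by construction. Two simple heterogeneities
# are simulated: an attribute may be renamed in the target, or dropped from it.
def generate_matching_scenario(n_attributes=5, rename_prob=0.5,
                               drop_prob=0.2, seed=None):
    """Return a (source_schema, target_schema, expected_matches) triple."""
    rng = random.Random(seed)
    source, target, matches = [], [], []
    for i in range(n_attributes):
        src_attr = f"attr_{i}"
        source.append(src_attr)
        if rng.random() < drop_prob:
            continue  # this source attribute has no counterpart in the target
        # Optionally rename the attribute to simulate naming heterogeneity.
        tgt_attr = f"field_{i}" if rng.random() < rename_prob else src_attr
        target.append(tgt_attr)
        matches.append((src_attr, tgt_attr))  # match recorded at creation time
    return source, target, matches

source, target, matches = generate_matching_scenario(seed=42)
# Every expected match refers to attributes that exist in both schemas.
assert all(s in source and t in target for s, t in matches)
```

Because the matches are recorded while the schemas are built, no manual annotation step is needed; the same idea extends to mapping tools by recording the expected transformation instead of the expected matches.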
Generation of synthetic scenarios has in general followed two main approaches:
the top-down and the bottom-up approach. The former starts with some large sce-
nario and by removing parts of it generates other smaller scenarios. The latter
constructs each scenario from scratch. Both approaches can be applied in the case
of synthetic scenario generation for matching and mapping tools.
The top-down approach starts with an existing large source and target schema,
and systematically removes components to generate smaller scenarios satisfying
specific properties. The properties depend on the features of the matching or map-
ping task that needs to be evaluated. An example of an ontology matching evaluation
dataset that has been built using the top-down approach is TaxME2 [ Giunchiglia
et al. 2009 ]. In TaxME2, a set of original ontologies are initially constructed out of
the Google, Yahoo, and Looksmart Web directories. Subsequently, matches across
these ontologies are defined and characterized. For every pair of ontologies,
portions are cut out, together with the matches that involve elements from those
portions. The remaining parts of the two ontologies are used as the source and the
target, and the remaining matches form the expected correct matches. The process
is repeated multiple times, each time using a different portion, which leads to the
creation of a
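The cutting step of the top-down approach can be sketched as follows. This is a minimal illustration, not TaxME2's actual procedure: schemas are modeled as flat sets of element names, a random portion of each side is kept, and only the matches whose elements survive on both sides are retained; the function name `cut_scenario` and the parameters are assumptions for the example:

```python
import random

# Illustrative sketch of the top-down approach: starting from a large source
# schema, a large target schema, and their known matches, keep a random
# portion of each schema and retain only the matches whose elements survive
# on both sides. Repeating with different seeds yields many smaller scenarios.
def cut_scenario(source, target, matches, keep_fraction=0.7, seed=None):
    rng = random.Random(seed)
    kept_source = set(rng.sample(sorted(source),
                                 int(len(source) * keep_fraction)))
    kept_target = set(rng.sample(sorted(target),
                                 int(len(target) * keep_fraction)))
    kept_matches = [(s, t) for s, t in matches
                    if s in kept_source and t in kept_target]
    return kept_source, kept_target, kept_matches

source = {f"a{i}" for i in range(10)}
target = {f"b{i}" for i in range(10)}
matches = [(f"a{i}", f"b{i}") for i in range(10)]
# Each seed produces a different smaller scenario with a consistent ground truth.
smaller_scenarios = [cut_scenario(source, target, matches, seed=s)
                     for s in range(3)]
```

A real dataset generator would cut structurally coherent portions (e.g., subtrees of an ontology) rather than random element sets, but the invariant is the same: the retained matches remain correct for the reduced schemas by construction.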