On Evaluating Schema Matching and Mapping - Schema Matching and Mapping


data. Their output looks like an ETL flowchart. ETL systems need little built-in
intelligence, since the input provided by the designer is so detailed that only
a limited form of reasoning is necessary. Similar to ETL systems are mashup
editors [Heinzl et al. 2009], which try to assist the mashup designer. The
operational goals of mashup editors are similar to those of ETL systems, so we
do not consider them as a separate category.

We use the term matching scenario or mapping scenario to refer to a particular
instance of the matching or mapping problem, respectively. A scenario is
represented by the input provided to the matching or mapping tool. More
specifically, a matching scenario is a pair consisting of a source and a target
schema. A mapping scenario is such a pair together with a specification of the
intended mappings. A solution to a scenario is a set of matches (for a matching
scenario) or mappings (for a mapping scenario) that satisfies the specifications
set by the scenario.
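The definitions above can be made concrete with a small data-structure sketch. Everything in it is an illustrative assumption rather than part of any tool discussed here: schemas are reduced to sets of attribute paths, a match is a (source, target) pair, and the names `MatchingScenario`, `MappingScenario`, and `is_well_formed` are hypothetical. The check shown is only a necessary condition for a solution, not a full test of whether the specification is satisfied.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical simplifications: a schema is a set of attribute paths,
# and a match is a (source attribute, target attribute) pair.
Schema = FrozenSet[str]
Match = Tuple[str, str]


@dataclass(frozen=True)
class MatchingScenario:
    """A matching scenario: a source schema and a target schema."""
    source: Schema
    target: Schema


@dataclass(frozen=True)
class MappingScenario:
    """A mapping scenario: the schema pair plus a specification of the
    intended mappings (assumed here to be a set of correspondences)."""
    source: Schema
    target: Schema
    specification: FrozenSet[Match]


def is_well_formed(scenario: MatchingScenario, matches: FrozenSet[Match]) -> bool:
    """Return True if every match links a source attribute to a target attribute."""
    return all(s in scenario.source and t in scenario.target for s, t in matches)


scenario = MatchingScenario(
    source=frozenset({"emp.name", "emp.salary"}),
    target=frozenset({"staff.fullname", "staff.pay"}),
)
candidate = frozenset({("emp.name", "staff.fullname"), ("emp.salary", "staff.pay")})
print(is_well_formed(scenario, candidate))
```

Frozen dataclasses are used so that a scenario, once defined, cannot be altered between runs of different tools.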

3 Challenges in Matching and Mapping System Evaluation

A fundamental requirement for providing universal evaluation of matching and
mapping tools is the existence of benchmarks. A benchmark for a computer
application or tool is based on the idea of evaluation scenarios, i.e., a
standardized set of problems or tests serving as a basis for comparison. 1 An
evaluation scenario for a matching/mapping tool is a scenario alongside the
expected output of the tool, i.e., the expected solution. Unfortunately,
designing a benchmark for matching/mapping tools is fundamentally different
from, and significantly more challenging than, designing one for relational
database management tools, such as TPC-H [Transaction Processing Performance
Council 2001], or for XML query engines, such as XMach [Bohme and Rahm 2001],
XOO7 [Bressan et al. 2001], MBench [Runapongsa et al. 2002], XMark [Schmidt et
al. 2002], and XBench [Yao et al. 2004]. The difficulty [Okawara et al. 2006]
stems mainly from the different nature, goals, and operational principles of
matching/mapping tools.
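As an illustration of the evaluation-scenario notion defined above, the following minimal sketch pairs a matching scenario with its expected solution and reports how a tool's output differs from it. The class `EvaluationScenario`, the function `compare`, and the attribute names are hypothetical, and reducing a solution to a flat set of attribute pairs is an assumption made only for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Match = Tuple[str, str]  # hypothetical: (source attribute, target attribute)


@dataclass(frozen=True)
class EvaluationScenario:
    """A scenario (source and target schema) alongside its expected solution."""
    source: FrozenSet[str]
    target: FrozenSet[str]
    expected: FrozenSet[Match]


def compare(scenario: EvaluationScenario, produced: FrozenSet[Match]):
    """Return (missing, unexpected) matches relative to the expected solution."""
    missing = scenario.expected - produced      # expected but not produced
    unexpected = produced - scenario.expected   # produced but not expected
    return missing, unexpected


evaluation = EvaluationScenario(
    source=frozenset({"emp.name", "emp.salary"}),
    target=frozenset({"staff.fullname", "staff.pay"}),
    expected=frozenset({("emp.name", "staff.fullname"), ("emp.salary", "staff.pay")}),
)
tool_output = frozenset({("emp.name", "staff.fullname"), ("emp.salary", "staff.fullname")})
missing, unexpected = compare(evaluation, tool_output)
print(sorted(missing))     # [('emp.salary', 'staff.pay')]
print(sorted(unexpected))  # [('emp.salary', 'staff.fullname')]
```

The sketch presumes that a single expected solution exists for the scenario, which is the simplest case a benchmark could cover.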

One of the differences is that, given a source and a target schema, there is
not always one "correct" set of matches or mappings. In query engines
[Transaction Processing Performance Council 2001; Bohme and Rahm 2001], the
correct answer to a given query is uniquely specified by the semantics of the
query language. In matching/mapping tools, on the other hand, the expected
answer depends not only on the semantics of the schemas, which by nature may be
ambiguous, but also on the transformation that the mapping designer intended to
make. The situation is reminiscent of Web search engines, where many documents
are returned in answer to a given keyword query, some more and some less
related to the query, but which document is actually the correct answer can
only be decided by the

1 Source: Merriam-Webster dictionary.
