On Evaluating Schema Matching and Mapping - Schema Matching and Mapping

user that posed the keyword query. For that reason, many evaluations of matching

or mapping tools are performed by human experts.

Another difficulty faced during the design of evaluation techniques for mapping

tools is the lack of a clear specification of the input language, i.e., a standardized

formalism with well-defined semantics. In contrast to benchmarks for relational

[Transaction Processing Performance Council 2001] and XML systems [Bohme and Rahm

2001] that could leverage the respective SQL and XQuery standard query

languages, it is still not clear how to describe a scenario. Formally describing the

schemas is not an issue, but describing the intended transformation, i.e., the input

that the designer needs to provide, is. The best way to unambiguously specify the

intended transformation is through a transformation language script, or a mapping

in some formalism, but there are two main issues with this option. First, there are

no guarantees that the mapping tool will be able to accept the specific formalism as

input, or at least that there will be an unambiguous translation of the input from the

formalism into the input language supported by the mapping tool. The second issue

is that such an approach defeats the purpose of a mapping tool, which is intended

to shield the mapping designer from the complexity and the peculiarities of the

transformation language. It is actually for that reason that mapping tool developers

have opted for simpler, higher-level specification languages, such as visual objects,

direct lines between schema elements, or the output of the matching process in general.

Unfortunately, such a specification is by nature ambiguous. Consider one of the

already identified [Alexe et al. 2008c] ambiguous situations, described in Fig. 9.4. It

is a simple scenario in which the mapping designer needs to copy the company data

from the source into organizations data in the target. To specify this, the designer

draws the two interschema lines illustrated in Fig. 9.4. When these are fed to a popular

commercial mapping tool, the tool generates a transformation script, which

generates the target instance illustrated in Fig. 9.5a when executed on the instance

of Fig. 9.4. A different tool, for the same input, produces a transformation script

that generates the instance illustrated in Fig. 9.5b. A third one produces a script

that generates the instance of Fig. 9.5c, which is most likely the one the mapping

designer had in mind to create. These differences are not an error on the part of

the tools, but rather a consequence of the fact that in the absence of a global agreement

on the semantics of the matches, or the input language in general, different tools may

interpret them differently and may require different inputs for generating the same

mappings. In the above example, the tool that generated the instance in Fig. 9.5a

could have also produced the instance of Fig. 9.5c, if the designer had provided one

more match from the element Company to the element Organization. This match

(which is between nonleaf elements) is not allowed at all in the tool that created

the instance of Fig. 9.5c. The issue is also highly related to the level of intelligence

and reasoning capabilities that the tools offer. Some tools may require minimal

input from the user, and through advanced reasoning they may be able to

generate the intended mappings [Bonifati et al. 2008b; Fagin et al. 2009a]. Others

may require the designer to be more explicit when describing the transformation she

has in mind to create [Altova 2008; Stylus Studio 2005]. Even by considering only
