On Evaluating Schema Matching and Mapping - Schema Matching and Mapping

Mapper, which is embedded in Microsoft Visual Studio [Microsoft 2005], Stylus Studio [Stylus Studio 2005], BEA AquaLogic [Carey 2006], and the research prototypes Rondo [Do and Rahm 2002], COMA [Aumueller et al. 2005], Harmony [Mork et al. 2008], S-Match [Giunchiglia et al. 2005], Cupid [Madhavan et al. 2001], Clio [Popa et al. 2002], Tupelo [Fletcher and Wyss 2006], Spicy [Bonifati et al. 2008a], and HePToX [Bonifati et al. 2010].

Despite the availability of many mapping tools, no generally accepted benchmark has been developed for comparing and evaluating them. As is the case with other benchmarks, such a development is of major importance for assessing the relative merits of the tools. It can help customers make the right investment decisions and select, among the many alternatives, the tools that best fit their business needs. A benchmark can also help mapping tool developers, as it offers them a common metric for comparing their own achievements against those of their competitors. Such comparisons can boost competition and drive development toward systems of higher quality. A benchmark also offers developers a generally accepted language for talking to customers and describing the advantages of their tools through well-known features that determine performance, effectiveness, and usability. Furthermore, a benchmark can highlight limitations of the mapping tools, or unsupported features, that the developers may not have noticed. Finally, a benchmark is also needed in the research community [Bertinoro 2007]. Apart from providing a common platform for comparison, a benchmark allows researchers to evaluate their achievements not only in terms of performance but also in terms of applicability in real-world situations.

In this work, we systematically summarize and present existing efforts toward the characterization and evaluation of mapping tools and the establishment of a benchmark. After a brief introduction to the architecture and main functionality of matching and mapping tools in Sect. 2, we describe the challenges of building a matching/mapping system benchmark in Sect. 3. Section 4 presents existing efforts to collect real-world test cases for use in evaluating matching and mapping systems. Section 5 addresses the creation of synthetic test cases that target the evaluation of specific features of mapping systems. Finally, Sects. 6 and 7 present metrics that have been proposed in the literature for measuring the efficiency and effectiveness of matching/mapping systems, respectively.

2 The Matching and Mapping Problem

Matching is the process that takes as input two schemas, referred to as the source and the target, and produces a number of matches, also known as correspondences, between the elements of these two schemas [Rahm and Bernstein 2001]. The term schema is used in its broader sense and includes database schemas [Madhavan et al. 2001], ontologies [Giunchiglia et al. 2009], and generic models [Atzeni and Torlone 1995].
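
To make the input and output of matching concrete, the following is a minimal sketch, not the method of any tool cited here. It assumes schemas are flat lists of element names and uses a naive name-equality heuristic; the names `Correspondence`, `normalize`, and `match_by_name` are hypothetical and chosen for illustration only. Correspondences are deliberately simplified to pairs of single elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """A simplified one-to-one match between a source and a target element."""
    source_element: str
    target_element: str


def normalize(name: str) -> str:
    """Lowercase a name and drop non-alphanumeric characters (assumed heuristic)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_by_name(source: list[str], target: list[str]) -> list[Correspondence]:
    """Take a source and a target schema and return their correspondences.

    Two elements correspond when their normalized names are equal. Real
    matchers combine many more signals (structure, data types, instances).
    """
    # Index target elements by normalized name for constant-time lookup.
    index: dict[str, list[str]] = {}
    for t in target:
        index.setdefault(normalize(t), []).append(t)
    return [
        Correspondence(s, t)
        for s in source
        for t in index.get(normalize(s), [])
    ]


# Illustrative schemas (invented for this example).
source_schema = ["Emp_Name", "salary", "dept"]
target_schema = ["empName", "Salary", "location"]
matches = match_by_name(source_schema, target_schema)
for m in matches:
    print(f"{m.source_element} -> {m.target_element}")
```

In this sketch `dept` and `location` remain unmatched because no element on the other side shares their normalized name; cases like these are where more elaborate matching strategies become relevant.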

A match is defined as a triple $\langle S_s, E_t, e \rangle$, where $S_s$ is a set of elements from the
