On Evaluating Schema Matching and Mapping - Schema Matching and Mapping


source, E_t is an element of the target schema, and e is a matching expression that

specifies a relationship between the element E_t and the elements in S_s. Note that the

expression e does not specify how the elements in S_s relate to each other. Most of

the time, a match is as simple as an equality or a set-inclusion relationship between

an element of the source and an element of the target. There are, however, cases in

which the relationship can be more complex, e.g., a concatenation function, some

arithmetic operation, a relationship over scalars such as an inequality, a conceptual model

relationship such as the part-of, or some set-oriented relationships, such as overlaps

or contains. Schema matching tools employ a number of different techniques to

discover these kinds of relationships. These techniques range from structural [ Madhavan et al.

2001 ] and name similarities to semantic closeness [ Giunchiglia et al. 2004 ] and data

value analysis [ Doan et al. 2001 , 2004 ]. A schema matching tool accepts as input

the two schemas and generates the set of matches. Since any schema matching pro-

cess is based on semantics, its final output needs to be verified by a human expert.
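The idea of a match and of a name-based matcher can be illustrated with a minimal sketch. It is not the method of any of the cited tools: the `Match` class, the `match_by_name` function, the 0.7 threshold, and the example schemas are all assumptions made for illustration, and only name similarity (via Python's standard `difflib`) is considered.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Match:
    """A match: a set of source elements, one target element, an expression."""
    source_elements: Tuple[str, ...]  # the elements S_s of the source
    target_element: str               # the element E_t of the target
    expression: str                   # the expression e, e.g. "=" or "concat"


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity of two element names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_by_name(source: Sequence[str], target: Sequence[str],
                  threshold: float = 0.7) -> List[Match]:
    """Propose an equality match for every name pair scoring >= threshold."""
    matches = []
    for t in target:
        for s in source:
            if name_similarity(s, t) >= threshold:
                matches.append(Match((s,), t, "="))
    return matches


# Hypothetical schemas, given only as lists of element names.
source_schema = ["cust_name", "price", "currency"]
target_schema = ["customer_name", "price_usd"]
candidates = match_by_name(source_schema, target_schema)
```

The returned candidates are only proposals. As the text notes, a human expert still has to verify them, since string similarity says nothing certain about the semantics of the elements.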

The matching process can be roughly divided into three phases: the prematch, the

match, and the postmatch phase. During the first phase, the matcher performs some

computations and processes the data. Typically, this involves the training of the

classifiers in the case of machine learning-based matchers, the configuration of the

various parameters like thresholds and weight values used by the matching algo-

rithm, and the specification of auxiliary information, such as domain synonyms and

constraints [ Giunchiglia et al. 2009 ]. During the second phase, the actual discovery

of the matches takes place. At the end, the matcher outputs the matches between

elements of the two schemas. During the postmatch phase, the users may check

and modify the displayed matches if needed.
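The three phases can be sketched as three small functions. This is an illustrative outline under stated assumptions, not the design of any particular matcher: the function names, the configuration dictionary, the synonym table, and the element names are hypothetical, and classifier training is left out.

```python
from difflib import SequenceMatcher


def prematch(threshold=0.8, synonyms=None):
    """Prematch: fix parameters and auxiliary information before matching."""
    return {"threshold": threshold, "synonyms": synonyms or {}}


def match(source, target, config):
    """Match: discover candidate (source, target, score) triples."""
    def canonical(name):
        # Replace a name by its domain synonym, if one was configured.
        name = name.lower()
        return config["synonyms"].get(name, name)

    found = []
    for s in source:
        for t in target:
            score = SequenceMatcher(None, canonical(s), canonical(t)).ratio()
            if score >= config["threshold"]:
                found.append((s, t, round(score, 2)))
    return found


def postmatch(candidates, rejected=()):
    """Postmatch: drop the candidates that the user marked as wrong."""
    rejected = set(rejected)
    return [c for c in candidates if (c[0], c[1]) not in rejected]


config = prematch(threshold=0.8, synonyms={"cost": "price"})
candidates = match(["cost", "name"], ["price", "fname"], config)
verified = postmatch(candidates, rejected=[("name", "fname")])
```

Here the synonym table lets `cost` match `price` although the two names share almost no characters, while the user rejects the spurious pair `name`/`fname` in the postmatch phase.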

Given a source and a target schema, a mapping is a relationship, i.e., a constraint,

that must hold between their respective instances. For the mappings to be generated,

a fundamental requirement is a set of matches between the elements of the schemas.

These matches can be either generated automatically through a matching process or

can be manually provided by an expert user. In contrast to matches, which specify

how instance values of individual source and target schema elements relate to each

other, a mapping additionally specifies how the values within the same instance

relate to each other. For example, a match may specify that the dollar price of a

product in the target corresponds to the price of the product

in the source (expressed in some foreign currency) multiplied by the exchange rate.

The mapping is the one that specifies that the exchange rate with which the product

price in the source is multiplied is the exchange rate of the currency in which the

price of the product is expressed. The mapping does so by specifying the right join

path between the price and the exchange rate attributes.
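The difference can be made concrete with a small sketch of the price example. The relation layouts, attribute names, and exchange rate values below are invented for illustration. The match contributes only the expression (price times rate); the mapping contributes the join on the currency attribute that selects which rate goes with which price.

```python
# Hypothetical source instance: products priced in a foreign currency,
# and a separate relation holding one exchange rate per currency.
products = [
    {"product": "lamp", "price": 100.0, "currency": "EUR"},
    {"product": "desk", "price": 2000.0, "currency": "JPY"},
]
rates = [
    {"currency": "EUR", "usd_rate": 1.10},  # made-up rates
    {"currency": "JPY", "usd_rate": 0.01},
]


def apply_mapping(products, rates):
    """Build target tuples by joining each price with its own currency's rate."""
    rate_by_currency = {r["currency"]: r["usd_rate"] for r in rates}
    target = []
    for p in products:
        # Join path: products.currency = rates.currency
        if p["currency"] in rate_by_currency:
            usd = p["price"] * rate_by_currency[p["currency"]]
            target.append({"product": p["product"], "price_usd": usd})
    return target


target_instance = apply_mapping(products, rates)
```

Without the join condition, the match alone would allow any price to be combined with any rate, for instance the lamp's price in EUR with the JPY rate.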

Mappings can be used in many different ways. In the case in which the target

schema is a virtual, i.e., not materialized, database as in virtual information integra-

tion systems, in P2P applications, or in data repositories that publish an interface

schema, the mappings can be used for query answering by driving the translation

of queries on the target schema to queries on the source [ Lenzerini 2002 ]. Another

major application of mappings is data exchange [ Fagin et al. 2003 ], in which, given

a source instance, the mappings are used to drive the materialization of a target
