source, E_t is an element of the target schema, and e is a matching expression that
specifies a relationship between the element E_t and the elements in S_s. Note that the
expression e does not specify how the elements in S_s relate to each other. Most of
the time, a match is as simple as an equality or a set-inclusion relationship between
an element of the source and an element of the target. There are, however, cases in
which the relationship can be more complex, e.g., a concatenation function, some
arithmetic operation, a relationship over scalars, a conceptual model
relationship such as part-of, or a set-oriented relationship such as overlaps
or contains. Schema matching tools employ a number of different techniques to
discover these kinds of relationships. They range from structural [ Madhavan et al.
2001 ] and name similarities to semantic closeness [ Giunchiglia et al. 2004 ] and data
value analysis [ Doan et al. 2001 , 2004 ]. A schema matching tool accepts as input
the two schemas and generates the set of matches. Since any schema matching process
is based on semantics, its final output needs to be verified by a human expert.
The matching process can be roughly divided into three phases: the prematch, the
match, and the postmatch phase. During the first phase, the matcher performs some
computations and processes the data. Typically, this involves the training of the
classifiers in the case of machine learning-based matchers, the configuration of the
various parameters like thresholds and weight values used by the matching algorithm,
and the specification of auxiliary information, such as domain synonyms and
constraints [ Giunchiglia et al. 2009 ]. During the second phase, the actual discovery
of the matches takes place. At the end, the matcher outputs the matches between
elements of the two schemas. During the postmatch phase, the users may check
and modify the displayed matches if needed.
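
The three phases can be illustrated with a minimal sketch of a name-based matcher. This is not the method of any particular tool cited above: the `Match` record, the `match_schemas` function, the 0.7 threshold, the synonym table, and the sample element names are all assumptions made for illustration, and the matching expression is reduced to a plain label.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Match:
    """One match: source elements, a target element, and a matching expression."""
    source_elements: tuple  # S_s: one or more source schema elements
    target_element: str     # E_t: a single target schema element
    expression: str         # e: reduced here to a label such as "equality"
    score: float            # similarity score, kept for the human reviewer


def name_similarity(a, b, synonyms):
    """Score two element names in [0, 1]; listed synonyms count as identical."""
    a, b = a.lower(), b.lower()
    if synonyms.get(a) == b or synonyms.get(b) == a:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def match_schemas(source, target, threshold=0.7, synonyms=None):
    """Match phase: propose an equality match for each sufficiently similar pair."""
    synonyms = synonyms or {}
    matches = []
    for s in source:
        for t in target:
            score = name_similarity(s, t, synonyms)
            if score >= threshold:
                matches.append(Match((s,), t, "equality", score))
    return matches


# Prematch: choose the threshold and supply domain synonyms (both hypothetical).
SYNONYMS = {"cost": "price"}

# Match: discover candidate matches between two toy schemas.
proposed = match_schemas(["prod_name", "cost"], ["product_name", "price"],
                         threshold=0.7, synonyms=SYNONYMS)

# Postmatch: a human expert would review `proposed` and accept or reject entries.
for m in proposed:
    print(m)
```

A real matcher would combine several such signals (structure, semantics, data values) instead of relying on names alone, which is one reason the output still needs expert review.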
Given a source and a target schema, a mapping is a relationship, i.e., a constraint,
that must hold between their respective instances. For the mappings to be generated,
a fundamental requirement is the set of matches between the elements of the schemas.
These matches can be either generated automatically through a matching process or
can be manually provided by an expert user. In contrast to matches, which specify
how instance values of individual source and target schema elements relate to each
other, a mapping additionally specifies how the values within the same instance
relate to each other. For example, a match may specify that the dollar price of a
product in the target corresponds to the price of the product
in the source (expressed in some foreign currency) multiplied by the exchange rate.
The mapping is the one that specifies that the exchange rate with which the product
price in the source is multiplied is the exchange rate of the currency in which the
price of the product is expressed. The mapping does so by specifying the right join
path between the price and the exchange rate attributes.
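
The difference between the match and the mapping in this example can be sketched in a few lines. The table layouts, field names, and exchange rates below are invented for illustration; the point is only that the match states the arithmetic, while the mapping adds the join on the currency.

```python
# Hypothetical source instance: prices in a foreign currency plus a rates table.
# The rates are made-up round numbers, not real exchange rates.
products = [
    {"name": "lamp", "price": 20.0, "currency": "EUR"},
    {"name": "desk", "price": 100.0, "currency": "GBP"},
]
rates = [
    {"currency": "EUR", "usd_rate": 1.5},
    {"currency": "GBP", "usd_rate": 2.0},
]


def apply_match_only(products, rates):
    """The match alone: dollar price = price * rate, with no rule linking the two.

    Without a join path, every price can be paired with every rate.
    """
    return [(p["name"], p["price"] * r["usd_rate"])
            for p in products for r in rates]


def apply_mapping(products, rates):
    """The mapping: join price and rate on the currency, then multiply."""
    rate_by_currency = {r["currency"]: r["usd_rate"] for r in rates}
    return [{"name": p["name"],
             "dollar_price": p["price"] * rate_by_currency[p["currency"]]}
            for p in products]


ambiguous = apply_match_only(products, rates)  # 4 candidate values, 2 of them wrong
target = apply_mapping(products, rates)        # one dollar price per product
```

With the match alone, each product yields one candidate dollar price per exchange rate; the join on `currency` is what selects the single correct one.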
Mappings can be used in many different ways. In the case in which the target
schema is a virtual, i.e., not materialized, database as in virtual information integration
systems, in P2P applications, or in data repositories that publish an interface
schema, the mappings can be used for query answering by driving the translation
of queries on the target schema to queries on the source [ Lenzerini 2002 ]. Another
major application of mappings is data exchange [ Fagin et al. 2003 ] in which given
a source instance, the mappings are used to drive the materialization of a target