Discovery and Correctness of Schema Mapping Transformations - Schema Matching and Mapping


3.2 Correspondences

The first step toward the creation of mappings between two schemas was to understand how the elements of the different schemas relate to each other. This relationship had to be expressed in some high-level specification, and that specification took the form of correspondences.

A correspondence maps atomic elements of a source schema to atomic elements of the target schema. This specification is independent of logical design choices such as the grouping of elements into tables (normalization choices) or the nesting of records or tables (for example, the hierarchical structure of an XML schema). In other words, one need not specify the logical access paths (join or navigation) that define the associations between the elements involved. Therefore, even users who are unfamiliar with the complex structure of the schemas can easily specify correspondences. Correspondences can be represented graphically through simple arrows or lines that connect the elements of the two schemas.

Element-to-element correspondences are all the more useful because they need not be specified by a human user. They can instead be produced by an automatic component that matches the elements of the two schemas, after which the mapping designer simply verifies the correctness of the results. This task is known in the literature as schema matching; it has received considerable attention and has led to a variety of methodologies and techniques [Rahm and Bernstein 2001].
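
To make the idea of an automatic matching component concrete, the following is a minimal sketch of a purely name-based matcher. It is not one of the techniques surveyed in the literature cited above, only an illustration of the "propose, then verify" workflow. The function name `suggest_correspondences`, the similarity threshold, and all attribute names in the sample data are assumptions made for this example.

```python
from difflib import SequenceMatcher


def suggest_correspondences(source_elements, target_elements, threshold=0.6):
    """Propose candidate correspondences by comparing attribute names only.

    Each element is a (table, attribute) pair. The result is a list of
    (source_element, target_element, score) triples, best score first.
    """
    candidates = []
    for s_table, s_attr in source_elements:
        for t_table, t_attr in target_elements:
            # Ratio in [0, 1]; 1.0 means the two names are identical.
            score = SequenceMatcher(None, s_attr.lower(), t_attr.lower()).ratio()
            if score >= threshold:
                candidates.append(
                    ((s_table, s_attr), (t_table, t_attr), round(score, 2))
                )
    # The designer still has to confirm or reject every candidate.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical attribute names, chosen only for illustration.
source = [
    ("Public-Company", "name"),
    ("Public-Company", "symbol"),
    ("Contact", "phone"),
]
target = [
    ("Company", "name"),
    ("Company", "id"),
    ("Company", "telephone"),
]

candidates = suggest_correspondences(source, target)
for src, tgt, score in candidates:
    print(f"{src[0]}.{src[1]} -> {tgt[0]}.{tgt[1]}  (score {score})")
```

Name similarity is only a hint: a matcher of this kind can propose pairs that do not represent the same fact and can miss pairs that do, which is why the verification step by the mapping designer remains necessary.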

A correspondence can be formally described as a tuple-generating dependency (tgd) with exactly one existentially quantified variable, which is equal to one of the universally quantified variables, and with one term on each side of the dependency (in the case of relational schemas). The correspondence states that every value of the source schema element represented by the first variable should also exist among the instance values of the target schema element represented by the second.
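
Under one possible reading of this description, a correspondence can be written as the dependency below. The predicate symbols $S$ and $T$ are assumptions introduced here: $S(x)$ holds when $x$ is a value of the source schema element, and $T(y)$ holds when $y$ is a value of the target schema element.

```latex
\forall x \,\bigl( S(x) \rightarrow \exists y \,( T(y) \wedge y = x ) \bigr)
```

Read this way, the single term on the left refers to the source element, the single term on the right refers to the target element, and the equality ties the existentially quantified variable $y$ to the universally quantified variable $x$.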

In certain cases, correspondences that involve more than one source schema element may exist, but there should always be one existentially quantified variable whose value is determined as a function of the universally quantified variables representing the participating source schema elements.
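
The semantics of both kinds of correspondence can be sketched as a simple check over instance data: every value derived from the source elements should also appear among the values of the target element. The function `violations`, the `combine` parameter, and all table contents and attribute names below are assumptions made for illustration, not part of the original text.

```python
def violations(source_rows, target_rows, source_attrs, target_attr, combine=None):
    """Return the source-derived values that are missing from the target.

    source_attrs lists the source attributes taking part in the
    correspondence; combine is the function that turns their values into
    the single value expected in target_attr. With one source attribute
    and no combine function, this is a plain one-to-one correspondence.
    """
    if combine is None:
        combine = lambda value: value  # identity: the value is copied as-is
    target_values = {row[target_attr] for row in target_rows}
    missing = []
    for row in source_rows:
        value = combine(*(row[attr] for attr in source_attrs))
        if value not in target_values:
            missing.append(value)
    return missing


# Hypothetical instance data; the attribute names are not from the figure.
source_rows = [
    {"name": "Acme", "city": "Rome", "country": "IT"},
    {"name": "Globex", "city": "Oslo", "country": "NO"},
]
target_rows = [
    {"name": "Acme", "location": "Rome, IT"},
]

# One-to-one correspondence: source name -> target name.
print(violations(source_rows, target_rows, ["name"], "name"))

# Correspondence with two source elements combined by a function.
print(
    violations(
        source_rows,
        target_rows,
        ["city", "country"],
        "location",
        combine=lambda city, country: f"{city}, {country}",
    )
)
```

An empty result means the target instance satisfies the correspondence for the given source instance; any returned value is one that the target would need to contain.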

Consider the example of Fig. 5.4a, which is a variation of the example presented previously. Here, the first source consists of only the three relational tables Public-Company, Public-Grant, and Contact, while the target consists of only the table Company. As before, the intra-schema lines represent schema constraints, which in this particular example are foreign key constraints. The red dotted inter-schema lines represent the correspondences. Note that the appearance of an attribute with the same or a similar name in both schemas does not necessarily mean that the two attributes represent the same fact. For instance, consider the attributes symbol and id. Although in the business world these terms may be used interchangeably, in this specific example the lack of a line between them may be justified by a case in which the attribute id represents the fiscal number of the company, while the attribute symbol is the symbol under which the company is listed on the stock exchange.
