Discovery and Correctness of Schema Mapping Transformations - Schema Matching and Mapping

generate plausible interpretations to produce a precise and faithful representation of the transformation, i.e., the mappings. For instance, in the schema mapping scenario of Fig. 5.4a, consider only the correspondence $v_1$. One possible mapping that this correspondence alone describes is that for each Public-Company in the source instance, there should be a Company in the target instance with the same name. Based on similar reasoning for correspondence $v_2$, for every Public-Grant with identifier gid in the source instance, there should be a Company tuple in the target instance with that grant identifier as its fid attribute. By noticing that a Public-Grant is related to a Public-Company through the foreign key on attribute company, one can easily realize that a more natural interpretation of these two correspondences is that every public grant identifier found in a tuple of the target table Company should have, as its associated company name, the name of the public company with which the grant is associated in the source. Yet it is not clear whether public companies with no associated grants should appear in the target table Company with a null fid attribute or should not appear at all. Furthermore, note that the target schema relation has an attribute phone that is populated from the source attribute of the same name. This value should not be random but somehow related to the company and the grant. However, the Contact table in which the phone is located is related to the grant information through two different join paths, i.e., one on the manager and one on the assistant. The information provided by the correspondence on the phone is not enough to specify whether the target should be populated with the phone of the manager or the phone of the assistant.
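The ambiguity described above can be made concrete with a small sketch. The snippet below is only an illustration under stated assumptions: Fig. 5.4 is not reproduced here, so the table layout, the key column cid of the contact table, and the sample rows are all hypothetical. It shows how the same three correspondences admit several queries, depending on which join path supplies the phone and on whether companies without grants are kept.

```python
import sqlite3

# Hypothetical source instance. Column names not mentioned in the text
# (e.g. contact.cid) and all sample values are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE public_company (name TEXT PRIMARY KEY);
CREATE TABLE contact (cid INTEGER PRIMARY KEY, phone TEXT);
CREATE TABLE public_grant (
    gid INTEGER PRIMARY KEY,
    company TEXT REFERENCES public_company(name),
    manager INTEGER REFERENCES contact(cid),
    assistant INTEGER REFERENCES contact(cid)
);
INSERT INTO public_company VALUES ('Acme'), ('Globex');
INSERT INTO contact VALUES (1, '555-0101'), (2, '555-0202');
INSERT INTO public_grant VALUES (10, 'Acme', 1, 2);
""")


def company_rows(contact_role, keep_grantless):
    """Return target Company rows (name, fid, phone) for one interpretation.

    contact_role: 'manager' or 'assistant', i.e. the join path that
        supplies the phone value.
    keep_grantless: if True, companies without grants appear with a
        NULL fid (outer join); if False, they are dropped (inner join).
    """
    if contact_role not in ("manager", "assistant"):
        raise ValueError("unknown join path: " + contact_role)
    join = "LEFT JOIN" if keep_grantless else "JOIN"
    sql = f"""
        SELECT c.name, g.gid AS fid, k.phone
        FROM public_company c
        {join} public_grant g ON g.company = c.name
        {join} contact k ON k.cid = g.{contact_role}
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()


# Each call corresponds to a different plausible reading of the correspondences.
for role in ("manager", "assistant"):
    for keep in (False, True):
        print(role, keep, company_rows(role, keep))
```

All four results are consistent with the correspondences taken one by one, which is why the correspondences alone do not determine the mapping.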

The challenging task of interpreting the ambiguous correspondences gave rise to the schema mapping problem as it has been introduced in Sect. 2.

3.3 Schema Mapping as Query Discovery

One of the first mapping tools to systematically study the schema mapping problem was Clio [Miller et al. 2000], a tool developed by IBM. The initial algorithm of the tool considers each target schema relation independently. For each relation $R_i$, it creates a set $V_{R_i}$ of all the correspondences that are on a target schema element belonging to the relation $R_i$. Naturally, all these sets are mutually disjoint. For each such set, a query $Q^{V_{R_i}}$ is constructed to populate the relation $R_i$. This query is constructed as follows. The set $V_{R_i}$ of correspondences is further divided into maximal subsets such that each maximal subset $M_k^{V_{R_i}}$ contains at most one correspondence for each attribute of the respective target schema relation. For all the source schema elements used by the correspondences in each such subset, the possible join paths connecting them are discovered and combined to form a union of join queries. These queries are then combined through an outer union operator to form the query $Q^{V_{R_i}}$.
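The first two steps of this procedure, grouping correspondences by target relation and splitting each group into maximal subsets, can be sketched as follows. This is a minimal illustration, not Clio's actual implementation: the tuple encoding of a correspondence, the function names, and the extra correspondence from "Other-Source.name" are assumptions introduced for the example, and join-path discovery is left out.

```python
from collections import defaultdict
from itertools import product

# A correspondence is modelled (as an assumption) by a triple:
# (source_element, target_relation, target_attribute).


def group_by_target_relation(correspondences):
    """Build one set of correspondences per target relation R_i."""
    groups = defaultdict(list)
    for corr in correspondences:
        groups[corr[1]].append(corr)
    return dict(groups)


def maximal_subsets(group):
    """Split one group into its maximal subsets.

    A subset with at most one correspondence per target attribute is
    maximal exactly when it picks one correspondence for every attribute
    that has any, so the subsets are the Cartesian product of the
    per-attribute choices.
    """
    by_attribute = defaultdict(list)
    for corr in group:
        by_attribute[corr[2]].append(corr)
    return [list(choice) for choice in product(*by_attribute.values())]


correspondences = [
    ("Public-Company.name", "Company", "name"),
    ("Public-Grant.gid", "Company", "fid"),
    ("Contact.phone", "Company", "phone"),
    # Hypothetical second correspondence on the same target attribute,
    # added only so that the split produces more than one subset.
    ("Other-Source.name", "Company", "name"),
]

groups = group_by_target_relation(correspondences)
subsets = maximal_subsets(groups["Company"])
for k, subset in enumerate(subsets, start=1):
    print(k, [source for source, _, _ in subset])
```

In the full procedure each subset would then yield a union of join queries over its source elements, and those would be combined by outer union into the query for the relation.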
