Discovery and Correctness of Schema Mapping Transformations - Schema Matching and Mapping


$m_1$ copies company names and symbols from the NYSE source table to the Company table in the target. In doing so, the mapping requires that some value, represented by the existentially quantified variable $I$, be assigned to the id attribute of the Company table. The Public source contains two relations holding company names and the grants assigned to them; this information is copied to the target tables by mapping $m_2$. In this case a value, again denoted by the existentially quantified variable $I$, must be "invented" to correlate a tuple in Grant with the corresponding tuple in Company. Finally, mappings $m_3$ and $m_4$ copy data from the NSF source tables to the corresponding target tables; note that in this case no values need to be invented.

The target tgd encodes the foreign key on the target. The target egd simply states that symbol is a key for Company.

To formalize, given two schemas, $\mathbf{S}$ and $\mathbf{T}$, an embedded dependency [Beeri and Vardi 1984] is a first-order formula of the form $\forall \bar{x}\,(\varphi(\bar{x}) \rightarrow \exists \bar{y}\,(\psi(\bar{x},\bar{y})))$, where $\bar{x}$ and $\bar{y}$ are vectors of variables, $\varphi(\bar{x})$ is a conjunction of atomic formulas such that all variables in $\bar{x}$ appear in it, and $\psi(\bar{x},\bar{y})$ is a conjunction of atomic formulas. $\varphi(\bar{x})$ and $\psi(\bar{x},\bar{y})$ may contain equations of the form $v_i = v_j$, where $v_i$ and $v_j$ are variables.

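An embedded dependency is a tuple-generating dependency (tgd) if $\varphi(\bar{x})$ and $\psi(\bar{x},\bar{y})$ contain only relational atoms. It is an equality-generating dependency (egd) if $\psi(\bar{x},\bar{y})$ contains only equations. A tgd is called an s-t tgd if $\varphi(\bar{x})$ is a formula over $\mathbf{S}$ and $\psi(\bar{x},\bar{y})$ is a formula over $\mathbf{T}$. It is a target tgd if both $\varphi(\bar{x})$ and $\psi(\bar{x},\bar{y})$ are formulas over $\mathbf{T}$.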
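As an illustration of these definitions, mapping $m_1$ and the key constraint on Company described earlier could be written as follows. This is only a sketch: the attribute layouts NYSE(symbol, name) and Company(id, name, symbol) and the label $e_1$ are assumptions made for this example, not taken from the text.

```latex
% Assumed layouts: NYSE(symbol, name), Company(id, name, symbol)

% s-t tgd: the premise is over the source, the conclusion over the target,
% and the existential variable I stands for the invented id value.
m_1:\quad \forall s, n \,\bigl( \mathit{NYSE}(s, n) \rightarrow \exists I \, \mathit{Company}(I, n, s) \bigr)

% egd (label e_1 is hypothetical): the conclusion contains only equations,
% stating that symbol is a key for Company.
e_1:\quad \forall i, i', n, n', s \,\bigl( \mathit{Company}(i, n, s) \wedge \mathit{Company}(i', n', s) \rightarrow (i = i') \wedge (n = n') \bigr)
```

In $m_1$ every universally quantified variable appears in the premise, as the definition requires, and both sides contain only relational atoms, so it is a tgd. In $e_1$ the right-hand side contains only equations, so it is an egd.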

A mapping scenario (also called a data exchange scenario or a schema mapping) is a quadruple $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma_{st}, \Sigma_t)$, where $\mathbf{S}$ is a source schema, $\mathbf{T}$ is a target schema, $\Sigma_{st}$ is a set of s-t tgds, and $\Sigma_t$ is a set of target dependencies that may contain tgds and egds. If the set of target dependencies $\Sigma_t$ is empty, we will use the notation $(\mathbf{S}, \mathbf{T}, \Sigma_{st})$.

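Solutions. We can now introduce the notion of a solution for a mapping scenario. To do this, given two disjoint schemas, $\mathbf{S}$ and $\mathbf{T}$, we shall denote by $\langle \mathbf{S}, \mathbf{T} \rangle$ the schema $\{S_1, \ldots, S_n, T_1, \ldots, T_m\}$. If $I$ is an instance of $\mathbf{S}$ and $J$ is an instance of $\mathbf{T}$, then the pair $\langle I, J \rangle$ is an instance of $\langle \mathbf{S}, \mathbf{T} \rangle$. A target instance $J$ is a solution of $\mathcal{M}$ and a source instance $I$ (denoted $J \in \mathrm{Sol}(\mathcal{M}, I)$) iff $\langle I, J \rangle \models \Sigma_{st} \cup \Sigma_t$, i.e., $I$ and $J$ together satisfy the dependencies.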
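The satisfaction test in this definition can be sketched in a few lines of Python. The encoding below is an assumption made for illustration, not the chapter's own algorithm: an instance is a dictionary from relation names to sets of tuples, an atom is a pair (relation, variables), a tgd is a pair (premise atoms, conclusion atoms), and an egd is a pair (premise atoms, list of variable pairs that must be equal). The relation name NSFGrantee, the attribute orders, and the sample tuples are hypothetical, and the invented value N1 is modeled as an ordinary string.

```python
def matches(atoms, instance, binding=None):
    """Yield every variable assignment that maps all atoms into the instance."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (relation, variables), rest = atoms[0], atoms[1:]
    for row in instance.get(relation, ()):
        if len(row) != len(variables):
            continue
        extended = dict(binding)
        # A row fits if it agrees with every variable that is already bound.
        if all(extended.setdefault(v, value) == value
               for v, value in zip(variables, row)):
            yield from matches(rest, instance, extended)


def satisfies_tgd(tgd, instance):
    """Each match of the premise must extend to a match of the conclusion."""
    premise, conclusion = tgd
    return all(any(True for _ in matches(conclusion, instance, b))
               for b in matches(premise, instance))


def satisfies_egd(egd, instance):
    """Each match of the premise must make the listed variables equal."""
    premise, equalities = egd
    return all(b[u] == b[v]
               for b in matches(premise, instance)
               for u, v in equalities)


def is_solution(source, target, st_tgds, target_tgds=(), target_egds=()):
    """Check that the pair <I, J> satisfies all the given dependencies."""
    pair = {**source, **target}  # schemas are disjoint, so no key collides
    tgds = list(st_tgds) + list(target_tgds)
    return (all(satisfies_tgd(d, pair) for d in tgds)
            and all(satisfies_egd(d, pair) for d in target_egds))


# Hypothetical dependencies: two s-t tgds and the key egd on Company.
m1 = ([("NYSE", ("s", "n"))], [("Company", ("i", "n", "s"))])
m3 = ([("NSFGrantee", ("i", "n", "s"))], [("Company", ("i", "n", "s"))])
e1 = ([("Company", ("i", "n", "s")), ("Company", ("i2", "n2", "s"))],
      [("i", "i2"), ("n", "n2")])

# Made-up source instance and two candidate target instances.
I = {"NYSE": {("GOOG", "Google")}, "NSFGrantee": {("11", "Google", "GOOG")}}
J_a = {"Company": {("N1", "Google", "GOOG"), ("11", "Google", "GOOG")}}
J_b = {"Company": {("11", "Google", "GOOG")}}

st_only_a = is_solution(I, J_a, [m1, m3])               # s-t tgds only
full_a = is_solution(I, J_a, [m1, m3], target_egds=[e1])
full_b = is_solution(I, J_b, [m1, m3], target_egds=[e1])
print(st_only_a, full_a, full_b)  # True False True
```

In this made-up data, J_a satisfies both s-t tgds but holds two Company tuples with the same symbol and different ids, so it fails the egd; J_b satisfies all the dependencies. The check enumerates premise matches naively, which is adequate for small examples only.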

Given a mapping scenario $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma_{st}, \Sigma_t)$, with s-t and target dependencies, we find it useful to define a notion of a pre-solution for $\mathcal{M}$ and a source instance $I$ as a solution over $I$ for scenario $\mathcal{M}_{st} = (\mathbf{S}, \mathbf{T}, \Sigma_{st})$, obtained from $\mathcal{M}$ by removing target constraints. In essence, a pre-solution is a solution for the s-t tgds only, and it does not necessarily enforce the target constraints.

Figure 5.3 shows several solutions for our example scenario on the source instance in Fig. 5.1. In particular, solution (a) is a pre-solution, since it satisfies the s-t tgds but does not comply with the key constraints and therefore does not satisfy the egds. Solution (b) is a solution for both the s-t tgds and the egds. Note, however, that a given scenario may have multiple solutions on a given source instance. This is a consequence of the fact that each tgd only states an
