Data exchange by example - Foundations of Data Exchange


(1)  FLIGHT(src, dest, airl, dep) → ∃f# ∃arr ( ROUTES(f#, src, dest) ∧ INFO_FLIGHT(f#, dep, arr, airl) )

(2)  FLIGHT(city, dest, airl, dep) ∧ GEO(city, country, popul) → ∃phone SERVES(airl, city, country, phone)

(3)  FLIGHT(src, city, airl, dep) ∧ GEO(city, country, popul) → ∃phone SERVES(airl, city, country, phone)

Figure 1.3 A schema mapping

mention both source and target schemas. So possible target instances T for a given source

S must satisfy the following condition:

For each condition ϕ of the mapping M, the pair (S, T) satisfies ϕ.

We call such instances T solutions for S under M. Look, for example, at our mapping M,

and assume that the source S has a tuple (Paris, Santiago, AirFrance, 2320) in FLIGHT.

Then every solution T for S under M must have tuples (x, Paris, Santiago) in ROUTES and

(x, 2320, y, AirFrance) in INFO_FLIGHT

for some values x and y, interpreted as flight number and arrival time. The mapping says

nothing about these values: they may be real values (constants), e.g., (406, Paris, Santiago),

or nulls, indicating that we lack this information at present. We shall normally use the

symbol ⊥ to denote nulls, so a common way to populate the target would be with tuples

(⊥, Paris, Santiago) and (⊥, 2320, ⊥′, AirFrance). Note that the first attributes of both

tuples, while being unknown, are nonetheless the same. This situation is referred to as having

marked nulls, or naïve nulls, as they are used in naïve tables, studied extensively in

connection with incomplete information in relational databases. At the same time, we know

nothing about the other null ⊥′ used: nothing prevents it from being different from ⊥, but

nothing tells us that it should be.
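The way rule (1) of Figure 1.3 populates the target with marked nulls can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Null` class, the function `apply_rule_1`, and the encoding of relations as sets of tuples are hypothetical choices, not notation from the text. The point it shows is that one fresh null is reused for the flight number in both target tuples, while the arrival time gets a separate null.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Null:
    """A marked null (hypothetical encoding): two occurrences with the
    same label stand for the same unknown value."""
    label: int


def apply_rule_1(flight):
    """Populate ROUTES and INFO_FLIGHT from FLIGHT following rule (1),
    inventing fresh marked nulls for the unknown f# and arr."""
    fresh = count(1)
    routes, info_flight = set(), set()
    for src, dest, airl, dep in sorted(flight):
        f_no = Null(next(fresh))  # unknown flight number, shared below
        arr = Null(next(fresh))   # unknown arrival time, a different null
        routes.add((f_no, src, dest))
        info_flight.add((f_no, dep, arr, airl))
    return routes, info_flight


FLIGHT = {("Paris", "Santiago", "AirFrance", 2320)}
ROUTES, INFO_FLIGHT = apply_rule_1(FLIGHT)
```

Distinct labels only record that nothing forces the two unknowns to coincide; they do not assert that the underlying values differ.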

Note that already this simple example leads to a crucial observation that makes the data

exchange problem interesting: solutions are not unique. In fact, there could be infinitely

many solutions: we can use different marked nulls, or can instantiate them with different

values.
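To make the non-uniqueness concrete, the following Python sketch checks rule (1) of Figure 1.3 against several candidate targets. It is a minimal illustration under assumptions: the function name `satisfies_rule_1`, the encoding of nulls as strings such as "⊥1", and the arrival time 920 are all made up for the example; only the tuple (406, Paris, Santiago) and the FLIGHT tuple come from the text.

```python
def satisfies_rule_1(flight, routes, info_flight):
    """Check rule (1): every FLIGHT tuple needs some flight number f#
    that appears both in a matching ROUTES tuple and in a matching
    INFO_FLIGHT tuple (with any arrival time)."""
    for src, dest, airl, dep in flight:
        witnessed = any(
            (f_no, src, dest) in routes
            for (f_no, dep2, _arr, airl2) in info_flight
            if dep2 == dep and airl2 == airl
        )
        if not witnessed:
            return False
    return True


FLIGHT = {("Paris", "Santiago", "AirFrance", 2320)}

# Candidate targets as (ROUTES, INFO_FLIGHT) pairs; nulls are strings here.
with_nulls = ({("⊥1", "Paris", "Santiago")}, {("⊥1", 2320, "⊥2", "AirFrance")})
with_constants = ({(406, "Paris", "Santiago")}, {(406, 2320, 920, "AirFrance")})
mismatched = ({("⊥1", "Paris", "Santiago")}, {("⊥3", 2320, "⊥2", "AirFrance")})

results = [
    satisfies_rule_1(FLIGHT, *target)
    for target in (with_nulls, with_constants, mismatched)
]
```

The first two candidates both satisfy the rule, so both count as solutions with respect to rule (1); the third fails because the flight numbers in ROUTES and INFO_FLIGHT do not match.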

If solutions are not unique, how can we answer queries? Consider, for example, a

Boolean (yes/no) query “Is there a flight from Paris to Santiago that arrives before

10am?”. The answer to this query has to be “no”, even though in some solutions we shall

have tuples with arrival time before 10am. However, in others, in particular in the one with

null values, the comparison with 10am will not evaluate to true, and thus we have to return

“no” as the answer.
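The reasoning behind the “no” answer can be sketched in Python: evaluate the query in each solution and answer “yes” only if it holds in all of them (answers of this kind are commonly called certain answers). The sketch rests on assumptions: times are encoded as HHMM integers (so 10am is 1000), nulls are strings, the arrival time 920 is invented, and only two sample solutions are inspected rather than the full, possibly infinite, set. A single solution in which the query fails is already enough to force “no”.

```python
def arrives_before(routes, info_flight, src, dest, limit):
    """Boolean query: is there a flight from src to dest whose arrival
    time is a known value smaller than limit?"""
    for f_no, s, d in routes:
        if (s, d) != (src, dest):
            continue
        for f2, _dep, arr, _airl in info_flight:
            # A null arrival time never makes the comparison true.
            if f2 == f_no and isinstance(arr, int) and arr < limit:
                return True
    return False


# Two sample solutions as (ROUTES, INFO_FLIGHT) pairs.
solutions = {
    "constants": ({(406, "Paris", "Santiago")},
                  {(406, 2320, 920, "AirFrance")}),
    "nulls": ({("⊥1", "Paris", "Santiago")},
              {("⊥1", 2320, "⊥2", "AirFrance")}),
}

per_solution = {
    name: arrives_before(routes, info, "Paris", "Santiago", 1000)
    for name, (routes, info) in solutions.items()
}
answer = all(per_solution.values())  # "yes" only if true in every solution
```

Here the query is true in the solution with constants and not true in the one with nulls, so the overall answer is “no”.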
