Uncertainty in Data Integration and Dataspace Support Platforms - Schema Matching and Mapping

Databases Reference

In-Depth Information

We can prove that under the expected-value semantics, the answer for the SUM

operator under by-table and by-tuple semantics is the same; thus, query answer-

ing for SUM under the by-tuple and expected-value semantics is in PTIME.

For the COUNT operator, even query answering under the by-tuple semantics is

PTIME for the distribution semantics and thus also for other semantics. We next

illustrate this using an example.

For the rest of the combinations, we conjecture that query answering cannot

be finished in polynomial time and the complexity of query answering remains

open.

Example 8. Continue with the running example (Fig. 4.2 ) and consider the follow-

ing query.

Qc: SELECT COUNT( * ) FROM S

WHERE mailing-addr = 'Sunnyvale'

Tab le 4.2 shows how the probability of each answer value changes after we pro-

cess each source tuple under the by-tuple semantics. After processing the first tuple,

the probability of COUNT

D 0 is 0:5 C 0:1 D 0:6 (m 1 ;m 3 ) and that of COUNT

D 1

D 0

is 0:4 (m 2 ). After processing the second tuple, the probability of COUNT

is the probability that COUNT

D 0 after the first tuple times the probability of

applying m 3

to the second tuple, so 0:6 0:1 D 0:06.Thatof COUNT

D 1 is

0:6 .0:5 C 0:4/ C 0:4 0:1 D 0:58, that is, either COUNT

D 0 after we process the

first tuple and we apply m 1 or m 2 to the second tuple, or COUNT

D 1 after the first

tuple and we apply m 3

to the third tuple. Similarly, the probability of COUNT

D 2

is 0:4 .0:5 C 0:4/ D 0:36.

t

3.4

Creating p-Mappings

We now address the problem of generating a p-mapping between a source schema

and a target schema. We begin by assuming that we have a set of weighted corre-

spondences between the source attributes and the target attributes. These weighted

correspondences are created by a set of schema matching modules. However, as we

explain shortly, there can be multiple p-mappings that are consistent with a given

set of weighted correspondences, and the question is which of them to choose. We

describe an approach to creating p-mappings that is based on choosing the mapping

that maximizes the entropy of the probability assignment.

Tabl e 4. 2

Trace of query answering in Example 8

TupleID

COUNT

D 0

COUNT

D 1

COUNT

D 2

1

0:6

0:4

-

2

0:06

0:58

0.36

Schema Matching and Mapping

Search WWH ::

Custom Search

Home