Uncertainty in Data Integration and Dataspace Support Platforms - Schema Matching and Mapping

Databases Reference

In-Depth Information

Definition 8 (By-tuple Answer). Let pM D .S;T; m / be a p-mapping. Let Q be

a query over T and D S be an instance of S with d tuples.

L et t be a tuple. Let seq .t/ be the subset of seq

.pM/, such that for each seq

2

d

seq .t/ and for each D T 2

Ta r seq .D S /, t 2 Q.D T /.

Let p D P seq 2 seq .t /

Pr . seq /.Ifp>0, we call .t;p/ a by-tuple answer of Q with

respect to D S

and pM.

t

is denoted by Q table .D S /

The set of by-table answers for Q with respect to D S

is denoted by Q t u ple .D S /.

and the set of by-tuple answers for Q with respect to D S

Example 3. Consider the p-mapping pM, the source instance D S , and the query Q

in the motivating example.

In by-table semantics, Fig. 4.3 b shows a target instance that is consistent with

D S (repeated in Fig. 4.3 a) and possible mapping m 1 . Figure 4.3 d shows the by-

table answers of Q with respect to D S

and pM. As an example, for tuple t D

(“Sunnyvale”), we have

m.t/ Df m 1 ;m 2 g

, so the possible tuple (“Sunnyvale,” 0.9)

is an answer.

In by-tuple semantics, Fig. 4.3 c shows a target instance that is by-tuple consistent

with D S and the mapping sequence <m 2 ;m 3 >. Figure 4.3 e shows the by-tuple

answers of Q with respect to D S

D

(“Sunnyvale”) in the by-table answers is different from that in the by-tuple answers.

We describe how to compute the probabilities in detail in the next section.

and pM. Note that the probability of tuple t

t

3.3

Query Answering

This section studies query answering in the presence of probabilistic mappings. We

start with describing algorithms for returning all answer tuples with probabilities,

and discussing the complexity of query answering in terms of the size of the data

( data complexity ) and the size of the p-mapping ( mapping complexity ). We then

consider returning the top-k query answers, which are the k answer tuples with the

top probabilities, and answering aggregate queries.

3.3.1

By-table Query Answering

In the case of by-table semantics, answering queries is conceptually simple. Given

a p-mapping pM D .S;T; m / and an SPJ query Q, we can compute the cer-

tain answers of Q under each of the mappings m 2

m . We attach the probability

Pr.m/to every certain answer under m. If a tuple is an answer to Q under multiple

mappings in m , then we add up the probabilities of the different mappings.

Algorithm B Y T ABLE takes as input an SPJ query Q that mentions the relations

T 1 ;:::;T l

in the FROM clause. Assume that we have the p-mapping pM i

associated

with the table T i . The algorithm proceeds as follows.

Schema Matching and Mapping

Search WWH ::

Custom Search

Home