Uncertainty in Data Integration and Dataspace Support Platforms - Schema Matching and Mapping

Databases Reference

In-Depth Information

Example 5. Consider rewriting Q in the motivating example, repeated as follows:

Q: SELECT mailing-addr FROM T

To answer the query, we first rewrite Q into query Q 0 by adding the id column:

Q': SELECT id, mailing-addr FROM T

We then invoke B Y T ABLE and generate the following SQL query to compute

by-table answers for Q 0 :

Qa: SELECT id, mailing-addr, SUM(pr)

FROM (

SELECT DISTINCT id, current-addr

AS mailing-addr, 0.5 AS pr

FROM S

UNION ALL

SELECT DISTINCT id, permanent-addr

AS mailing-addr, 0.4 AS pr

FROM S

UNION ALL

SELECT DISTINCT id, email-addr

AS mailing-addr, 0.1 AS pr

FROM S)

GROUP BY id, mailing-addr

Finally, we generate the results using the following query.

Qu: SELECT mailing-addr, NOR(pr) AS pr

FROM Qa

GROUP BY mailing-addr

where for a set of probabilities pr 1 ;:::;pr n , NOR computes 1 ˘ i D 1 .1 pr i /.

t

Example 6. Consider the schema p-mapping in Example 4 .IfwereviseQ slightly

by returning the join attribute, shown as follows, we can answer the query in

polynomial time.

Q': SELECT V.hightech

FROM T, V

WHERE T.mailing-addr = V.hightech

We answer the query by dividing it into two subqueries, Q 1 and Q 2 ,asshownin

Example 4 . We can compute Q 1 with query Q u (shown in Example 5 ) and compute

Q 2 similarly with a query Q 0 u . We compute by-tuple answers of Q 0 as follows:

SELECT Qu'.hightech, Qu.pr * Qu'.pr

FROM Qu, Qu'

WHERE Qu.mailing-addr = Qu'.hightect

t

Search WWH ::

Custom Search

Home