Uncertainty in Data Integration and Dataspace Support Platforms - Schema Matching and Mapping

0: Input: Source $S$ with p-mappings $pM_1, \ldots, pM_l$ for $M_1, \ldots, M_l$.
   Output: Single p-mapping $pM$ between $S$ and $T$.

1: For each $i \in [1, l]$, modify p-mapping $pM_i$: Do the following for every possible mapping $m$ in $pM_i$:
   For every correspondence $(a, A) \in m$ between source attribute $a$ and mediated attribute $A$ in $M_i$, proceed as follows. (1) Find the set of all mediated attributes $B$ in $T$ such that $B \subseteq A$. Call this set $\mathcal{B}$. (2) Replace $(a, A)$ in $m$ with the set of all $(a, B)$'s, where $B \in \mathcal{B}$.
   Call the resulting p-mapping $pM'_i$.

2: For each $i \in [1, l]$, modify probabilities in $pM'_i$: Multiply the probability of every schema mapping in $pM'_i$ by $\Pr(M_i)$, which is the probability of $M_i$ in the p-med-schema. (Note that after this step the sum of probabilities of all mappings in $pM'_i$ is not 1.)

3: Consolidate the $pM'_i$'s: Initialize $pM$ to be an empty p-mapping (i.e., with no mappings). For each $i \in [1, l]$, add $pM'_i$ to $pM$ as follows:
   For each schema mapping $m$ in $pM'_i$ with probability $p$: if $m$ is in $pM$, with probability $p'$, modify the probability of $m$ in $pM$ to $(p + p')$; if $m$ is not in $pM$, then add $m$ to $pM$ with probability $p$.

4: Return the resulting consolidated p-mapping, $pM$; the probabilities of all mappings in $pM$ add to 1.

Algorithm 4: Consolidating p-mappings

is equal to the union of a set of clusters in $T$. Hence, any two attributes $a_i$ and $a_j$ will be together in a cluster in $T$ if and only if they are together in every mediated schema of $M$. The algorithm initializes $T$ to $M_1$ and then modifies each cluster of $T$ based on clusters from $M_2$ to $M_l$.

Example 10. Consider a p-med-schema $M = \{M_1, M_2\}$, where $M_1$ contains three attributes $\{a_1, a_2, a_3\}$, $\{a_4\}$, and $\{a_5, a_6\}$, and $M_2$ contains two attributes $\{a_2, a_3, a_4\}$ and $\{a_1, a_5, a_6\}$. The target schema $T$ would then contain four attributes: $\{a_1\}$, $\{a_2, a_3\}$, $\{a_4\}$, and $\{a_5, a_6\}$.

Note that in practice the consolidated mediated schema is the same as the mediated schema that corresponds to the weighted graph with only certain edges. Here, we show the general algorithm for consolidation, which can be applied even if we do not know the specific pairwise similarities between attributes.
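The consolidation of mediated schemas can be illustrated with a minimal Python sketch. It is not a transcription of the book's algorithm: it simply intersects clusters pairwise, which is one way to obtain a schema in which two attributes share a cluster exactly when they share one in every input schema. The function name `consolidate_schemas` and the string attribute names are hypothetical, and the sketch assumes that every mediated schema clusters the same set of source attributes.

```python
from itertools import product


def consolidate_schemas(schemas):
    """Return the coarsest clustering that refines every mediated schema.

    Each schema is a list of clusters (sets of source attribute names).
    Assumption: all schemas cluster the same set of attributes.
    """
    schemas = [[frozenset(c) for c in schema] for schema in schemas]
    target = set(schemas[0])          # initialize T to M_1
    for schema in schemas[1:]:        # refine T with M_2, ..., M_l
        target = {
            t & c
            for t, c in product(target, schema)
            if t & c                  # keep only non-empty intersections
        }
    return target


# The two mediated schemas of Example 10.
M1 = [{"a1", "a2", "a3"}, {"a4"}, {"a5", "a6"}]
M2 = [{"a2", "a3", "a4"}, {"a1", "a5", "a6"}]
T = consolidate_schemas([M1, M2])
```

On the schemas of Example 10, `T` holds the four clusters {a1}, {a2, a3}, {a4}, and {a5, a6}. Clusters are stored as frozensets so that they can be set members and, later, parts of dictionary keys.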

Consolidating p-mappings: Next, we consider consolidating p-mappings specified w.r.t. $M_1, \ldots, M_l$ to a p-mapping w.r.t. the consolidated mediated schema $T$. Consider a source $S$ with p-mappings $pM_1, \ldots, pM_l$ for $M_1, \ldots, M_l$, respectively. We generate a single p-mapping $pM$ between $S$ and $T$ in three steps. First, we modify each p-mapping $pM_i$, $i \in [1, l]$, between $S$ and $M_i$ to a p-mapping $pM'_i$ between $S$ and $T$. Second, we modify the probabilities in each $pM'_i$. Third, we consolidate all possible mappings in the $pM'_i$'s to obtain $pM$. The details are specified in Algorithm 4.
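The three steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: a mapping is modeled as a frozenset of (source attribute, mediated attribute) pairs, a mediated attribute is a frozenset of attribute names, and a p-mapping is a dictionary from mappings to probabilities. The function name `consolidate_p_mappings`, the source attributes `s1` and `s2`, and the probabilities 0.6, 0.4, 0.7, 0.3 are all hypothetical values chosen for the example.

```python
from collections import defaultdict


def consolidate_p_mappings(p_mappings, schema_probs, target):
    """Merge p-mappings over M_1..M_l into one p-mapping over T.

    p_mappings[i] maps each possible mapping (a frozenset of
    (source_attr, mediated_attr) pairs) to its probability;
    schema_probs[i] is Pr(M_i); target is the set of clusters in T.
    """
    consolidated = defaultdict(float)
    for p_mapping, schema_prob in zip(p_mappings, schema_probs):
        for mapping, prob in p_mapping.items():
            # Step 1: replace each (a, A) by every (a, B) with B in T, B ⊆ A.
            rewritten = frozenset(
                (a, B) for a, A in mapping for B in target if B <= A
            )
            # Step 2: weight by Pr(M_i). Step 3: add up identical mappings.
            consolidated[rewritten] += prob * schema_prob
    return dict(consolidated)


fs = frozenset
# Consolidated schema T from Example 10.
T = {fs({"a1"}), fs({"a2", "a3"}), fs({"a4"}), fs({"a5", "a6"})}

# Hypothetical p-mappings of a source S (attributes s1, s2) to M_1 and M_2.
pM1 = {
    fs({("s1", fs({"a1", "a2", "a3"})), ("s2", fs({"a4"}))}): 0.7,
    fs({("s1", fs({"a1", "a2", "a3"}))}): 0.3,
}
pM2 = {
    fs({("s1", fs({"a2", "a3", "a4"}))}): 1.0,
}

# Hypothetical schema probabilities Pr(M_1) = 0.6 and Pr(M_2) = 0.4.
pM = consolidate_p_mappings([pM1, pM2], [0.6, 0.4], T)
```

In this example the correspondence (s1, {a1, a2, a3}) is rewritten into the two correspondences (s1, {a1}) and (s1, {a2, a3}), so the resulting mappings are one-to-many. Because the schema probabilities sum to 1 and each input p-mapping sums to 1, the probabilities in `pM` also sum to 1, up to floating-point rounding.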

Note that the second part of Step 1 can map one source attribute to multiple mediated attributes; thus, the mappings in the result $pM$ are one-to-many mappings and so typically different from the p-mapping generated directly on the consolidated schema. The following theorem shows that the consolidated mediated schema
