Uncertainty in Data Integration and Dataspace Support Platforms - Schema Matching and Mapping


and the consolidated p-mapping are equivalent to the original p-med-schema and p-mappings.

Theorem 10 (Merge Equivalence). For all queries $Q$, the answers obtained by posing $Q$ over a p-med-schema $\mathbf{M} = \{M_1, \ldots, M_l\}$ with p-mappings $pM_1, \ldots, pM_l$ are equal to the answers obtained by posing $Q$ over the consolidated mediated schema $T$ with consolidated p-mapping $pM$.
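The setting of the theorem can be illustrated with a small sketch. This section does not spell out how $T$ and $pM$ are built, so the Python code below assumes one natural construction: $T$ splits every cluster of $M_1$ by the clusters of $M_2, \ldots, M_l$, so that each cluster of $T$ lies inside a single cluster of every $M_i$; a mapping $m$ into $M_i$becomes a mapping into $T$ by sending each source attribute to all clusters of $T$ contained in its original target; and that mapping receives probability $Pr(m) \cdot Pr(M_i)$, with identical consolidated mappings merged by adding their probabilities. The functions `consolidate_schema` and `consolidate_mappings`, the set-based representation, and the attribute names are hypothetical. The code only builds $T$ and $pM$; it does not evaluate queries or verify the theorem.

```python
from collections import defaultdict

# Assumed representation (hypothetical, not the chapter's notation): a mediated
# schema is a list of clusters; each cluster is a frozenset of source attributes.


def consolidate_schema(schemas):
    """Split the clusters of the first schema by the clusters of every other schema."""
    T = list(schemas[0])
    for M in schemas[1:]:
        refined = []
        for t in T:
            remaining = set(t)
            for a in M:
                part = remaining & a  # the piece of t that falls inside cluster a
                if part:
                    refined.append(frozenset(part))
                    remaining -= part
            if remaining:  # attributes of t not covered by M stay together
                refined.append(frozenset(remaining))
        T = refined
    return T


def consolidate_mappings(p_med_schema, p_mappings, T):
    """Rewrite each mapping into T and weight it by its schema's probability.

    p_med_schema: list of (M_i, Pr(M_i)).
    p_mappings:   parallel list; entry i is a list of (mapping, Pr(mapping)),
                  where a mapping is a dict source_attr -> cluster of M_i.
    Returns {consolidated mapping: probability}; a consolidated mapping is a
    frozenset of (source_attr, frozenset of T-clusters) pairs.
    """
    merged = defaultdict(float)
    for (M_i, pr_schema), pM_i in zip(p_med_schema, p_mappings):
        for mapping, pr_map in pM_i:
            # A source attribute mapped to cluster A of M_i now maps to every
            # cluster of T contained in A (possibly more than one).
            converted = frozenset(
                (src, frozenset(t for t in T if t <= target))
                for src, target in mapping.items()
            )
            # Identical consolidated mappings are merged by summing probabilities.
            merged[converted] += pr_schema * pr_map
    return dict(merged)


# Hypothetical example: two alternative clusterings of the same attributes.
M1 = [frozenset({"name"}), frozenset({"phone", "hPhone"}), frozenset({"oPhone"})]
M2 = [frozenset({"name"}), frozenset({"phone", "oPhone"}), frozenset({"hPhone"})]
p_med_schema = [(M1, 0.6), (M2, 0.4)]

# One hypothetical source with attributes "name" and "hPhone"; one mapping per schema.
p_mappings = [
    [({"name": M1[0], "hPhone": M1[1]}, 1.0)],
    [({"name": M2[0], "hPhone": M2[2]}, 1.0)],
]

T = consolidate_schema([M for M, _ in p_med_schema])
pM = consolidate_mappings(p_med_schema, p_mappings, T)
for mapping, prob in pM.items():
    print(sorted((src, sorted(sorted(c) for c in tgts)) for src, tgts in mapping), prob)
```

In this example the two clusterings disagree on whether `phone` belongs with `hPhone` or with `oPhone`, so $T$ ends up with four singleton clusters, and the source attribute `hPhone` maps to two clusters of $T$ under the first schema but to only one under the second.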

4.5 Other Approaches

He and Chang (2003) considered the problem of generating a mediated schema for a set of Web sources. Their approach was to create a mediated schema that is statistically maximally consistent with the source schemas. To do so, they assume that the source schemas are created by a generative model applied to some mediated schema, which can be thought of as a probabilistic mediated schema. (Some other works, e.g., He et al. 2004 and He and Chang 2006, have also considered correlations for schema matching.) The probabilistic mediated schema described in this chapter has several advantages for capturing heterogeneity and uncertainty in the domain. We can express a wider class of attribute clusterings, in particular clusterings that capture attribute correlations. Moreover, we are able to combine attribute matching and co-occurrence properties when creating the probabilistic mediated schema, allowing, for instance, two attributes from the same source to have a nonzero probability of being grouped together in the mediated schema. Finally, the approach to p-med-schema creation described in this chapter is independent of any specific schema-matching technique, whereas the approach in He and Chang (2003) is tuned for constructing generative models and hence must rely on statistical properties of the source schemas.
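Because a p-med-schema is a probability distribution over alternative clusterings of the source attributes, the probability that two attributes are grouped together is the total probability of the clusterings that place them in the same cluster. The minimal Python sketch below makes this concrete for two attributes from the same source; the representation, the function name `prob_grouped`, and the example attributes and probabilities are hypothetical, chosen only for illustration.

```python
# Hypothetical p-med-schema: (clustering, probability) pairs whose probabilities
# sum to 1; each clustering is a list of clusters (frozensets of source attributes).
pms = [
    ([frozenset({"name"}), frozenset({"phone", "hPhone", "oPhone"})], 0.3),
    ([frozenset({"name"}), frozenset({"phone", "hPhone"}), frozenset({"oPhone"})], 0.7),
]


def prob_grouped(p_med_schema, a, b):
    """Total probability of the mediated schemas that put a and b in one cluster."""
    return sum(
        pr
        for schema, pr in p_med_schema
        if any(a in cluster and b in cluster for cluster in schema)
    )


# hPhone and oPhone come from the same (hypothetical) source, yet they are
# grouped together in the first alternative, so the probability is nonzero.
print(prob_grouped(pms, "hPhone", "oPhone"))  # 0.3
print(prob_grouped(pms, "name", "phone"))  # 0
```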

Magnani et al. (2005) proposed generating a set of alternative mediated schemas based on probabilistic relationships between relations (such as an Instructor relation intersecting with a Teacher relation but being disjoint from a Student relation), obtained by sampling the overlap of data instances. Here, we focus on matching attributes within relations. In addition, our approach allows exploring various types of evidence to improve matching, and we assign probabilities to the mediated schemas we generate.

Chiticariu et al. (2008) studied the generation of multiple mediated schemas for an existing set of data sources. They consider multitable data sources, which are not considered in this chapter, but explore interactive techniques that help humans arrive at the mediated schemas.

Considerable work on automatically creating mediated schemas has focused on the theoretical analysis of the semantics of merging schemas and the choices that need to be made in the process (Batini et al. 1986; Buneman et al. 1992; Hull 1984; Kalinichenko 1990; Miller et al. 1993; Pottinger and Bernstein 2002). The goal of these works was to make as many decisions automatically as possible and, where ambiguity arises, to rely on input from a designer.
