and the consolidated p-mapping are equivalent to the original p-med-schema and
p-mappings.
Theorem 10 (Merge Equivalence). For all queries $Q$, the answers obtained by posing $Q$ over a p-med-schema $\mathbf{M} = \{M_1, \ldots, M_l\}$ with p-mappings $pM_1, \ldots, pM_l$ are equal to the answers obtained by posing $Q$ over the consolidated mediated schema $T$ with consolidated p-mapping $pM$.
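The theorem depends on how the consolidated schema $T$ is built, which is defined before this excerpt. The sketch below assumes that $T$ is the coarsest common refinement of $M_1, \ldots, M_l$: two attributes share a cluster in $T$ only if they share a cluster in every $M_i$. The function name `coarsest_refinement`, the list-of-clusters representation, and the attribute names are hypothetical, chosen only for illustration. Consolidating the p-mappings themselves is not shown.

```python
from collections import defaultdict
from typing import FrozenSet, List, Set

# Hypothetical representation: a mediated schema is a list of disjoint attribute clusters.
Schema = List[FrozenSet[str]]


def coarsest_refinement(schemas: List[Schema]) -> Set[FrozenSet[str]]:
    """Group two attributes together only if every input schema groups them together.

    Assumes every schema clusters the same set of source attributes.
    """
    # signature[attr][i] is the index of the cluster containing attr in schemas[i].
    signature = defaultdict(lambda: [None] * len(schemas))
    for i, schema in enumerate(schemas):
        for j, cluster in enumerate(schema):
            for attr in cluster:
                signature[attr][i] = j

    # Attributes with identical signatures share a cluster in every schema.
    groups = defaultdict(set)
    for attr, sig in signature.items():
        groups[tuple(sig)].add(attr)
    return {frozenset(g) for g in groups.values()}


if __name__ == "__main__":
    # Two hypothetical mediated schemas that disagree on where oPhone belongs.
    m1 = [frozenset({"name"}), frozenset({"phone", "hPhone", "oPhone"})]
    m2 = [frozenset({"name"}), frozenset({"phone", "hPhone"}), frozenset({"oPhone"})]
    print(sorted(sorted(c) for c in coarsest_refinement([m1, m2])))
    # → [['hPhone', 'phone'], ['name'], ['oPhone']]
```

Because `oPhone` is separated from `phone` in $M_2$, it also gets its own cluster in the refinement, even though $M_1$ groups them together.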
4.5 Other Approaches
He and Chang (2003) considered the problem of generating a mediated schema for a set of Web sources. Their approach was to create a mediated schema that is statistically maximally consistent with the source schemas. To do so, they assume that the source schemas are created by a generative model applied to some mediated schema, which can be thought of as a probabilistic mediated schema. (Some other works, e.g., He et al. (2004) and He and Chang (2006), have considered correlations for schema matching as well.) The probabilistic mediated schema we described in this chapter has several advantages in capturing heterogeneity and uncertainty in the domain. We can express a wider class of attribute clusterings, and in particular clusterings that capture attribute correlations. Moreover, we are able to combine attribute matching and co-occurrence properties when creating the probabilistic mediated schema, allowing, for instance, two attributes from one source to have a nonzero probability of being grouped together in the mediated schema. Also, the approach for p-med-schema creation described in this chapter is independent of any specific schema-matching technique, whereas the approach in He and Chang (2003) is tuned for constructing generative models and hence must rely on statistical properties of source schemas.
Magnani et al. (2005) proposed generating a set of alternative mediated schemas based on probabilistic relationships between relations (for example, an Instructor relation intersects with a Teacher relation but is disjoint from a Student relation), obtained by sampling the overlap of data instances. Here, we focus on matching attributes within relations. In addition, our approach allows exploring various types of evidence to improve matching, and we assign probabilities to the mediated schemas we generate.
Chiticariu et al. (2008) studied the generation of multiple mediated schemas for an existing set of data sources. They consider multitable data sources, which this chapter does not address, but they explore interactive techniques that help humans arrive at the mediated schemas.
There has been considerable work on automatically creating mediated schemas, focused on the theoretical analysis of the semantics of merging schemas and the choices that need to be made in the process (Batini et al. 1986; Buneman et al. 1992; Hull 1984; Kalinichenko 1990; Miller et al. 1993; Pottinger and Bernstein 2002). The goal of these works was to make as many decisions automatically as possible and, where ambiguity arises, to rely on input from a designer.