CHAPTER 3
Modeling Uncertain Schema Matching
Concepts cannot be identical with mental objects of any kind.
- Hilary Putnam
Theoretical models for attribute correspondences have been investigated by Alagic and Bernstein [2001], Madhavan et al. [2002], and Benerecetti et al. [2005]. Alagic and Bernstein [2001] represent correspondences using morphisms (structure-preserving mappings) in categories (which can be viewed as typed objects) [Lane, 1998]. The work of Madhavan et al. [2002] provides explicit semantics for matchings using logical models and model satisfaction. Benerecetti et al. [2005] provide a formal model of schema matching for topic hierarchies, modeled as rooted directed trees, where a node has a “meaning” generated using an ontology. A matching connects topic hierarchies by some relation (e.g., subsumption).
In this chapter, we seek a generic data model for representing the uncertainty of the matching process. The proposed model captures this uncertainty, supports the search for better algorithms that reduce it (and thus increase user effectiveness), and accommodates features such as matcher ensembling.
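Matcher ensembling combines the outputs of several matchers into one. The sketch below assumes that each matcher yields a matrix of similarity scores in [0, 1] (one row per attribute of one schema, one column per attribute of the other) and combines the matrices by a weighted average. Weighted averaging is one common aggregation choice, used here for illustration only; the function name `ensemble`, the matcher names, and all scores are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def ensemble(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Combine same-shaped similarity matrices by a weighted average (assumed aggregation)."""
    if not matrices or len(matrices) != len(weights):
        raise ValueError("need at least one matrix and exactly one weight per matrix")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and sum to a positive value")
    total = sum(weights)
    rows = len(matrices[0])
    cols = len(matrices[0][0]) if rows else 0
    for m in matrices:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all matrices must have the same shape")
    # Entry (i, j) is the weighted mean of the matchers' scores for that attribute pair.
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(cols)]
        for i in range(rows)
    ]


# Two hypothetical matchers scoring the same 2 x 2 attribute pairs.
name_matcher = [[0.9, 0.1], [0.2, 0.6]]
type_matcher = [[0.7, 0.3], [0.4, 0.8]]
combined = ensemble([name_matcher, type_matcher], weights=[0.5, 0.5])
# combined ≈ [[0.8, 0.2], [0.3, 0.7]]
```

With non-negative weights, each combined score stays in [0, 1]. Other aggregations (for example, taking the maximum score per pair) are equally plausible; nothing in the text commits to averaging.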
We start by presenting a schema matching model in Section 3.1. In Section 3.2, we demonstrate, using several examples, that the similarity matrix is sufficient for modeling the basic uncertainty of correspondences. We then discuss two alternative approaches to reasoning with uncertainty in schema matching: fuzzy-set theory in Section 3.3.1 and probability theory in Section 3.3.2. We conclude with a description of how the quality of schema matchers is assessed (Section 3.4).
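As a rough illustration of the similarity-matrix idea, here is a minimal sketch, under stated assumptions, of such a matrix and of one simple way to turn it into a set of correspondences: keep every attribute pair whose score reaches a threshold. The function name `threshold_matching`, the attribute names, the scores, and the threshold are all hypothetical (they are not taken from Example 3.1), and thresholding is used only to show how graded scores can be collapsed into a yes/no decision.

```python
from typing import List, Tuple


def threshold_matching(
    source: List[str],
    target: List[str],
    sim: List[List[float]],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Keep every (source, target) attribute pair whose similarity >= threshold.

    sim[i][j] is the similarity in [0, 1] between source[i] and target[j].
    """
    if len(sim) != len(source) or any(len(row) != len(target) for row in sim):
        raise ValueError("matrix shape must be len(source) x len(target)")
    pairs: List[Tuple[str, str, float]] = []
    for i, a in enumerate(source):
        for j, b in enumerate(target):
            score = sim[i][j]
            if not 0.0 <= score <= 1.0:
                raise ValueError(f"similarity out of [0, 1]: {score}")
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


# Hypothetical attributes of two schemas and their pairwise similarities.
source_attrs = ["cardNum", "city"]
target_attrs = ["clientNum", "hotelCity"]
sim_matrix = [
    [0.70, 0.10],  # cardNum vs. clientNum, hotelCity
    [0.05, 0.85],  # city vs. clientNum, hotelCity
]
matches = threshold_matching(source_attrs, target_attrs, sim_matrix, threshold=0.5)
# matches == [("cardNum", "clientNum", 0.7), ("city", "hotelCity", 0.85)]
```

The sketch keeps each selected pair's score alongside it; a purely binary matching would discard those degrees, which are the kind of information that fuzzy-set or probabilistic treatments of uncertainty can reason about.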
3.1 MODEL
We now present a model for schema matching that draws on various ideas from the literature, and we show its wide applicability and usability. We accompany the description with an example based on [Gal, 2010].
Example 3.1 This case study involves the design of a hotel reservation portal. The portal merges various databases of the hotel chain RoomsRUs and adds a mashup application that helps position the hotels on a geographical map. We consider three relational databases. Database R contains three relations: credit card information in the CardInfo relation; hotel information in
 