3.3 REASONING WITH UNCERTAIN SCHEMA MATCHING
Having introduced a model for schema matching and the similarity matrix for capturing the uncertainty involved in attribute correspondence and schema matching, we now turn our attention to adding reasoning capabilities to manage uncertain schema matching. Such capabilities can take similarity measures, as given in similarity matrices, and turn them into a metric that can be reasoned with to create better matching results. We introduce two alternative mechanisms for reasoning with uncertain matching, namely fuzzy set theory and probability theory. The discussion of the former is based mainly on Gal et al. [2005a], and that of the latter on the model of Dong et al. [2007].
3.3.1 REASONING USING FUZZY SET THEORY
The formal framework for computing similarities among attribute sets is based on fuzzy relations [Klir and Yuan, 1995].
Definition 3.9 Given domains $D$ and $D'$, a primitive similarity relation is a fuzzy relation over $D \times D'$, denoted $\mu$, where the matching similarity $\mu(d, d')$ (also annotated $\mu_{d,d'}$) is the membership degree of the pair $(d, d')$ in $\mu$.
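To make Definition 3.9 concrete, here is a minimal sketch that treats $D$ and $D'$ as sets of attribute names and fills each membership degree $\mu_{d,d'}$ from a distance between names. The choice of a length-normalized Levenshtein (edit) distance, the attribute names, and the function names (with `D2` standing in for $D'$) are assumptions made for illustration; the definition itself does not fix a metric.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (unit-cost insert, delete, substitute) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def name_similarity(d: str, d2: str) -> float:
    """Assumed membership degree in [0, 1]: one minus the length-normalized edit distance."""
    a, b = d.lower(), d2.lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def primitive_similarity_relation(D: List[str], D2: List[str]) -> Dict[Tuple[str, str], float]:
    """Fuzzy relation over D x D2: each pair (d, d') maps to its membership degree mu_{d,d'}."""
    return {(d, d2): name_similarity(d, d2) for d in D for d2 in D2}


# Hypothetical attribute names from two schemata.
mu = primitive_similarity_relation(["cardNum", "city"], ["clientNum", "city"])
print(round(mu[("cardNum", "clientNum")], 3))  # 0.444
print(mu[("city", "city")])                    # 1.0
```

Because edit distance is zero for identical strings and symmetric in its arguments, this particular choice of $\mu$ already satisfies reflexivity and symmetry when $D = D'$; other metrics need not.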
In our model, $D$ and $D'$ can represent attribute domains (possible values) or attribute names, while $\mu_{d,d'}$ represents a similarity matrix entry.
A matching similarity of a primitive similarity relation is computed using some distance metric among domain members. Some desirable properties of a primitive similarity relation are as follows:
Reflexivity: $\mu_{d,d} = 1$. Reflexivity ensures that the exact matching receives the highest possible score (as in the case of two identical attributes, e.g., with the same name).
Symmetry: $\mu_{d,d'} = \mu_{d',d}$. Symmetry ensures that the order in which two schemata are compared has no impact on the final outcome.
Transitivity: $\mu_{d,d''} \geq \max_{d' \in D} \min\left(\mu_{d,d'}, \mu_{d',d''}\right)$. This type of transitivity is known as the max-min transitivity property (e.g., [Klir and Yuan, 1995], p. 130). It provides a solid foundation for the generation of fuzzy equivalence relations. As an example, one may generate an $\alpha$-level equivalence, which contains all pairs whose matching similarity is greater than a threshold $\alpha$. While transitivity is a desirable property, it is hard to achieve, and proximity relations (satisfying reflexivity and symmetry only) are sometimes used instead. Such a relation may, at some $\alpha$ level, generate a partition of the domain, similarly to $\alpha$-level equivalence. Determining the right threshold $\alpha$ is a tuning problem that has been addressed by several research works in this area [Lee et al., 2007].
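The three properties and the $\alpha$-level construction can be checked mechanically on a finite domain. The sketch below works over a single domain $D$ (transitivity needs pairs drawn from one domain) and stores $\mu$ as a dictionary keyed by pairs; the function names and the small zip/zipcode/city relation are hypothetical. Following the text, the cut keeps pairs whose similarity is strictly greater than $\alpha$. Blocks are formed as connected components of that cut: for a reflexive, symmetric, max-min transitive relation with $\alpha < 1$ these are exactly the $\alpha$-level equivalence classes, whereas for a mere proximity relation grouping by connectivity is an added assumption of this sketch.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# A fuzzy relation over D x D: each pair (d, d') maps to its membership degree.
Relation = Dict[Tuple[str, str], float]


def is_reflexive(mu: Relation, D: List[str]) -> bool:
    """Reflexivity: mu_{d,d} = 1 for every d in D."""
    return all(mu[(d, d)] == 1.0 for d in D)


def is_symmetric(mu: Relation, D: List[str], tol: float = 1e-9) -> bool:
    """Symmetry: mu_{d,d'} = mu_{d',d} for every pair (within a float tolerance)."""
    return all(abs(mu[(d, e)] - mu[(e, d)]) <= tol for d, e in product(D, D))


def is_max_min_transitive(mu: Relation, D: List[str], tol: float = 1e-9) -> bool:
    """Max-min transitivity: mu_{d,d''} >= max over d' of min(mu_{d,d'}, mu_{d',d''})."""
    return all(
        mu[(d, d2)] + tol >= max(min(mu[(d, m)], mu[(m, d2)]) for m in D)
        for d, d2 in product(D, D)
    )


def alpha_cut(mu: Relation, alpha: float) -> Set[Tuple[str, str]]:
    """Crisp relation: the pairs whose membership degree is strictly greater than alpha."""
    return {pair for pair, degree in mu.items() if degree > alpha}


def alpha_partition(mu: Relation, D: List[str], alpha: float) -> List[Set[str]]:
    """Split D into the connected components of the alpha-cut (depth-first search)."""
    cut = alpha_cut(mu, alpha)
    blocks: List[Set[str]] = []
    assigned: Set[str] = set()
    for start in D:
        if start in assigned:
            continue
        block: Set[str] = set()
        stack = [start]
        while stack:
            d = stack.pop()
            if d in block:
                continue
            block.add(d)
            # Follow every cut edge from d, in either direction, to an unvisited member.
            stack.extend(e for e in D if e not in block and ((d, e) in cut or (e, d) in cut))
        assigned |= block
        blocks.append(block)
    return blocks


# Hypothetical symmetric relation over three attribute names.
D = ["zip", "zipcode", "city"]
mu: Relation = {(d, d): 1.0 for d in D}
for (d, e), degree in {("zip", "zipcode"): 0.8, ("zip", "city"): 0.1, ("zipcode", "city"): 0.1}.items():
    mu[(d, e)] = mu[(e, d)] = degree

print(is_reflexive(mu, D), is_symmetric(mu, D), is_max_min_transitive(mu, D))  # True True True
print([sorted(block) for block in alpha_partition(mu, D, 0.5)])               # [['zip', 'zipcode'], ['city']]
```

Raising the zipcode/city entry from 0.1 to 0.6 breaks max-min transitivity, since $\min(0.8, 0.6) = 0.6$ exceeds $\mu_{zip,city} = 0.1$. The relation is then only a proximity relation, and the threshold decides the outcome: at $\alpha = 0.5$ all three names fall into one block, while at $\alpha = 0.7$ zip and zipcode stay together and city stands alone.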