highest value in that matrix. The two boxes at the bottom of Figure 3.1 were generated using a
beta distribution. According to Ross [1997]: “The beta distribution can be used to model a random
phenomenon whose set of possible values is in some finite interval $[c, d]$, which, by letting $c$ denote
the origin and taking $d - c$ as a unit measurement, can be transformed into the interval $[0, 1]$.” A
beta distribution has two tuning parameters, $a$ and $b$. To obtain a density function whose mass is
concentrated toward the left, near 0 (as in the case of incorrect attribute correspondences, bottom left
in Figure 3.1), we require that $b > a$. For density functions whose mass is concentrated toward the
right, near 1 (as in the case of correct attribute correspondences, bottom right), one needs to set $a > b$.
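The effect of the two parameters, and the rescaling of a finite interval onto the unit interval, can be checked with a short simulation. This is a minimal sketch using only the Python standard library; the parameter values (2 and 5) and the helper names are illustrative assumptions, since the values used to generate Figure 3.1 are not stated here.

```python
import random


def rescale_to_unit(x, c, d):
    """Map x in [c, d] to [0, 1], taking c as the origin and d - c as the unit."""
    if not c < d:
        raise ValueError("requires c < d")
    return (x - c) / (d - c)


def sample_beta(a, b, n, seed=0):
    """Draw n samples from a beta distribution with parameters a and b."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical parameter choices, not those behind Figure 3.1.
incorrect = sample_beta(a=2.0, b=5.0, n=10_000)  # b > a: mass near 0
correct = sample_beta(a=5.0, b=2.0, n=10_000)    # a > b: mass near 1

print(round(mean(incorrect), 2), round(mean(correct), 2))
```

The sample means should fall on opposite sides of 0.5, since the mean of a beta distribution is a / (a + b).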
Going back to the semantics of data models, we note that schema matchers often use data
model semantics when determining the similarity between attributes. For example, XML structure
has been used in Cupid [Madhavan et al., 2001] to support or dispute linguistic similarities. Also,
the similarity flooding algorithm [Melnik et al., 2002] uses structural links between attributes to
update linguistic similarities. However, once a similarity value has been determined and recorded in
the similarity matrix, the original semantics from which it was derived are no longer needed. Therefore,
the matrix representation, as given above, is sufficient to represent the uncertainty involved in the
matching process.
Similarity matrices have been used in the literature mainly as a convenient representation
model, rather than as a formal model used for reasoning, with two exceptions. Do and Rahm
[2002] propose a cube to represent an ensemble of similarity values, which is transformed into a matrix by
aggregating the similarity values of each attribute correspondence across ensemble members. Domshlak et al.
[2007] have taken this process one step further and proposed the use of the matrix abstraction to
perform local and global aggregations as matrix-to-constant and cube-to-matrix functions (see
Chapter 4).
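The two shapes of aggregation mentioned here can be illustrated on nested lists. This is a rough sketch, not the cited authors' method: the function names are hypothetical, the similarity values are made up, and averaging is only an assumed choice of aggregator.

```python
def cube_to_matrix(cube, aggregate):
    """Collapse a cube (a list of equally sized matrices) into one matrix, entry by entry."""
    rows, cols = len(cube[0]), len(cube[0][0])
    return [
        [aggregate([matrix[i][j] for matrix in cube]) for j in range(cols)]
        for i in range(rows)
    ]


def matrix_to_constant(matrix, matching, aggregate):
    """Aggregate the entries selected by a matching (a set of (i, j) index pairs)."""
    return aggregate([matrix[i][j] for i, j in sorted(matching)])


def average(values):
    return sum(values) / len(values)


# Two hypothetical ensemble members, each producing a 2 x 2 similarity matrix.
cube = [
    [[0.75, 0.25], [0.0, 0.5]],
    [[0.25, 0.25], [0.5, 1.0]],
]

combined = cube_to_matrix(cube, average)
score = matrix_to_constant(combined, {(0, 0), (1, 1)}, average)

print(combined)  # [[0.5, 0.25], [0.25, 0.75]]
print(score)     # 0.625
```

Passing the aggregator as an argument keeps the sketch neutral about which function (average, maximum, weighted sum, and so on) a particular ensemble would use.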
3.1.3 SCHEMA MATCHING
Let the power-set $\Sigma = 2^{\mathcal{S}}$ be the set of all possible schema matchings between the schema pair $\{S, S'\}$,
where a schema matching $\sigma \in \Sigma$ is a set of attribute correspondences. It is worth noting that $\sigma$ does
not necessarily contain all attributes in $S$ or $S'$. Therefore, there may exist an attribute $A \in S$ such that
$(A, A') \notin \sigma$ for all $A' \in S'$. For convenience, we denote by
$\bar{\sigma} = \{A \in S \mid \forall A' \in S',\ (A, A') \notin \sigma\}$
the set of all attributes that do not participate in a schema matching.
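The set of non-participating attributes follows directly from its definition when a matching is stored as a set of pairs. A minimal sketch, assuming attributes are plain strings; the schemas and attribute names below are hypothetical.

```python
def non_participating(S, S_prime, sigma):
    """Return the attributes of S that appear in no correspondence (A, A') of sigma."""
    return {A for A in S if all((A, A2) not in sigma for A2 in S_prime)}


# Hypothetical schemas and matching, for illustration only.
S = {"name", "street", "zip"}
S_prime = {"fullName", "address"}
sigma = {("name", "fullName"), ("street", "address")}

print(non_participating(S, S_prime, sigma))  # {'zip'}
```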
Let $\Gamma : \Sigma \rightarrow \{0, 1\}$ be a boolean function that captures the application-specific constraints
on schema matchings, e.g., cardinality constraints and inter-attribute correspondence constraints.
$\Gamma$ partitions $\Sigma$ into two sets, where the set of all valid schema matchings in $\Sigma$ is given by
$\Sigma_{\Gamma} = \{\sigma \in \Sigma \mid \Gamma(\sigma) = 1\}$.
$\Gamma$ is a general constraint model, where $\Gamma(\sigma) = 1$ means that the matching $\sigma$ can be
accepted by a designer. $\Gamma$ has been modeled in the literature using special types of matchers called
constraint enforcers [Lee et al., 2007], whose output is recorded in a binary similarity matrix. We say that
$\Gamma$ is a null constraint function (basically accepting all possible matchings as valid with no use of a
constraint enforcer) if for all $\sigma \in \Sigma$, $\Gamma(\sigma) = 1$.
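How a constraint function partitions the set of all matchings can be seen by enumerating a tiny example. The sketch below is illustrative only: it assumes a matching is a set of attribute pairs, uses a 1:1 cardinality constraint as one possible constraint function, and all names in it are hypothetical. Enumerating the power set is feasible only for very small schemas.

```python
from itertools import chain, combinations, product


def all_matchings(S, S_prime):
    """Enumerate every subset of the attribute correspondences in S x S'."""
    pairs = sorted(product(S, S_prime))
    subsets = chain.from_iterable(
        combinations(pairs, k) for k in range(len(pairs) + 1)
    )
    return [frozenset(subset) for subset in subsets]


def one_to_one(sigma):
    """Constraint function: 1 if no attribute occurs in more than one correspondence."""
    left = [a for a, _ in sigma]
    right = [b for _, b in sigma]
    return int(len(set(left)) == len(left) and len(set(right)) == len(right))


def null_constraint(sigma):
    """Null constraint function: accepts every matching."""
    return 1


def valid_matchings(matchings, gamma):
    """Keep the matchings that the constraint function gamma maps to 1."""
    return [sigma for sigma in matchings if gamma(sigma) == 1]


S, S_prime = ["a1", "a2"], ["b1", "b2"]
sigma_all = all_matchings(S, S_prime)

print(len(sigma_all))                                    # 16
print(len(valid_matchings(sigma_all, one_to_one)))       # 7
print(len(valid_matchings(sigma_all, null_constraint)))  # 16
```

With two attributes per schema there are four correspondences and hence 16 matchings; the 1:1 constraint keeps the empty matching, the four singletons, and the two pairings that reuse no attribute.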