Databases Reference
In-Depth Information
CHAPTER
5
Top- K Schema Matchings
The quest for certainty blocks the search for meaning.
-ErichFromm
Top- K schema matchings are intuitively defined as a ranked list of the best K schema match-
ings a matcher can generate. The formal definition, given in Section 5.1 , is recursive, providing
interesting insights into the behavior patterns of matchers. Top- K schema matchings play a pivotal
role in managing uncertain schema matching. The effectively unlimited heterogeneity and ambigu-
ity of data description suggests that in many cases an exact matching will not be identified as a best
matching by any schema matcher. Therefore, top- K schema matchings can be used to create a search
space in uncertain settings [ Anaby-Tavor , 2003 , Bernstein et al. , 2006 ] and can serve in assigning
probabilities in probabilistic schema matchings [ Roitman et al. , 2008 ]. They can also help improve
the precision of matching results [ Gal , 2006 ]. All in all, empirical results have shown the top- K list
to be a quality set of candidates for schema matching.
We start this chapter with a formal definition of top- K schema matchings (Section 5.1 ).
Section 5.2 provides three efficient algorithms for finding top- K schema matchings. We connect
top- K schema matching with ensemble decision making in Section 5.3 and detail a heuristic that
uses top- K schema matchings to improve on matcher performance in Section 5.4 . We conclude
with a method for using top- K schema matchings to assign probabilities to probabilistic attribute
correspondences (Section 5.5 ).
5.1
TOP- K SCHEMA MATCHINGS: DEFINITION
We now provide a formal definition of top- K schema matching using a bipartite graph representation.
For simplicity, we restrict the discussion to 1
1 matchings. Extensions can be made by using
hypergraphs, in which edges connect subsets of nodes rather than individual nodes.
Let G
:
(X,Y,E) be an undirected bipartite graph with nodes representing attributes of
two schemata and edges representing the degree of similarity between attributes. Assume a problem
instance with a positive weight function
=
( 0 , 1] defined on edges. Given a schema matcher
and a similarity matrix M , (i, j) = M i,j . It is worth noting that G contains no edges with 0
weight. Therefore, whenever M i,j
:
E
0, there is no edge between attributes i and j in G . A matching
σ is a subset of G 's edges, σ E .Therefore, σ E is equivalent to σ .The weight of a matching
σ is f(σ,M) , as defined in Section 4.1 . Given a constraint specification (see Section 3.1.3 ), we
consider hereafter only valid schema matchings in .
=
 
Search WWH ::




Custom Search