Top-K Schema Matchings - Uncertain Schema Matching

Databases Reference

In-Depth Information

CHAPTER

5

Top- K Schema Matchings

The quest for certainty blocks the search for meaning.

-ErichFromm

Top- K schema matchings are intuitively defined as a ranked list of the best K schema match-

ings a matcher can generate. The formal definition, given in Section 5.1 , is recursive, providing

interesting insights into the behavior patterns of matchers. Top- K schema matchings play a pivotal

role in managing uncertain schema matching. The effectively unlimited heterogeneity and ambigu-

ity of data description suggests that in many cases an exact matching will not be identified as a best

matching by any schema matcher. Therefore, top- K schema matchings can be used to create a search

space in uncertain settings [ Anaby-Tavor , 2003 , Bernstein et al. , 2006 ] and can serve in assigning

probabilities in probabilistic schema matchings [ Roitman et al. , 2008 ]. They can also help improve

the precision of matching results [ Gal , 2006 ]. All in all, empirical results have shown the top- K list

to be a quality set of candidates for schema matching.

We start this chapter with a formal definition of top- K schema matchings (Section 5.1 ).

Section 5.2 provides three efficient algorithms for finding top- K schema matchings. We connect

top- K schema matching with ensemble decision making in Section 5.3 and detail a heuristic that

uses top- K schema matchings to improve on matcher performance in Section 5.4 . We conclude

with a method for using top- K schema matchings to assign probabilities to probabilistic attribute

correspondences (Section 5.5 ).

5.1

TOP- K SCHEMA MATCHINGS: DEFINITION

We now provide a formal definition of top- K schema matching using a bipartite graph representation.

For simplicity, we restrict the discussion to 1

1 matchings. Extensions can be made by using

hypergraphs, in which edges connect subsets of nodes rather than individual nodes.

Let G

:

(X,Y,E) be an undirected bipartite graph with nodes representing attributes of

two schemata and edges representing the degree of similarity between attributes. Assume a problem

instance with a positive weight function

=

( 0 , 1] defined on edges. Given a schema matcher

and a similarity matrix M , (i, j) = M i,j . It is worth noting that G contains no edges with 0

weight. Therefore, whenever M i,j

:

E

→

0, there is no edge between attributes i and j in G . A matching

σ is a subset of G 's edges, σ ⊆ E .Therefore, σ ⊆ E is equivalent to σ ∈ .The weight of a matching

σ is f(σ,M) , as defined in Section 4.1 . Given a constraint specification (see Section 3.1.3 ), we

consider hereafter only valid schema matchings in .

=

Uncertain Schema Matching

Search WWH ::

Custom Search

Home